Commit 61c7c113 authored by mwham


Doc updates, updating format of sample sheets. Adding nextflow.config. Adding main.nf entrypoint, moving param declarations there. Making most input parameters their own Channels.
parent 95f1c2d0
1 merge request: !4 NextFlow variant calling
Pipeline #14066 failed
Showing with 288 additions and 78 deletions
# Trio-Whole-Exome Pipeline

This is an automated version of the scripts currently run manually according to SOP as part of the whole exome trios
project with David Fitzpatrick's group. This pipeline is controlled by [NextFlow](https://www.nextflow.io).

## Setup

This pipeline requires:

- NextFlow
- An install of BCBio v1.2.8

A [Conda](https://docs.conda.io) environment containing NextFlow is available in `environment.yml`. This can be created
with the command:

    $ conda env create -n <environment_name> -f environment.yml

## Running the pipeline

The pipeline requires two main input files (a samplesheet and a Ped file), plus some configuration:
### Configuration

This pipeline uses a config at `trio-whole-exome/nextflow.config`, containing profiles for different sizes of process.
NextFlow picks this up automatically.

A second config is necessary for providing executor and param information. This can be supplied via the `-c` argument;
an example is given after the parameter list below.
Parameters:
- `bcbio` - path to a BCBio install, containing 'anaconda', 'galaxy', 'genomes', etc
- `bcbio_template` - path to a template config for BCBio variant calling. Should set `upload.dir: ./results` so that
BCBio will output results to the working dir.
- `output_dir` - where the results get written to on the system. The variant calling creates initial results here,
and variant prioritisation adds to them
- `target_bed` - BED file of Twist exome targets
- `reference_genome` - hg38 reference genome in FASTA format
- `parse_peddy_output` - path to the parse_peddy_output Perl script. Todo: remove once scripts are in bin/
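
As an illustration, a minimal second config might look like the following (the executor choice and all paths here are
illustrative, not prescriptive):

    process.executor = 'slurm'

    params {
        bcbio = '/path/to/bcbio-1.2.8'
        bcbio_template = '/path/to/bcbio_template.yaml'
        output_dir = '/path/to/outputs'
        target_bed = '/path/to/twist_exome_targets.bed'
        reference_genome = '/path/to/hg38.fa'
        parse_peddy_output = '/path/to/parse_peddy_output.pl'
    }

Per the `bcbio_template` note above, the template YAML should contain at least:

    upload:
        dir: ./results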
### Samplesheet

This is a tab-separated file mapping individuals to fastq pairs. The columns are individual_id, read_1 and read_2. If a
sample has been sequenced over multiple lanes, then include a line for each fastq pair:

    individual_id  read_1                      read_2
    000001         path/to/lane_1_r1.fastq.gz  path/to/lane_1_r2.fastq.gz
    000001         path/to/lane_2_r1.fastq.gz  path/to/lane_2_r2.fastq.gz
### Ped file

Tab-separated Ped file mapping individuals to each other, with family IDs and affected status. Per the
[specification](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format), the columns are
family ID, individual ID, father ID, mother ID, sex (1=male, 2=female, other=unknown), affected status (-9 or 0=missing,
1=unaffected, 2=affected):
@@ -36,27 +59,45 @@ family ID, individual ID, father ID, mother ID, sex (1=male, 2=female, other=unk

    000001  000002  0  0  1  1
    000001  000003  0  0  2  1
The pipeline supports non-trios, e.g. singletons, duos and quads.
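
For example, a duo and a singleton would just be additional families in the same Ped file (all IDs below are
illustrative):

    000002  000004  0  000005  1  2
    000002  000005  0  0       2  1
    000003  000006  0  0       2  2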
### Usage
The pipeline can now be run. First, run the initial variant calling:
    $ nextflow path/to/trio-whole-exome/main.nf \
        -c path/to/nextflow.config \
        --workflow 'variant-calling' \
        --pipeline_project_id projname --pipeline_project_version v1 \
        --ped_file path/to/batch.ped \
        --sample_sheet path/to/samplesheet.tsv
Todo: variant prioritisation workflow
## Tests

This pipeline has automated tests contained in the folder `tests/`. To run the tests locally, `cd` to this folder
with your Conda environment active and run the test scripts:
- `run_tests.sh`
- `run_giab_tests.sh`
These tests use the environment variable `NEXTFLOW_CONFIG`, pointing to a platform-specific config file.
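
For example (the config path here is illustrative):

    $ export NEXTFLOW_CONFIG=/path/to/platform.config
    $ ./run_tests.sh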
## Terminology FAQ

- 'Batch'
  - Slightly ambiguous term - can be a pipeline batch, a sequencing batch or a BCBio batch. To this end, a single
    run of this pipeline is known as a project.
- 'Pipeline project'
  - A single run of this pipeline, potentially mixing samples and families from multiple sequencing batches. There's
    always one Ped file and sample sheet per pipeline project.
- 'Sequencing batch'
  - A group of samples that were prepared and sequenced together.
- 'BCBio batch'
  - Used internally by BCBio to identify a family.
- 'Sample ID'
  - Specific to a sequencing batch, family ID, individual ID and extraction kit type.
main.nf 0 → 100644
nextflow.enable.dsl = 2
include {var_calling} from './pipeline/var_calling.nf'
// which part of the pipeline to run - either 'variant-calling' or 'variant-prioritisation'
params.workflow = null
// path to a bcbio install, containing 'anaconda', 'galaxy', 'genomes', etc
params.bcbio = null
// path to a template config for bcbio variant calling
params.bcbio_template = null
// where the results get written to on the system. The variant calling creates initial
// results here, and variant prioritisation adds to them
params.output_dir = null
// name of the pipeline batch, e.g. '21900', '20220427'
params.pipeline_project_id = null
// version of the pipeline batch, e.g. 'v1'
params.pipeline_project_version = null
// bed file of Twist exome targets
params.target_bed = null
// hg38 reference genome in fasta format
params.reference_genome = null
// path to the parse_peddy_output Perl script. Todo: remove once scripts are in bin/
params.parse_peddy_output = null
// path to a Ped file describing all the families in the pipeline batch
params.ped_file = null
// path to a samplesheet mapping individual IDs to fastq pairs
params.sample_sheet = null
workflow {
    if (params.workflow == 'variant-calling') {
        var_calling()
    } else if (params.workflow == 'variant-prioritisation') {
        println "Variant prioritisation coming soon"
    } else {
        exit 1, 'params.workflow required - variant-calling or variant-prioritisation'
    }
}
nextflow.config 0 → 100644

process {
    executor = 'slurm'
    cpus = 4
    memory = 8.GB
    time = '6h'

    withLabel: small {
        executor = 'local'
        cpus = 2
        memory = 2.GB
    }

    withLabel: medium {
        cpus = 4
        memory = 8.GB
    }

    withLabel: large {
        cpus = 16
        memory = 32.GB
    }
}

profiles {
    debug {
        process.echo = true
    }
}
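
The `withLabel` blocks above apply to any process that declares the matching label; unlabelled processes get the
top-level defaults. A minimal sketch (the process name and script here are hypothetical):

    process example_heavy_step {
        label 'large'

        script:
        """
        echo "runs with 16 CPUs and 32 GB via the 'large' label"
        """
    }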
pipeline/inputs.nf

nextflow.enable.dsl = 2
workflow read_inputs {
    /*
@@ -24,9 +22,8 @@ workflow read_inputs {
        ...
    ]
    */
    ch_ped_file = Channel.fromPath(params.ped_file, checkIfExists: true)
    ch_ped_file_info = ch_ped_file.splitCsv(sep: '\t')
        .map(
            { line ->
                [
@@ -55,9 +52,8 @@ workflow read_inputs {
            ]
        ]
    */
    ch_samplesheet = Channel.fromPath(params.sample_sheet, checkIfExists: true)
    ch_samplesheet_info = ch_samplesheet.splitCsv(sep: '\t', header: true)
        .map(
            { line -> [line.individual_id, file(line.read_1), file(line.read_2)] }
        )
@@ -103,6 +99,10 @@ workflow read_inputs {
    ch_individuals_by_family = ch_individuals.map({[it[1], it]})

    emit:
        ch_ped_file = ch_ped_file
        ch_ped_file_info = ch_ped_file_info
        ch_samplesheet = ch_samplesheet
        ch_samplesheet_info = ch_samplesheet_info
        ch_individuals = ch_individuals
        ch_individuals_by_family = ch_individuals_by_family
}
pipeline/var_calling.nf

@@ -3,14 +3,6 @@ nextflow.enable.dsl = 2

include {read_inputs} from './inputs.nf'
include {validation} from './validation.nf'
process merge_fastqs {
    label 'medium'

@@ -38,6 +30,7 @@ process write_bcbio_csv {
    input:
    tuple(val(family_id), val(individual_info))
    path(target_bed)

    output:
    tuple(val(family_id), path("${family_id}.csv"))
@@ -45,14 +38,16 @@ process write_bcbio_csv {
    script:
    """
    #!/usr/bin/env python
    import os

    target_bed = os.path.realpath('${target_bed}')
    individual_info = '$individual_info'
    lines = individual_info.lstrip('[').rstrip(']').split('], [')
    with open('${family_id}.csv', 'w') as f:
        f.write('samplename,description,batch,sex,phenotype,variant_regions\\n')
        for l in lines:
            f.write(l.replace(', ', ',') + ',' + target_bed + '\\n')
    """
}
@@ -62,17 +57,19 @@ process bcbio_family_processing {
    input:
    tuple(val(family_id), val(individuals), path(family_csv))
    path(bcbio)
    path(bcbio_template)

    output:
    tuple(val(family_id), val(individuals), path("${family_id}-merged"))

    script:
    """
    ${bcbio}/anaconda/bin/bcbio_prepare_samples.py --out . --csv $family_csv &&
    ${bcbio}/anaconda/bin/bcbio_nextgen.py -w template ${bcbio_template} ${family_csv.getBaseName()}-merged.csv ${individuals.collect({"${it}.fastq.gz"}).join(' ')} &&
    cd ${family_id}-merged &&
    ../${bcbio}/anaconda/bin/bcbio_nextgen.py config/${family_id}-merged.yaml -n 16 -t local
    """
}
@@ -80,13 +77,15 @@ process bcbio_family_processing {
process format_bcbio_individual_outputs {
    input:
    tuple(val(family_id), val(individuals), path(bcbio_output_dir))
    path(bcbio)
    path(reference_genome)

    output:
    tuple(val(family_id), path('individual_outputs'))

    script:
    """
    samtools=${bcbio}/anaconda/bin/samtools &&
    mkdir individual_outputs
    for i in ${individuals.join(' ')}
    do
@@ -99,7 +98,7 @@ process format_bcbio_individual_outputs {
        bam=\$indv_input/\$i-ready.bam
        cram="\$indv_output/\$i-ready.cram" &&
        \$samtools view -@ ${task.cpus} -T ${reference_genome} -C -o \$cram \$bam &&
        \$samtools index \$cram &&
        bam_flagstat=./\$i-ready.bam.flagstat.txt &&
        cram_flagstat=\$cram.flagstat.txt &&
@@ -165,15 +164,19 @@ process format_bcbio_family_outputs {
}

process collate_pipeline_outputs {
    label 'small'

    publishDir "${params.output_dir}", mode: 'move', pattern: "${params.pipeline_project_id}_${params.pipeline_project_version}"

    input:
    val(family_ids)
    val(bcbio_family_output_dirs)
    val(raw_bcbio_output_dirs)
    path(ped_file)
    path(samplesheet)
    path(bcbio)
    path(parse_peddy_output)

    output:
    path("${params.pipeline_project_id}_${params.pipeline_project_version}")
@@ -189,20 +192,25 @@ process collate_pipeline_outputs {
        cp -rL \$d \$outputs/families/\$(basename \$d)
    done &&

    for f in ${family_ids.join(' ')}
    do
        grep \$f ${ped_file} > \$outputs/params/\$f.ped
    done &&

    cd \$outputs/families &&
    ../../${bcbio}/anaconda/bin/multiqc \
        --title "Trio whole exome QC report: ${params.pipeline_project_id}_${params.pipeline_project_version}" \
        --outdir ../qc \
        --filename ${params.pipeline_project_id}_${params.pipeline_project_version}_qc_report.html \
        . &&

    peddy_output=../qc/${params.pipeline_project_id}_${params.pipeline_project_version}.ped_check.txt &&
    perl ../../${parse_peddy_output} \
        --output \$peddy_output \
        --project ${params.pipeline_project_id} \
        --batch ${bcbio_family_output_dirs[0].getName().split('_')[1]} \
        --version ${params.pipeline_project_version} \
        --ped ../../${ped_file} \
        --families . &&

    # no && here - exit status checked below
@@ -222,8 +230,8 @@ process collate_pipeline_outputs {
        dest_basename=${params.pipeline_project_id}_${params.pipeline_project_version}_\$family_id &&
        cp -L \$d/config/\${family_id}-merged.csv \$outputs/params/\$dest_basename.csv &&
        cp -L \$d/config/\${family_id}-merged.yaml \$outputs/config/\$dest_basename.yaml &&
        cp -L ${ped_file} \$outputs/params/ &&
        cp -L ${samplesheet} \$outputs/params/
    done
    """
}
@@ -242,8 +250,16 @@ workflow process_families {
    take:
        ch_individuals
        ch_ped_file
        ch_samplesheet

    main:
        ch_bcbio = file(params.bcbio, checkIfExists: true)
        ch_bcbio_template = file(params.bcbio_template, checkIfExists: true)
        ch_target_bed = file(params.target_bed, checkIfExists: true)
        ch_parse_peddy_output = file(params.parse_peddy_output, checkIfExists: true)
        ch_reference_genome = file(params.reference_genome, checkIfExists: true)

        ch_merged_fastqs = merge_fastqs(
            ch_individuals.map(
                { indv, family, father, mother, sex, affected, r1, r2 ->
@@ -267,9 +283,10 @@ workflow process_families {
        ch_bcbio_csvs = write_bcbio_csv(
            ch_read1_meta.mix(ch_read2_meta).map(
                { family_id, sample_id, father, mother, sex, phenotype, merged_fastq ->
                    [family_id, [merged_fastq, sample_id, family_id, sex, phenotype]]
                }
            ).groupTuple(),
            ch_target_bed
        )

        ch_bcbio_inputs = ch_joined_indv_info.map(
@@ -277,25 +294,42 @@ workflow process_families {
            [family_id, sample_id]
        }).groupTuple().join(ch_bcbio_csvs)

        ch_bcbio_family_outputs = bcbio_family_processing(
            ch_bcbio_inputs,
            ch_bcbio,
            ch_bcbio_template
        )
        ch_individual_folders = format_bcbio_individual_outputs(
            ch_bcbio_family_outputs,
            ch_bcbio,
            ch_reference_genome
        )
        ch_formatted_bcbio_outputs = format_bcbio_family_outputs(
            ch_bcbio_family_outputs.join(ch_individual_folders)
        )
        collate_pipeline_outputs(
            ch_formatted_bcbio_outputs.map({it[0]}).collect(),
            ch_formatted_bcbio_outputs.map({it[1]}).collect(),
            ch_formatted_bcbio_outputs.map({it[2]}).collect(),
            ch_ped_file,
            ch_samplesheet,
            ch_bcbio,
            ch_parse_peddy_output
        )
}
workflow var_calling {
    read_inputs()

    validation(read_inputs.out.ch_individuals)
    process_families(
        read_inputs.out.ch_individuals,
        read_inputs.out.ch_ped_file,
        read_inputs.out.ch_samplesheet
    )
}
assets/input_data/ped_files/giab_test_non_trios.ped

00001_000001 000001_000001 0 000003_000002 1 2
00001_000001 000003_000001 0 0 2 1
00001_000002 000004_000002 0 0 1 2
00001_000003 000005_000003 000007_000003 000008_000003 1 2
00001_000003 000006_000003 000007_000003 000008_000003 2 2
00001_000003 000007_000003 0 0 1 1
00001_000003 000008_000003 0 0 2 1
assets/input_data/ped_files/giab_test_trios.ped

00001_000001 000001_000001 000002_000002 000003_000002 1 2
00001_000001 000002_000001 0 0 1 1
00001_000001 000003_000001 0 0 2 1
00001_000002 000004_000002 000005_000002 000006_000002 1 2
00001_000002 000005_000002 0 0 1 1
00001_000002 000006_000002 0 0 2 1
assets/input_data/sample_sheets/batch_1.tsv

individual_id read_1 read_2
000001 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000001_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_1_00001AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000001_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_1_00001AM0001L01_2.fastq.gz
000001 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000001_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_2_00001AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000001_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_2_00001AM0001L01_2.fastq.gz
000002 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000002_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_3_00002AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000002_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_3_00002AM0001L01_2.fastq.gz
000002 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000002_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_4_00002AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000002_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_4_00002AM0001L01_2.fastq.gz
000003 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000003_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_5_00003AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000003_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_5_00003AM0001L01_2.fastq.gz
000003 assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000003_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_6_00003AM0001L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12345_A_Researcher/20210922/12345_000003_000001_WESTwist_IDT-B/200922_A00001_0001_BHNTGMDMXX_6_00003AM0001L01_2.fastq.gz
individual_id read_1 read_2
000006 assets/input_data/edinburgh_genomics/X12346_MD5_Errors/20211005/12346_000006_000003_WESTwist_IDT-B/211005_A00002_0002_AJTHSNRLXX_1_00002AM0002L01_1.fastq.gz assets/input_data/edinburgh_genomics/X12346_MD5_Errors/20211005/12346_000006_000003_WESTwist_IDT-B/211005_A00002_0002_AJTHSNRLXX_1_00002AM0002L01_2.fastq.gz
assets/input_data/sample_sheets/giab_test_non_trios.tsv

individual_id read_1 read_2
000001_000001 assets/input_data/giab/AshkenazimTrio/HG002_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG002_R2.fastq.gz
000003_000001 assets/input_data/giab/AshkenazimTrio/HG004_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG004_R2.fastq.gz
000004_000002 assets/input_data/giab/ChineseTrio/HG005_R1.fastq.gz assets/input_data/giab/ChineseTrio/HG005_R2.fastq.gz
000005_000003 assets/input_data/giab/AshkenazimTrio/HG002_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG002_R2.fastq.gz
000006_000003 assets/input_data/giab/ChineseTrio/HG005_R1.fastq.gz assets/input_data/giab/ChineseTrio/HG005_R2.fastq.gz
000007_000003 assets/input_data/giab/AshkenazimTrio/HG003_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG003_R2.fastq.gz
000008_000003 assets/input_data/giab/AshkenazimTrio/HG004_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG004_R2.fastq.gz
assets/input_data/sample_sheets/giab_test_trios.tsv

individual_id read_1 read_2
000001_000001 assets/input_data/giab/AshkenazimTrio/HG002_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG002_R2.fastq.gz
000002_000001 assets/input_data/giab/AshkenazimTrio/HG003_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG003_R2.fastq.gz
000003_000001 assets/input_data/giab/AshkenazimTrio/HG004_R1.fastq.gz assets/input_data/giab/AshkenazimTrio/HG004_R2.fastq.gz
000004_000002 assets/input_data/giab/ChineseTrio/HG005_R1.fastq.gz assets/input_data/giab/ChineseTrio/HG005_R2.fastq.gz
000005_000002 assets/input_data/giab/ChineseTrio/HG006_R1.fastq.gz assets/input_data/giab/ChineseTrio/HG006_R2.fastq.gz
000006_000002 assets/input_data/giab/ChineseTrio/HG007_R1.fastq.gz assets/input_data/giab/ChineseTrio/HG007_R2.fastq.gz
tests/run_giab_tests.sh

#!/bin/bash
source scripts/nextflow_detached.sh
test_exit_status=0
nextflow -c "$NEXTFLOW_CONFIG" clean -f
echo "Reduced GiaB data - trios"
run_nextflow ../main.nf \
    -c "$NEXTFLOW_CONFIG" \
    --workflow 'variant-calling' \
    --pipeline_project_id giab_test_trios \
    --pipeline_project_version v1 \
    --ped_file $PWD/assets/input_data/ped_files/giab_test_trios.ped \
    --sample_sheet $PWD/assets/input_data/sample_sheets/giab_test_trios.tsv
test_exit_status=$(( $test_exit_status + $? ))
echo "Reduced GiaB data - non-trios"
run_nextflow ../main.nf \
    -c "$NEXTFLOW_CONFIG" \
    --workflow 'variant-calling' \
    --pipeline_project_id giab_test_non_trios \
    --pipeline_project_version v1 \
    --ped_file $PWD/assets/input_data/ped_files/giab_test_non_trios.ped \
    --sample_sheet $PWD/assets/input_data/sample_sheets/giab_test_non_trios.tsv
test_exit_status=$(( $test_exit_status + $? ))
echo "Tests finished with exit status $test_exit_status"
tests/run_tests.sh

#!/bin/bash

source scripts/nextflow_detached.sh
bcbio=$PWD/scripts/bcbio_nextgen.py
bcbio_prepare_samples=$PWD/scripts/bcbio_prepare_samples.py
@@ -9,7 +9,6 @@ common_args="--bcbio $bcbio --bcbio_prepare_samples $bcbio_prepare_samples --bcb
test_exit_status=0

nextflow clean -f

echo "Test case 1: simple trio"
run_nextflow ../pipeline/main.nf --ped_file assets/input_data/ped_files/batch_1.ped --sample_sheet assets/input_data/sample_sheets/batch_1.tsv $common_args
...
File moved