Commit 32d85338 authored by kdonnel1's avatar kdonnel1

README formatting change

parent 545c5782
Pipeline #48292 failed
## Nextflow vs Bcbio
Here, a small number of scripts are provided to systematically test the Nextflow (nf) pipeline at several key stages against bcbio (bb). Many steps produce small text output files that can be compared manually. Larger files, such as alignments and variant calls, require a different approach, particularly when we have reason to believe that subtle differences exist. A single individual is used here for illustration, but any samples can be used so long as they are extracted from the correct part of their respective pipelines.
# bwa_stochasticity_check.sh
Having observed that alignments produced by bb and nf are non-identical, we sought to determine whether bwa-mem has a stochastic component. This script generates three alignments from identical fastq input and demonstrates that all three outputs are identical.
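The check described above can be sketched roughly as follows. This is a minimal illustration rather than the contents of `bwa_stochasticity_check.sh`: the file names `ref.fa`, `sample_R1.fastq.gz` and `sample_R2.fastq.gz`, the `rep_N.sam` output names, and the helper functions `run_replicates` and `all_identical` are all assumptions made for the example.

```shell
# Hypothetical inputs; substitute your own indexed reference and fastq paths.
REF=${REF:-ref.fa}
R1=${R1:-sample_R1.fastq.gz}
R2=${R2:-sample_R2.fastq.gz}

# Align the same input n times, writing rep_1.sam ... rep_n.sam.
# Requires bwa on PATH and a bwa index for $REF.
run_replicates() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        bwa mem "$REF" "$R1" "$R2" > "rep_${i}.sam" || return 1
        i=$((i + 1))
    done
}

# Return 0 if every file given is byte-identical to the first, 1 otherwise.
all_identical() {
    first=$1
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || return 1
    done
    return 0
}

# Example usage:
#   run_replicates 3 && all_identical rep_1.sam rep_2.sam rep_3.sam && echo "identical"
```

Comparing whole files with `cmp` is the strictest test available: any difference, including one confined to the header, is reported as a mismatch.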
# bwa_mem_identity_check.sh
Here, we wish to determine whether the nf bwa_mem step (and the subsequent sorting and duplicate-marking) produces the same output as bb when provided with identically sorted input fastqs. To do this, we use the fastqs produced by the bb fastp step as input to the nf alignment step. We then compare the resulting bams, which are identical.
The variability between runs is attributable to the prior 'fastp' step, which outputs fastq with differing read orders on each pass. This, in turn, results in small differences in the aligned output from bwa-mem. However, as read order in fastq is arbitrary, this should in theory not affect the quality of the alignment. If absolute reproducibility is vital, fastp output could be sorted prior to alignment.
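If sorting fastp output is wanted, one possible approach is sketched below. It is not part of the scripts described here: the function name `sort_fastq` is hypothetical, and the sketch assumes uncompressed input with exactly four lines per record and no tab characters in the header lines.

```shell
# Sort a 4-line-per-record fastq read from stdin by its header line,
# writing the sorted fastq to stdout.
sort_fastq() {
    tab=$(printf '\t')
    # Join each record onto one tab-separated line, sort on the header
    # field in the C locale for a stable byte ordering, then split again.
    paste - - - - | LC_ALL=C sort -t "$tab" -k1,1 | tr '\t' '\n'
}

# Example usage (file names are placeholders):
#   gzip -dc fastp_R1.fastq.gz | sort_fastq | gzip > fastp_R1.sorted.fastq.gz
```

For paired-end data, each mate file would need to be sorted the same way, which keeps the mates in step only if both files carry matching read identifiers.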
# bqsr_identity_check
Even when provided with identical alignment input, BQSR produces non-identical bam files. We wish to determine whether they are in fact meaningfully different, or whether the difference is purely technical. To do so, alignments from each pipeline are converted to headerless sam and then sorted via a conventional bash sort. This results in two identical files, indicating that the bams differ only in header content or record order.
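A rough sketch of this comparison is given below, assuming the bams have already been converted to SAM text (for example with `samtools view`, which omits the header unless `-h` is given). The function names `sorted_body` and `records_match` and the file names in the usage comment are hypothetical.

```shell
# Print the alignment records of a SAM file in sorted order.
# Header lines start with '@'; the SAM specification does not allow
# a read name to start with '@', so this filter only drops header lines.
sorted_body() {
    grep -v '^@' "$1" | LC_ALL=C sort
}

# Return 0 if two SAM files contain the same set of records,
# ignoring headers and record order; non-zero otherwise.
records_match() {
    a=$(mktemp)
    b=$(mktemp)
    sorted_body "$1" > "$a"
    sorted_body "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example usage (file names are placeholders):
#   samtools view bb_recal.bam > bb.sam
#   samtools view nf_recal.bam > nf.sam
#   records_match bb.sam nf.sam && echo "records identical"
```

Sorting whole lines compares every field of every record, so a difference in any base quality or tag would still be reported as a mismatch.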