From eb8a1fb35a94689dd4391ed1411d0a1e15fd7f7c Mon Sep 17 00:00:00 2001
From: s1734289 <s1734289@sms.ed.ac.uk>
Date: Sun, 31 Jul 2022 18:42:13 +0100
Subject: [PATCH] Add document describing running process of CNV-calling pipeline

---
 docs/running_cnv_pipeline.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 docs/running_cnv_pipeline.md

diff --git a/docs/running_cnv_pipeline.md b/docs/running_cnv_pipeline.md
new file mode 100644
index 0000000..af46144
--- /dev/null
+++ b/docs/running_cnv_pipeline.md
@@ -0,0 +1,33 @@
+This document describes how to run the cnv-calling workflow of the Nextflow variant pipeline.
+
+Currently, the pipeline expects a folder 'ExomeDepth_assets' in 'trio-whole-exome/pipeline'. This folder contains the R script that runs ExomeDepth, a script that builds a reference ExomeDepth object, and paths to sex-specific references. A future development will be to move the R scripts into the bin folder and the reference objects into the assets folder.
+
+The existing pipeline currently takes roughly 10 minutes per family to run.
+
+Required input files:
+
+ped file
+sample sheet - with an extra column containing the path to each sample's BAM file
+reference fasta - the pipeline has been designed for the HG38 assembly
+reference bedfile - exome RefSeq file
+Output files:
+
+Outputs are stored in a folder named with the family ID.
+
+exome_calls_[bam file name].csv - a CSV file produced by ExomeDepth
+[individual id]cnv_calls_all_chr.bed - a sorted bedfile with the chromosome, location and type of each variant. If no variants are present for a chromosome, the start and end both have the value 0
+unique_proband/[individual id]intersects.txt - details the intersects between the variants from the proband and those from the parents
+unique_proband/[individual id]proband_only.bed - a bedfile containing the locations and variant types of CNVs that occur only in the proband and in neither parent
+unique_proband/[individual id]VEP_output.vcf - a VCF containing the output from the VEP command
+future output - a graph visualising the location of the variants
+Assuming the reference fasta and bedfile are already defined in a config file, the pipeline can be run using the command:
+
+nextflow run main.nf -c [path to config] \
+--workflow cnv-calling \
+--ped_file [path to ped file] \
+--sample_sheet [path to sample sheet]
+
+Likely problems moving from eddie to Ultra
+
+pathing errors: where absolute paths are used, the format on eddie is usually /exports/igmm/eddie/IGMM-VariantAnalysis/, while on Ultra the format will likely be home/u035/u035/shared/ (for example, in the VEP command pointing to the G2P plugin)
+modules: the modules loaded by eddie.config are anaconda/5.3.1, singularity, igmm/apps/BEDTools, igmm/apps/samtools/1.6, R/3.5.3 and igmm/apps/vep/100
-- 
GitLab
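The output list notes that `[individual id]cnv_calls_all_chr.bed` holds a row with start and end both set to 0 for any chromosome with no calls. Anything that reads this file downstream probably wants to skip those rows. The following is a minimal Python sketch of that filter, not part of the pipeline. It assumes the file is tab-separated with no header and with columns in the order chromosome, start, end, variant type. The document lists these fields but does not specify their layout. The function name `drop_placeholder_rows` and the example rows are hypothetical.

```python
import csv
import io


def drop_placeholder_rows(bed_lines):
    """Return the BED rows that describe a real call.

    Assumption: tab-separated columns ordered chrom, start, end, type.
    Rows with start == end == 0 mark a chromosome with no variants.
    """
    kept = []
    for row in csv.reader(bed_lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        start, end = row[1].strip(), row[2].strip()
        if start == "0" and end == "0":
            continue  # placeholder row: no variants on this chromosome
        kept.append(row)
    return kept


# Made-up example content for illustration, not real pipeline output.
example = io.StringIO(
    "chr1\t1000\t2000\tdeletion\n"
    "chr2\t0\t0\tNA\n"
    "chr3\t500\t900\tduplication\n"
)
calls = drop_placeholder_rows(example)
print(calls)
```

To apply this to a real output file, open it with `open(path, newline="")` and pass the file handle to `drop_placeholder_rows`.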