From eb8a1fb35a94689dd4391ed1411d0a1e15fd7f7c Mon Sep 17 00:00:00 2001
From: s1734289 <s1734289@sms.ed.ac.uk>
Date: Sun, 31 Jul 2022 18:42:13 +0100
Subject: [PATCH] Add document describing running process of CNV-calling
 pipeline

---
 docs/running_cnv_pipeline.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 docs/running_cnv_pipeline.md

diff --git a/docs/running_cnv_pipeline.md b/docs/running_cnv_pipeline.md
new file mode 100644
index 0000000..af46144
--- /dev/null
+++ b/docs/running_cnv_pipeline.md
@@ -0,0 +1,33 @@
+This document describes how to run the CNV-calling workflow (`cnv-calling`) of the Nextflow variant pipeline.
+
+Currently, the pipeline expects a folder `ExomeDepth_assets` inside `trio-whole-exome/pipeline`. This folder contains the Rscript that runs ExomeDepth, a script that builds a reference ExomeDepth object, and the paths to the sex-specific references. A planned future change is to move the R scripts into the `bin` folder and the reference objects into the `assets` folder.
+
+The current run time is roughly 10 minutes per family.
+
+Required input files:
+
+- ped file
+- sample sheet, with an extra column containing the path to each sample's BAM file
+- reference FASTA: the pipeline has been designed for the HG38 assembly
+- reference BED file: an exome RefSeq file
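+
+The ped file is assumed here to follow the standard six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype), where `0` in a parent column means that parent is not in the file. As a minimal sketch under that assumption, the hypothetical helper `find_probands` below picks out the child of each trio, i.e. anyone whose father and mother are both listed. It is not part of the pipeline.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper (not part of the pipeline): print the family ID and
+# individual ID of every ped entry whose father and mother are both recorded.
+# Assumes the standard whitespace-separated six-column PED layout.
+set -euo pipefail
+
+find_probands() {
+  awk '
+    NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
+    $3 != "0" && $4 != "0" { print $1 "\t" $2 }
+  ' "$1"
+}
+
+# Example: a single trio in which child1 is the proband.
+ped=$(mktemp)
+printf 'FAM1\tchild1\tdad1\tmum1\t1\t2\nFAM1\tdad1\t0\t0\t1\t1\nFAM1\tmum1\t0\t0\t2\t1\n' > "$ped"
+find_probands "$ped"    # prints FAM1 and child1, tab-separated
+rm -f "$ped"
+```
+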
+
+Output files:
+
+Outputs are stored in a folder named after the family ID:
+
+- `exome_calls_[bam file name].csv`: a CSV file produced by ExomeDepth
+- `[individual id]cnv_calls_all_chr.bed`: a sorted BED file giving the chromosome, location and type of each variant. If no variants are present for a chromosome, its start and end are both 0
+- `unique_proband/[individual id]intersects.txt`: details of the intersections between the proband's variants and the parents' variants
+- `unique_proband/[individual id]proband_only.bed`: a BED file with the locations and types of CNVs that occur only in the proband and in neither parent
+- `unique_proband/[individual id]VEP_output.vcf`: a VCF containing the output of the VEP command
+- Planned future output: a graph visualising the locations of the variants
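+
+The proband-only step amounts to keeping each proband CNV that overlaps no CNV called in either parent. The sketch below spells that logic out in awk so it can be checked by hand. It assumes tab-separated BED columns (chromosome, start, end, variant type) with 0-based, half-open coordinates, and it skips the rows where start and end are both 0. `proband_only` is a hypothetical helper, not the pipeline's own code.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: print proband CNVs that overlap no parental CNV.
+# Assumes tab-separated BED columns chrom, start, end, type (0-based,
+# half-open) and treats start = end = 0 rows as "no calls" placeholders.
+set -euo pipefail
+
+# Usage: proband_only PROBAND.bed PARENT.bed [PARENT.bed ...]
+proband_only() {
+  local proband="$1"; shift
+  # Parent files are passed first so their intervals are loaded before the
+  # proband file (the last argument, ARGV[ARGC - 1]) is scanned.
+  awk -F'\t' '
+    NF < 3 || ($2 == 0 && $3 == 0) { next }   # skip short lines and placeholders
+    FILENAME != ARGV[ARGC - 1] {               # parent file: store the interval
+      k = ++n[$1]
+      ps[$1, k] = $2 + 0
+      pe[$1, k] = $3 + 0
+      next
+    }
+    {                                          # proband file: test for overlap
+      for (i = 1; i <= n[$1]; i++)
+        if ($2 + 0 < pe[$1, i] && ps[$1, i] < $3 + 0) next
+      print
+    }
+  ' "$@" "$proband"
+}
+
+# Example: the chr1:100-200 deletion is shared with the mother, so only the
+# duplication is proband-specific.
+dir=$(mktemp -d)
+printf 'chr1\t100\t200\tDEL\nchr1\t500\t600\tDUP\nchr2\t0\t0\n' > "$dir/proband.bed"
+printf 'chr1\t150\t250\tDEL\n' > "$dir/mother.bed"
+printf 'chr1\t0\t0\n' > "$dir/father.bed"
+proband_only "$dir/proband.bed" "$dir/mother.bed" "$dir/father.bed"
+# prints chr1, 500, 600, DUP (tab-separated)
+rm -rf "$dir"
+```
+
+For real data, `bedtools intersect -v -a proband.bed -b mother.bed father.bed` performs the equivalent overlap filtering (apart from the placeholder handling), and BEDTools is among the modules loaded on eddie; whether the pipeline uses exactly that command is not stated here.
+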
+
+Assuming the reference FASTA and BED files are already defined in a config file, the pipeline can be run with:
+
+```bash
+nextflow run main.nf -c [path to config] \
+    --workflow cnv-calling \
+    --ped_file [path to ped file] \
+    --sample_sheet [path to sample sheet]
+```
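+
+Before launching, a quick check that the inputs are where the pipeline expects them can save a failed run. The sketch below is a hypothetical pre-flight check, not part of the pipeline. It assumes the sample sheet is tab-separated with the BAM path in its last column (the extra column described above) and, by default, that the assets folder sits at `trio-whole-exome/pipeline/ExomeDepth_assets` relative to the current directory; `preflight_cnv` is an illustrative name.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical pre-flight check for the cnv-calling workflow.
+# Assumes: tab-separated sample sheet with the BAM path in the last column,
+# and the ExomeDepth_assets folder at the (overridable) path below.
+set -euo pipefail
+
+preflight_cnv() {
+  local ped="$1" sheet="$2"
+  local assets="${3:-trio-whole-exome/pipeline/ExomeDepth_assets}"
+  local status=0 missing f
+  for f in "$ped" "$sheet"; do
+    [ -s "$f" ] || { echo "missing or empty: $f" >&2; status=1; }
+  done
+  [ -d "$assets" ] || { echo "missing folder: $assets" >&2; status=1; }
+  if [ -s "$sheet" ]; then
+    # Collect last-column entries ending in .bam that do not exist on disk.
+    missing=$(awk -F'\t' '$NF ~ /\.bam$/ { print $NF }' "$sheet" |
+      while IFS= read -r bam; do [ -f "$bam" ] || printf '%s\n' "$bam"; done)
+    if [ -n "$missing" ]; then
+      printf '%s\n' "$missing" | sed 's/^/BAM not found: /' >&2
+      status=1
+    fi
+  fi
+  return "$status"
+}
+```
+
+Used as `preflight_cnv family.ped samples.tsv && nextflow run main.nf ...`, the run only starts once every listed file and the assets folder are present (file names here are placeholders).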
+
+Likely problems when moving from eddie to Ultra:
+
+- Path errors wherever absolute paths are used. On eddie these usually start with `/exports/igmm/eddie/IGMM-VariantAnalysis/`, while on Ultra they will likely start with `home/u035/u035/shared/`; one example is the G2P plugin path in the VEP command.
+- Modules: `eddie.config` loads `anaconda/5.3.1`, `singularity`, `igmm/apps/BEDTools`, `igmm/apps/samtools/1.6`, `R/3.5.3` and `igmm/apps/vep/100`, so equivalent modules will need to be available on Ultra.
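+
+One way to find the hard-coded eddie paths before the move is to search the repository for the eddie prefix. The sketch below assumes GNU or BSD `grep` and `sed`; `find_eddie_paths` and `rewrite_prefix` are hypothetical helper names, and the Ultra prefix is left as an argument because the exact location still needs confirming.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helpers for moving from eddie to Ultra.
+set -euo pipefail
+
+EDDIE_PREFIX=/exports/igmm/eddie/IGMM-VariantAnalysis/
+
+# List file:line:text for every occurrence of the eddie prefix under a
+# directory (default: current directory). grep exits 1 when nothing matches,
+# which is not treated as an error; real grep errors (exit 2) still fail.
+find_eddie_paths() {
+  grep -rnF -- "$EDDIE_PREFIX" "${1:-.}" || [ $? -eq 1 ]
+}
+
+# Replace the eddie prefix with NEW_PREFIX in the given files, keeping a .bak
+# copy of each. '|' is the sed delimiter so the slashes in the paths need no
+# escaping; NEW_PREFIX must not contain '|', '&' or backslashes.
+rewrite_prefix() {
+  local new_prefix="$1"; shift
+  sed -i.bak "s|${EDDIE_PREFIX}|${new_prefix}|g" "$@"
+}
+```
+
+Running `find_eddie_paths .` from the repository root lists each file and line to review, such as the G2P plugin path in the VEP command. `rewrite_prefix` can then be applied to those files once the Ultra prefix is confirmed, and the `.bak` copies make the change easy to revert.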
-- 
GitLab