From 7dd8178362e1f6e9ccc718932e95f07ff937bc7c Mon Sep 17 00:00:00 2001
From: ameyner2 <alison.meynert@ed.ac.uk>
Date: Tue, 31 Oct 2023 11:15:56 +0000
Subject: [PATCH] Documentation for including reads for samples sequenced on
 previous runs.

---
 docs/SOP_alignment_variant_annotation.md | 32 ++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/SOP_alignment_variant_annotation.md b/docs/SOP_alignment_variant_annotation.md
index 10cabcf..8d0c13d 100644
--- a/docs/SOP_alignment_variant_annotation.md
+++ b/docs/SOP_alignment_variant_annotation.md
@@ -156,6 +156,38 @@ ped_file=<input_ped_file>
 cp $ped_file $project_id.ped
 ```
 
+2. If samples from previous sequencing runs are to be included in this analysis, their FASTQ files need to be in the `$READS_DIR/$project_id` folder. Create the folder and move to the reads directory:
+
+```
+mkdir -p "$READS_DIR/$project_id"
+cd "$READS_DIR"
+```
+
+For each sample, check whether its reads have already been merged and are available in a previous run's folder. If they are, copy them into the new project folder:
+
+```
+find . -name '*<sample_id>*'
+cp <prev_project_id>/*<sample_id>* "$project_id"/
+```
+
+If the reads are not found in a previous run's folder, locate the original files in the download folder and copy them:
+
+```
+cd "$DOWNLOAD_DIR"
+find . -name '*<sample_id>*'
+cp -R path/to/files/*<sample_id>* "$READS_DIR/$project_id"/
+```
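+
+The search-and-copy steps above can be scripted when several samples need to be brought in. The function below is a minimal sketch, not one of the scripts in `$SCRIPTS`: `copy_sample_reads` is a hypothetical name, and it assumes `READS_DIR`, `DOWNLOAD_DIR` and `project_id` are already set and that every file name contains the sample ID. It matches on a substring, so review the files it lists (for example, `S1` also matches `S10`, and a sample present in more than one previous run is copied from each of them).
+
+```shell
+# Hypothetical helper (not part of $SCRIPTS): copy one sample's FASTQ files into
+# $READS_DIR/$project_id. Assumes READS_DIR, DOWNLOAD_DIR and project_id are set.
+copy_sample_reads() {
+    local sample_id="$1"
+    local dest="$READS_DIR/$project_id"
+    local found
+
+    # 1. Merged reads from previous runs, skipping the current project folder.
+    found=$(find "$READS_DIR" -path "$dest" -prune -o -type f -name "*${sample_id}*" -print)
+
+    # 2. Otherwise, the original files in the download folder.
+    if [ -z "$found" ]; then
+        found=$(find "$DOWNLOAD_DIR" -type f -name "*${sample_id}*")
+    fi
+
+    if [ -z "$found" ]; then
+        echo "No FASTQ files found for ${sample_id}" >&2
+        return 1
+    fi
+
+    # Copy every match (one path per line) and list it for review.
+    printf '%s\n' "$found" | while IFS= read -r f; do
+        cp "$f" "$dest/" && echo "copied $f"
+    done
+}
+```
+
+For example, after creating the project folder: `for sample_id in <sample_id_1> <sample_id_2>; do copy_sample_reads "$sample_id"; done`.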
+
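+Move to the project reads folder. If there is only one file per read end, rename the files to `<sample_id>_R1.gz` and `<sample_id>_R2.gz`. If there are two files per read end, merge and rename them with `merge_and_rename_NGI_fastq_files.py`, passing the colon-separated input files, the sample ID and the read end (1 or 2):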
+
+```
+cd "$READS_DIR/$project_id"
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R1.gz:file2_R1.gz <sample_id> 1 .
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R2.gz:file2_R2.gz <sample_id> 2 .
+```
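+
+The rename-or-merge decision above can be sketched as a shell function that counts the files for one read end. This is a minimal sketch under stated assumptions: `prepare_read_end` is a hypothetical helper, it is run inside `$READS_DIR/$project_id`, the input files end in `R1.gz` or `R2.gz`, and the target names are `<sample_id>_R1.gz` and `<sample_id>_R2.gz`. The merge itself is still done by `merge_and_rename_NGI_fastq_files.py`, called as in the commands above, with the inputs joined by `:` in the order the shell sorts them; check that this order is acceptable, and run it before any `<sample_id>_R*.gz` file exists so the target is not picked up as an input.
+
+```shell
+# Hypothetical helper: rename or merge one read end (1 or 2) of a sample.
+# Run inside $READS_DIR/$project_id; assumes inputs end in R1.gz / R2.gz.
+prepare_read_end() {
+    local sample_id="$1" read_end="$2"
+    local target="${sample_id}_R${read_end}.gz"
+    local inputs
+
+    # Expand the glob into the positional parameters (the shell sorts matches).
+    set -- *"${sample_id}"*R"${read_end}".gz
+    if [ ! -e "$1" ]; then
+        echo "No R${read_end} files found for ${sample_id}" >&2
+        return 1
+    fi
+
+    if [ "$#" -eq 1 ]; then
+        # One file per read end: renaming is enough.
+        [ "$1" = "$target" ] || mv "$1" "$target"
+    else
+        # Several files per read end: join them with ':' and merge.
+        inputs=$(IFS=:; echo "$*")
+        python "$SCRIPTS/merge_and_rename_NGI_fastq_files.py" "$inputs" "$sample_id" "$read_end" .
+    fi
+}
+```
+
+For example: `prepare_read_end <sample_id> 1 && prepare_read_end <sample_id> 2`.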
+
+Once the merged files have been written, remove the original (unmerged) files from `$READS_DIR/$project_id`.
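+
+Before deleting anything, it may be worth confirming that the merged files exist and are intact gzip streams. The sketch below does this with `gzip -t`; `remove_originals` is a hypothetical helper, run inside `$READS_DIR/$project_id`, that takes the sample ID followed by the original files and deletes them only if both `<sample_id>_R1.gz` and `<sample_id>_R2.gz` pass the check. `gzip -t` only tests compression integrity, not FASTQ content.
+
+```shell
+# Hypothetical helper: remove the original (unmerged) files for a sample only
+# after both merged files exist and pass gzip's integrity test.
+# Usage: remove_originals <sample_id> <original files...>
+remove_originals() {
+    local sample_id="$1" read_end
+    shift
+    for read_end in 1 2; do
+        if ! gzip -t "${sample_id}_R${read_end}.gz" 2>/dev/null; then
+            echo "${sample_id}_R${read_end}.gz is missing or corrupt; keeping originals" >&2
+            return 1
+        fi
+    done
+    rm -- "$@"
+}
+```
+
+For example: `remove_originals <sample_id> file1_R1.gz file2_R1.gz file1_R2.gz file2_R2.gz`.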
+
 3. In the params folder, create the symlinks to the reads and the bcbio configuration files. If specifying a common sample suffix, ensure it includes any joining characters, e.g. “-“ or “_”, so that the family identifier can be cleanly separated from the suffix. Get the number of families from the batch.
 
 *Edinburgh Genomics data*
-- 
GitLab