Documentation for including reads for samples sequenced on previous runs.

Commit 7dd81783 authored 1 year ago by ameyner2
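
The new section asks for files to be renamed when there is only one file per read end, but gives no command for it. A minimal sketch of that step follows, assuming the target pattern means `<sample_id>_R1.gz` and `<sample_id>_R2.gz` and that the original file names contain the sample id followed somewhere by `_R1` or `_R2`. The function name `rename_single_fastq` is hypothetical and not part of the repository's scripts.

```shell
# Hypothetical helper (not in the repository): rename the single FASTQ file
# per read end for one sample to <sample_id>_R1.gz / <sample_id>_R2.gz.
# Assumption: original names look like *<sample_id>*_R1*.gz and *<sample_id>*_R2*.gz.
rename_single_fastq() {
    sample_id=$1
    for read in 1 2; do
        target="${sample_id}_R${read}.gz"
        # Expand the glob into the positional parameters so matches can be counted.
        set -- *"${sample_id}"*_R"${read}"*.gz
        # An unmatched glob stays literal, so also check that the file exists.
        if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
            echo "skipping ${sample_id} read ${read}: expected exactly one file" >&2
            continue
        fi
        # Nothing to do if the file already has the target name.
        [ "$1" = "$target" ] || mv -- "$1" "$target"
    done
}
```

Run from `$READS_DIR/$project_id`, it would be called once per sample, e.g. `rename_single_fastq <sample_id>`. Samples with more than one file per read end are skipped with a message, since those need the merge script instead.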
--- a/docs/SOP_alignment_variant_annotation.md
+++ b/docs/SOP_alignment_variant_annotation.md
@@ -156,6 +156,38 @@ ped_file=<input_ped_file>
 cp $ped_file $project_id.ped
 ```

+2. If samples from previous sequencing runs are to be included in this analysis, the FASTQ files for these samples need to be in the `$READS_DIR/$project_id` folder.
+
+```
+mkdir -p $READS_DIR/$project_id
+cd $READS_DIR
+```
+
+Check whether the reads have already been merged and are available in a previous run's folder. For each sample:
+
+```
+find . -name '*<sample_id>*'
+cp <prev_project_id>/*<sample_id>* $project_id/
+```
+
+If the reads are not found there, look in the original data folder and copy the original files.
+
+```
+cd $DOWNLOAD_DIR
+find . -name '*<sample_id>*'
+cp -R path/to/files/*<sample_id>* $READS_DIR/$project_id/
+```
+
+Move to the project reads folder. If there is only one file per read end, rename the files to match the `<sample_id>_<R[12]>.gz` pattern. If there are two files per read end, merge them with the `merge_and_rename_NGI_fastq_files.py` script.
+
+```
+cd $READS_DIR/$project_id
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R1.gz:file2_R1.gz <sample_id> 1 .
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R2.gz:file2_R2.gz <sample_id> 2 .
+```
+
+Once the renamed or merged files are in place, remove the original files.
+
 3. In the params folder, create the symlinks to the reads and the bcbio configuration files. If specifying a common sample suffix, ensure it includes any joining characters, e.g. "-" or "_", so that the family identifier can be cleanly separated from the suffix. Get the number of families from the batch.

 *Edinburgh Genomics data*