From 7dd8178362e1f6e9ccc718932e95f07ff937bc7c Mon Sep 17 00:00:00 2001
From: ameyner2 <alison.meynert@ed.ac.uk>
Date: Tue, 31 Oct 2023 11:15:56 +0000
Subject: [PATCH] Documentation for including reads for samples sequenced on previous runs.

---
 docs/SOP_alignment_variant_annotation.md | 32 ++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/SOP_alignment_variant_annotation.md b/docs/SOP_alignment_variant_annotation.md
index 10cabcf..8d0c13d 100644
--- a/docs/SOP_alignment_variant_annotation.md
+++ b/docs/SOP_alignment_variant_annotation.md
@@ -156,6 +156,38 @@ ped_file=<input_ped_file>
 cp $ped_file $project_id.ped
 ```
 
+2. If there are samples from previous sequencing runs to be included in this analysis, the FASTQ for these samples needs to be in the `$READS_DIR/$project_id` folder.
+
+```
+mkdir $READS_DIR/$project_id
+cd $READS_DIR
+```
+
+Check to see if the reads have already been merged and are available in a previous run's folder. For each sample:
+
+```
+find . -name '<sample_id>'
+cp <prev_project_id>/*<sample_id>* $project_id/
+```
+
+If not found, look in the original data folder and copy the original files.
+
+```
+cd $DOWNLOAD_DIR
+find . -name '<sample_id>'
+cp -R path/to/files/*<sample_id>* $READS_DIR/$project_id/
+```
+
+Move to the project reads folder and rename the files to match the `<sample_id>_<R[12]>.gz` pattern if there is only one file per read end. If there are two files per read end, merge the files with the given script.
+
+```
+cd $READS_DIR/$project_id
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R1.gz:file2_R1.gz <sample_name> 1 .
+python $SCRIPTS/merge_and_rename_NGI_fastq_files.py file1_R2.gz:file2_R2.gz <sample_name> 2 .
+```
+
+Remove the original files.
+
 3. In the params folder, create the symlinks to the reads and the bcbio configuration files. If specifying a common sample suffix, ensure it includes any joining characters, e.g. “-“ or “_”, so that the family identifier can be cleanly separated from the suffix. Get the number of families from the batch.
 
 *Edinburgh Genomics data*
-- 
GitLab
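
Reviewer note (not part of the patch): the lookup order described in the added step, a previous run's folder under `$READS_DIR` first and the original data under `$DOWNLOAD_DIR` second, could be scripted roughly as below. This is only a sketch under stated assumptions: `stage_previous_reads` is a hypothetical helper that does not exist in the pipeline scripts, and it assumes FASTQ file names contain the sample ID, which is a looser match than the `find . -name '<sample_id>'` shown in the SOP.

```shell
# Sketch only: stage_previous_reads is a hypothetical helper, not a pipeline script.
# Assumes READS_DIR and DOWNLOAD_DIR are set and that file names contain the sample ID.
stage_previous_reads() {
    sample_id=$1
    project_id=$2
    dest="$READS_DIR/$project_id"
    mkdir -p "$dest"

    # Search previous runs' reads first, then fall back to the original downloads.
    for src in "$READS_DIR" "$DOWNLOAD_DIR"; do
        # Skip the destination folder itself so files already staged are not counted as a hit.
        first=$(find "$src" -path "$dest" -prune -o -type f -name "*${sample_id}*" -print | head -n 1)
        if [ -n "$first" ]; then
            find "$src" -path "$dest" -prune -o -type f -name "*${sample_id}*" -exec cp {} "$dest/" \;
            echo "$sample_id: copied from $src"
            return 0
        fi
    done

    echo "$sample_id: no reads found in $READS_DIR or $DOWNLOAD_DIR" >&2
    return 1
}
```

It would be called once per sample, e.g. `stage_previous_reads <sample_id> $project_id`. Files with the same name in different subfolders would overwrite each other in the destination, so the copied files should still be checked by hand before the rename or merge step.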