Commit e99a16ef authored by ameyner2

Delete SOP_archiving.md

parent de105214
1 merge request: !3 Ultra 2 SOP/doc updates
# Standard operating procedure - Archiving of trio whole exome samples at Edinburgh Parallel Computing Centre
This SOP applies to the archiving of files generated by the alignment, variant calling, and variant prioritization analysis pipeline for trio whole exome samples at the Edinburgh Parallel Computing Centre (EPCC). Scripts are version controlled on the University of Edinburgh GitLab server at gitlab.ecdf.ed.ac.uk/igmmbioinformatics/trio-whole-exome. Request access by e-mail to alison.meynert@igmm.ed.ac.uk.
## User requirements
The user will need an account on the EPCC Ultra2 system (`sdf-cs1.epcc.ed.ac.uk`). Contact Donald Scobbie (d.scobbie@epcc.ed.ac.uk) about any issues.
## Software requirements
Htslib 1.10.2 and Samtools 1.10 are installed at `/home/u035/u035/shared/software/bcbio/tools/bin`.
## Data requirements
A copy of the human reference genome hg38 is at `/home/u035/u035/shared/software/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa`.
## Definitions
In this document, `N` is the total number of samples in the batch, and `X` is the number of families.
Text in angle brackets, e.g. `<batch>`, indicates a variable parameter. A variable parameter such as `<family1-X>` indicates that there are X instances of the parameter, each with its own unique value.
## Notes
1. In all steps below, Slurm log files can be deleted from `logs` once they are no longer needed. This “clean desk” policy makes it easier to see which jobs are currently running and which log files require examination.
2. The tracking file is maintained on the IGMM datastore at `/exports/igmm/datastore/IGMM-VariantAnalysis/trio_whole_exome/Batch_status.xlsx`.
## Procedure
1. Log in to the EPCC Ultra2 system (`sdf-cs1.epcc.ed.ac.uk`). Set the project id environment variable and other general configuration environment variables, and calculate the number of families in the batch. Change to the logs directory – all jobs are submitted from here. See [SOP alignment and variant calling](SOP_alignment_variant_annotation.md) for details on the `<project_id>`, `<short_project_id>`, and `<version>` parameters.
```
ssh user@sdf-cs1.epcc.ed.ac.uk
project_id=<EdGE_project_id>
short_project_id=$(echo $project_id | cut -f 1 -d '_')
version=<version>
source /home/u035/u035/shared/scripts/trio_whole_exome_config.sh
X=$(wc -l $PARAMS_DIR/$project_id.family_ids.txt | awk '{print $1}')
```
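Because every later step depends on the variables set above, it may help to confirm that none of them is empty before submitting jobs. The following is a minimal sketch. `require_vars` is a hypothetical helper, not part of the pipeline scripts, and the variable names passed to it are the ones used elsewhere in this SOP.

```shell
# Sketch: fail early if any variable needed by later steps is empty or unset.
# require_vars is a hypothetical helper, not part of the pipeline scripts.
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing or empty variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_vars project_id short_project_id version X PARAMS_DIR LOGS_DIR OUTPUT_DIR SCRIPTS` returns a non-zero status and names each variable that is not set.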
2. Compress BAM files to CRAM and compare the two files. The output log files should be empty, and the files `<sample>.cram`, `<sample>.cram.crai`, and `<sample>.cram.flagstat.txt` should be present for each sample.
```
cd $LOGS_DIR
sbatch --export=PROJECT_ID=$project_id,VERSION=$version,CONFIG_SH=$SCRIPTS/trio_whole_exome_config.sh \
--array=1-$X $SCRIPTS/submit_trio_wes_cram_compression.sh
```
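The checks described in this step could be scripted roughly as follows. This is a sketch only. `check_cram_outputs` and `list_nonempty_logs` are hypothetical helpers, and the directory that holds a sample's files is passed as an argument because its layout is not specified in this SOP.

```shell
# Sketch: report any expected CRAM output that is missing or empty.
# check_cram_outputs is a hypothetical helper; the directory holding a
# sample's files is an argument because its layout is not specified here.
check_cram_outputs() {
    dir=$1
    shift
    status=0
    for sample in "$@"; do
        for suffix in cram cram.crai cram.flagstat.txt; do
            if [ ! -s "$dir/$sample.$suffix" ]; then
                echo "Missing or empty: $dir/$sample.$suffix" >&2
                status=1
            fi
        done
    done
    return "$status"
}

# List log files that are not empty; no output means all logs are empty.
list_nonempty_logs() {
    find "$1" -type f -size +0
}
```

`check_cram_outputs <dir> <sample1-N>` returns a non-zero status if any of the three files is absent for any sample, and `list_nonempty_logs $LOGS_DIR` prints nothing when every log file is empty.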
3. Calculate md5 checksums on the per-family files, excluding the BAM files. This creates the file `md5sum.txt` at the root of each family’s output directory. Check the line count of each `md5sum.txt` file: it should total 30 lines per sample plus 26 lines per family. The log files should be empty.
```
cd $LOGS_DIR
sbatch --export=PROJECT_ID=$project_id,VERSION=$version,CONFIG_SH=$SCRIPTS/trio_whole_exome_config.sh \
--array=1-$X $SCRIPTS/submit_trio_wes_family_checksums.sh
cd $OUTPUT_DIR/${short_project_id}_${version}/families
wc -l */md5sum.txt
```
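The expected line count can be worked out from the rule above: a family of three samples should give 30 × 3 + 26 = 116 lines. A minimal sketch of that comparison follows. `expected_md5_lines` and `check_md5_line_count` are hypothetical helpers, and the number of samples in each family has to be supplied by the user.

```shell
# Sketch: compare a family's md5sum.txt line count with the expected total.
# Both functions are hypothetical helpers, not part of the pipeline scripts.

# 30 lines per sample plus 26 lines per family.
expected_md5_lines() {
    echo $(( 30 * $1 + 26 ))
}

check_md5_line_count() {
    file=$1
    n_samples=$2
    # Arithmetic expansion strips any padding that wc adds to its output.
    actual=$(( $(wc -l < "$file") ))
    expected=$(expected_md5_lines "$n_samples")
    if [ "$actual" -ne "$expected" ]; then
        echo "$file: expected $expected lines, found $actual" >&2
        return 1
    fi
}
```

For example, `check_md5_line_count <family>/md5sum.txt 3` succeeds only if the file has exactly 116 lines.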
4. Move the remaining project parameter and PED files into the results directory, including the original PED file if a copy was made.
```
cd $OUTPUT_DIR/${short_project_id}_${version}
mv $PARAMS_DIR/${project_id}* params/
# Only if a copy of the original PED file was made; ped_file must hold its file name
mv $PARAMS_DIR/${ped_file} params/
```
5. Calculate md5 checksums on the remaining project files, excluding the `families` sub-directory. This creates the file `md5sum.txt` at the root of the project output directory.
```
cd $LOGS_DIR
sbatch --export=PROJECT_ID=$project_id,VERSION=$version,CONFIG_SH=$SCRIPTS/trio_whole_exome_config.sh \
$SCRIPTS/submit_trio_wes_project_checksums.sh
```
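Before anything is deleted, the checksum files can be used to re-verify the files they list. The sketch below is not part of the pipeline. `verify_checksums` is a hypothetical helper, and it assumes that `md5sum.txt` is in the format written by GNU `md5sum`, with paths relative to the directory that contains it.

```shell
# Sketch: re-verify the files listed in a md5sum.txt checksum file.
# verify_checksums is a hypothetical helper. It assumes md5sum.txt is in
# the format written by GNU md5sum, with paths relative to its directory.
verify_checksums() {
    dir=$1
    # Run in a subshell so the caller's working directory is unchanged.
    # --quiet prints only the files that fail verification.
    ( cd "$dir" && md5sum --quiet -c md5sum.txt )
}
```

For example, `verify_checksums $OUTPUT_DIR/${short_project_id}_${version}` checks the project-level files, and the same call on a family directory checks that family's files. A non-zero exit status means at least one file is missing or has changed.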
6. Remove the BAM files from the results.
```
cd $OUTPUT_DIR/${short_project_id}_${version}
rm families/*/*.bam*
```
7. Clean up the logs directory.
```
# Only delete if the change of directory succeeded
cd $LOGS_DIR && rm *
```