
EGA Submission via Portal

Supports encryption and upload of paired-end sequencing data to the European Genome/Phenome Archive (EGA) using the Submitter Portal CSV format for Run objects.

Introduction

The pipeline is built with Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It is based on the nf-core template and supports only Conda (-profile conda) for managing software dependencies.

Resources

The EGA-Cryptor JAR file comes from ega-archive.org and is stored in /modules/local/ega/encrypt/resources. It can be downloaded with:

wget https://ega-archive.org/files/EgaCryptor.zip
unzip EgaCryptor.zip
rm EgaCryptor.zip
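
The commands above presumably run inside that resources directory. A minimal POSIX sh sketch that wraps them and skips the download when a JAR is already present; the function name fetch_ega_cryptor, the default relative path (resolved from the repository root) and treating any *.jar there as EGA-Cryptor are assumptions, not part of the pipeline:

```shell
# Hypothetical helper: fetch EGA-Cryptor into the resources directory unless a
# JAR is already there. The default path assumes you run it from the repository
# root; treating any *.jar in that directory as EGA-Cryptor is an assumption.
# The parentheses make the body a subshell, so cd and exit stay local to it.
fetch_ega_cryptor() (
    dest=${1:-modules/local/ega/encrypt/resources}
    mkdir -p "$dest" || exit 1
    # If the glob matches nothing, it stays literal and the -e test fails.
    for jar in "$dest"/*.jar; do
        if [ -e "$jar" ]; then
            echo "EGA-Cryptor JAR already present: $jar"
            exit 0
        fi
    done
    # Same download, unzip and clean-up steps as above, run inside $dest.
    cd "$dest" || exit 1
    wget https://ega-archive.org/files/EgaCryptor.zip &&
        unzip EgaCryptor.zip &&
        rm EgaCryptor.zip
)
```

Running fetch_ega_cryptor from the repository root with no argument targets the directory named above; pass a different directory as the first argument to override it.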

Running

The sample CSV file used to upload sample metadata to EGA must be provided with --samples. It links each internal EGA sample alias to its sample name. The pipeline assumes that the FASTQ files for upload are named sample_R1.fastq.gz and sample_R2.fastq.gz, where sample is the value in the subjectId field of the sample CSV file.
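
Since the pipeline relies on this naming convention, it can help to check it before launching. The sketch below is a hypothetical helper, check_fastq_pairs, that reads the subjectId column from the sample CSV and reports any missing R1 or R2 file; it assumes a plain comma-separated file with a header row and no quoted fields containing commas:

```shell
# Hypothetical helper: report every subjectId in the sample CSV that lacks a
# matching <subjectId>_R1.fastq.gz or <subjectId>_R2.fastq.gz file.
# Assumes a header row, a column named subjectId, and no quoted commas.
# Exit status: 0 all present, 1 some missing, 2 unreadable CSV or no column.
check_fastq_pairs() (
    samples_csv=$1
    fastq_dir=${2:-.}
    [ -r "$samples_csv" ] || { echo "cannot read $samples_csv" >&2; exit 2; }
    # Locate the subjectId column by header name; strip Windows CR endings.
    ids=$(awk -F',' '
        { sub(/\r$/, "") }
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "subjectId") col = i; next }
        col { print $col }
        END { if (!col) exit 2 }
    ' "$samples_csv") || { echo "no subjectId column in $samples_csv" >&2; exit 2; }
    # Collect one "missing: <path>" line per absent file.
    missing=$(printf '%s\n' "$ids" | while IFS= read -r id; do
        [ -n "$id" ] || continue
        for r in 1 2; do
            f="$fastq_dir/${id}_R${r}.fastq.gz"
            [ -f "$f" ] || echo "missing: $f"
        done
    done)
    [ -z "$missing" ] && exit 0
    printf '%s\n' "$missing"
    exit 1
)
```

For example, check_fastq_pairs /absolute/path/to/samples.csv . checks the current directory and prints nothing when every pair is present.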

To run and upload automatically:

nextflow run https://git.ecdf.ed.ac.uk/igmmbioinformatics/ega-submission-via-portal \
  -profile conda \
  --reads '*_R{1,2}.fastq.gz' \
  --samples /absolute/path/to/samples.csv \
  --outdir output \
  --ega_user ega-box-1234 \
  --egapass /absolute/path/to/egapass
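
The --reads pattern is quoted so that Nextflow, rather than the shell, expands it. Nextflow pipelines commonly group files matching a pattern such as *_R{1,2}.fastq.gz into pairs keyed by the part matched by * (this is what Channel.fromFilePairs does); that this pipeline groups reads exactly this way is an assumption. The hypothetical list_read_pairs function below mimics that grouping in sh, to preview which sample names a directory would yield:

```shell
# Hypothetical helper: print each sample name that has both <name>_R1.fastq.gz
# and <name>_R2.fastq.gz in a directory, mimicking how *_R{1,2}.fastq.gz is
# grouped by the part matched by *. R1 files without an R2 are reported on
# stderr; R2 files without an R1 are not checked.
list_read_pairs() (
    dir=${1:-.}
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue   # no match: the glob stays literal
        sample=$(basename "$r1" _R1.fastq.gz)
        if [ -e "$dir/${sample}_R2.fastq.gz" ]; then
            echo "$sample"
        else
            echo "unpaired: $r1" >&2
        fi
    done
)
```

The printed names should match the subjectId values in the sample CSV.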

To encrypt and produce a runs.csv file without uploading:

nextflow run ameynert/ega-submission-via-portal \
  -profile conda \
  --reads '*_R{1,2}.fastq.gz' \
  --samples /absolute/path/to/samples.csv \
  --outdir output
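
The two commands differ only in the upload options. A hedged sketch of a wrapper that adds --ega_user and --egapass only when credentials are supplied; the function run_ega_pipeline and the environment variables EGA_USER, EGAPASS_FILE and EGA_PIPELINE are hypothetical. Because the two examples above name the pipeline differently, EGA_PIPELINE defaults to the repository URL from the first command and can be overridden:

```shell
# Hypothetical wrapper: build the nextflow command from the README and add
# the upload options only when EGA_USER and EGAPASS_FILE are both set.
# EGA_PIPELINE, EGA_USER and EGAPASS_FILE are illustrative variable names.
run_ega_pipeline() (
    pipeline=${EGA_PIPELINE:-https://git.ecdf.ed.ac.uk/igmmbioinformatics/ega-submission-via-portal}
    reads=$1 samples=$2 outdir=$3
    # Reuse the positional parameters as a portable argument list.
    set -- run "$pipeline" -profile conda \
        --reads "$reads" --samples "$samples" --outdir "$outdir"
    if [ -n "${EGA_USER:-}" ] && [ -n "${EGAPASS_FILE:-}" ]; then
        set -- "$@" --ega_user "$EGA_USER" --egapass "$EGAPASS_FILE"
    fi
    nextflow "$@"
)
```

Calling run_ega_pipeline '*_R{1,2}.fastq.gz' /absolute/path/to/samples.csv output with EGA_USER and EGAPASS_FILE unset corresponds to the encrypt-only command; setting both adds the upload options.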

The pipeline writes runs.csv, the CSV file that links the uploaded paired-end FASTQ files to their sample aliases in the EGA Submitter Portal, to the output folder given by --outdir.

Credits

Alison Meynert (alison.meynert@ed.ac.uk)
Murray Wham (murray.wham@ed.ac.uk)