WP5 RNA-seq vs. RNA-seq analysis
At the moment, we take wp5_rnaseq_gene_counts and report it in dispatch_priorities as wp5_rnaseq, but there could be more samples available that we haven't yet run the Salmon transcript quantification on. This should scan the available raw RNA-seq data and build up a table similar to wp5_rnaseq_gene_counts:
wp5_rnaseq contains FASTQ pairs in long form (one row per lane):
rnaseq_sample_id run_folder r1 r2 errors
CCP0001-R path/to/rnaseq_run CCP0001-R_l1_r1.fq.gz CCP0001-R_l1_r2.fq.gz false
CCP0001-R path/to/rnaseq_run CCP0001-R_l2_r1.fq.gz CCP0001-R_l2_r2.fq.gz false
(plus clean kit, patient, etc. columns)
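A minimal sketch of the scan for a single run folder, assuming the FASTQs sit directly in the folder and follow the `<sample>_l<lane>_r<1|2>.fq.gz` naming seen in the example rows. The regex, the function name `scan_run_folder`, and the rule that a lane missing either mate is flagged in `errors` are all assumptions for illustration, not existing code.

```python
import re
from pathlib import Path

# Hypothetical naming pattern inferred from the example rows:
#   <rnaseq_sample_id>_l<lane>_r<1|2>.fq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_l(?P<lane>\d+)_r(?P<read>[12])\.fq\.gz$")


def scan_run_folder(run_folder):
    """Return long-form rows (one per sample and lane) for the FASTQs in run_folder."""
    run_folder = Path(run_folder)
    # (sample, lane number) -> {"1": r1 filename, "2": r2 filename}
    pairs = {}
    for path in sorted(run_folder.iterdir()):
        match = FASTQ_RE.match(path.name)
        if match is None:
            continue  # skip anything that isn't a recognisable FASTQ
        key = (match["sample"], int(match["lane"]))
        pairs.setdefault(key, {})[match["read"]] = path.name

    rows = []
    for (sample, _lane), reads in sorted(pairs.items()):
        rows.append({
            "rnaseq_sample_id": sample,
            "run_folder": str(run_folder),
            "r1": reads.get("1"),
            "r2": reads.get("2"),
            # Assumption: a lane is an error if either mate is missing.
            "errors": "1" not in reads or "2" not in reads,
        })
    return rows
```

Running this over every raw RNA-seq run folder and concatenating the rows would give the long-form table; recursing into subfolders or reading a run manifest instead are equally plausible designs.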
view_wp5_rnaseq groups everything into one line per sample:
rnaseq_sample_id run_id r1 r2 errors
CCP0001-R an_rnaseq_run_id CCP0001-R_l1_r1.fq.gz,CCP0001-R_l2_r1.fq.gz ... false
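The grouping can be sketched in Python as below, assuming the long-form rows are dicts keyed by the column names above. `group_by_sample` is a hypothetical name, and it keeps `run_folder` rather than `run_id` because the folder-to-run-ID lookup isn't described here. A missing mate becomes an empty string so the r1 and r2 lists stay aligned lane by lane.

```python
def group_by_sample(rows):
    """Collapse long-form lane rows into one row per sample (like view_wp5_rnaseq)."""
    grouped = {}
    for row in rows:
        key = (row["rnaseq_sample_id"], row["run_folder"])
        out = grouped.setdefault(key, {
            "rnaseq_sample_id": row["rnaseq_sample_id"],
            "run_folder": row["run_folder"],
            "r1": [],
            "r2": [],
            "errors": False,
        })
        # Append in input (lane) order so r1[i] and r2[i] come from the same lane.
        out["r1"].append(row["r1"] or "")
        out["r2"].append(row["r2"] or "")
        # The sample is flagged if any of its lanes is flagged.
        out["errors"] = out["errors"] or row["errors"]
    # Comma-join the per-lane filenames, as in the view.
    return [
        {**out, "r1": ",".join(out["r1"]), "r2": ",".join(out["r2"])}
        for out in grouped.values()
    ]
```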
Then in dispatch_priorities:
patient_id kit_id timepoint ... wp5_rnaseq wp5_rnaseq_gene_counts ...
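A rough sketch of how the two columns relate at the sample level, assuming wp5_rnaseq means "raw FASTQs found with no errors" and wp5_rnaseq_gene_counts means "Salmon output exists". The function names are hypothetical, and mapping sample IDs onto patient_id / kit_id / timepoint rows is left out. The samples worth dispatching are those with raw data but no gene counts yet.

```python
def rnaseq_status(raw_rows, gene_count_ids):
    """Per-sample flags for the wp5_rnaseq and wp5_rnaseq_gene_counts columns."""
    status = {}
    for row in raw_rows:
        sample_id = row["rnaseq_sample_id"]
        status[sample_id] = {
            # Assumption: raw data only counts if the pairing checks passed.
            "wp5_rnaseq": not row["errors"],
            # Salmon quantification has already been run for this sample.
            "wp5_rnaseq_gene_counts": sample_id in gene_count_ids,
        }
    # Keep samples that have gene counts but whose raw data the scan didn't find.
    for sample_id in gene_count_ids:
        status.setdefault(sample_id, {"wp5_rnaseq": False, "wp5_rnaseq_gene_counts": True})
    return status


def awaiting_quantification(status):
    """Sample IDs with usable raw data but no gene counts yet."""
    return sorted(
        sample_id
        for sample_id, flags in status.items()
        if flags["wp5_rnaseq"] and not flags["wp5_rnaseq_gene_counts"]
    )
```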
Re-sequenced samples are processed under a new sample ID, so we don't need to worry about samples split across runs.
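Since the grouping relies on that convention, a cheap sanity check over the long-form rows could catch any sample that does turn up in more than one run folder. This is a hypothetical helper, not existing code; it should return an empty dict if the convention holds.

```python
from collections import defaultdict


def samples_split_across_runs(rows):
    """Map each rnaseq_sample_id seen in more than one run folder to those folders."""
    folders = defaultdict(set)
    for row in rows:
        folders[row["rnaseq_sample_id"]].add(row["run_folder"])
    return {sample: sorted(fs) for sample, fs in folders.items() if len(fs) > 1}
```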