A Snakemake workflow for reproducibly creating a QuantSeq 3' mRNA testing dataset that is small enough to run in standard continuous integration (CI) testing environments, yet large enough to produce reasonably meaningful results.
This workflow is based on data presented and analyzed here:
Corley, S.M., Troy, N.M., Bosco, A. et al. QuantSeq. 3′ Sequencing combined with Salmon provides a fast, reliable approach for high throughput RNA expression analysis. Sci Rep 9, 18895 (2019). https://doi.org/10.1038/s41598-019-55434-x
The full data can be found here:
https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA509074&o=acc_s%3Aa
We use the QuantSeq data, which should be the samples with LibraryLayout SINGLE and AvgSpotLen 75, according to the methods section:
https://www.nature.com/articles/s41598-019-55434-x#Sec10
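The selection criterion above can be expressed as a simple filter over the study's run metadata. The following is a minimal sketch that assumes the CSV layout of the SRA Run Selector metadata export (SraRunTable.txt), with Run, LibraryLayout and AvgSpotLen columns. It is not necessarily how this workflow performs the selection, and the function name select_quantseq_runs and the example accessions are hypothetical.

```python
from __future__ import annotations

import csv
import io


def select_quantseq_runs(run_table_csv: str) -> list[str]:
    """Return the Run accessions of single-end runs with an average spot length of 75.

    Assumption: the input follows the SRA Run Selector CSV export, which has
    'Run', 'LibraryLayout' and 'AvgSpotLen' columns (hypothetical helper).
    """
    reader = csv.DictReader(io.StringIO(run_table_csv))
    selected = []
    for row in reader:
        is_single_end = row["LibraryLayout"].strip().upper() == "SINGLE"
        is_75bp = int(row["AvgSpotLen"].strip()) == 75
        if is_single_end and is_75bp:
            selected.append(row["Run"].strip())
    return selected


if __name__ == "__main__":
    # Toy metadata with made-up accessions, for illustration only.
    example = (
        "Run,LibraryLayout,AvgSpotLen\n"
        "SRR_EXAMPLE_1,SINGLE,75\n"
        "SRR_EXAMPLE_2,PAIRED,250\n"
        "SRR_EXAMPLE_3,SINGLE,75\n"
    )
    print(select_quantseq_runs(example))  # ['SRR_EXAMPLE_1', 'SRR_EXAMPLE_3']
```

Both conditions are checked together, so a run only counts as QuantSeq data if it is single-end and has the expected read length.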
Thus, the selected samples are:
The study identifies UROSEVIC_RESPONSE_TO_IMIQUIMOD as the most significant differentially regulated gene set affected by their polyI:C treatment:
https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/UROSEVIC_RESPONSE_TO_IMIQUIMOD
We therefore restrict the raw data to reads mapping to the genes in this set, which drastically reduces the dataset size while hopefully still yielding a useful result.
In addition, we include the KEGG_PROTEASOME gene set, which is not expected to be detected as differentially expressed in the QuantSeq data:
https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KEGG_PROTEASOME
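To restrict the data to these two gene sets, one first needs the union of their member genes. The following is a minimal sketch that assumes the gene sets were downloaded from MSigDB in GMT format, where each tab-separated line holds the set name, a description or URL, and then the gene symbols. The helper names read_gmt and target_genes are hypothetical, and this is not necessarily how the workflow itself obtains its gene list.

```python
from __future__ import annotations


def read_gmt(text: str) -> dict[str, list[str]]:
    """Parse GMT text into {gene set name: [gene symbols]} (hypothetical helper)."""
    gene_sets: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        # Drop empty fields, e.g. from a trailing tab at the end of a line.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


def target_genes(gene_sets: dict[str, list[str]], names: list[str]) -> set[str]:
    """Return the union of the genes in the requested gene sets."""
    genes: set[str] = set()
    for name in names:
        genes.update(gene_sets[name])
    return genes


if __name__ == "__main__":
    # Toy GMT content with placeholder sets and genes, not real MSigDB data.
    gmt = (
        "SET_ONE\thttp://example.org/one\tGENE_A\tGENE_B\n"
        "SET_TWO\thttp://example.org/two\tGENE_B\tGENE_C\n"
    )
    sets = read_gmt(gmt)
    print(sorted(target_genes(sets, ["SET_ONE", "SET_TWO"])))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

With the real files, the names passed to target_genes would be UROSEVIC_RESPONSE_TO_IMIQUIMOD and KEGG_PROTEASOME. Using a set deduplicates any genes shared between the two gene sets.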
For reference, Figure 7 of the original manuscript gives the most important results of the gene set enrichment analysis: https://www.nature.com/articles/s41598-019-55434-x/figures/7
The MSigDB gene sets are used under the terms of their Creative Commons Attribution 4.0 International License, which is given here: https://www.gsea-msigdb.org/gsea/msigdb_license_terms.jsp
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).