This repository provides a Snakemake pipeline that generates the target files used by
the scUTRquant pipeline. It is provided as a record
of how we generated truncated transcriptomes for the scUTRquant manuscript and as an example of
how to use the Bioconductor package txcutr.
The accompanying manuscript is openly available at:
Fansler, M.M., Mitschka, S. & Mayr, C. Quantifying 3′UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun 15, 4050 (2024). https://doi.org/10.1038/s41467-024-48254-9
Please note that, while the pipeline does provide some flexibility, it was implemented with the limited
scope of mm10 and hg38 annotations from Ensembl and GENCODE. For example, it must be modified
in order to generate correct FASTA files for other assemblies, such as mm39.
- Snakemake >= 5.11
- Conda/Mamba
- (optional) CellRanger
The pipeline should be compatible with Linux and macOS systems. If Conda is not already installed, we recommend installing Miniforge.
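
The version requirement listed above can be checked before running anything. The following is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper that compares only the major and minor version fields, and it assumes `snakemake --version` prints a plain dotted version string.

```shell
# Hypothetical helper (not part of this repository):
# succeed if version $1 >= version $2, comparing major.minor numerically.
version_ge() {
    have_major="${1%%.*}"; rest="${1#*.}"; have_minor="${rest%%.*}"
    need_major="${2%%.*}"; rest="${2#*.}"; need_minor="${rest%%.*}"
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Report whether the Snakemake on PATH satisfies the >= 5.11 requirement.
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_ge "$installed" 5.11; then
        echo "Snakemake $installed satisfies >= 5.11"
    else
        echo "Snakemake $installed is older than 5.11"
    fi
else
    echo "snakemake was not found on PATH"
fi
```

The comparison is done field by field because a plain string comparison would rank 5.9 above 5.11.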
git clone https://github.com/Mayrlab/txcutr-db.git

Please edit the config.yaml file to provide a tmpdir specific to your system. If you wish
to use the GTF filtering provided by CellRanger, also specify the path to CellRanger for your system.
The rule all: in the Snakefile contains specifications for several variants that were used in the scUTRquant
manuscript. You likely do not want to generate all of these. Instead, a single variant can be "requested" at
the command line. Since the kallisto index (.kdx file) is the last output, that is the file to specify:
snakemake --use-conda homo_sapiens/gencode.v38.annotation.pc.txcutr.w500.kdx

This would use the GENCODE v38 annotation, filtered for only protein-coding transcripts (.pc) with validated
3' ends, and truncated to 500 nt (.w500). The default merge table (TSV) will use a 200 nt merge distance.
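
To make the structure of the target name explicit, here is a small sketch that assembles it from its parts. The `txcutr_target` helper is hypothetical, and the naming pattern is inferred from the single example above, so other combinations of annotation, filter, and width are assumptions that should be checked against the wildcards the Snakefile actually accepts.

```shell
# Hypothetical helper (not part of this repository):
# build a kallisto index target path from its components.
# Pattern inferred from the example target; verify against the Snakefile.
txcutr_target() {
    species="$1"     # directory name, e.g. homo_sapiens
    annotation="$2"  # annotation stem, e.g. gencode.v38.annotation
    filter="$3"      # transcript filter, e.g. pc (protein-coding)
    width="$4"       # truncation width in nt, e.g. 500
    printf '%s/%s.%s.txcutr.w%s.kdx\n' "$species" "$annotation" "$filter" "$width"
}

target="$(txcutr_target homo_sapiens gencode.v38.annotation pc 500)"
echo "$target"

# Preview the jobs Snakemake would run, without executing them
# (-n is Snakemake's dry-run flag):
# snakemake -n --use-conda "$target"
```

A dry run is a cheap way to confirm that a requested target resolves to a valid chain of rules before committing to the long-running steps.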
The txcutr step is computationally demanding. For example, in an HPC setting, we have it configured to
run with 20 cores and 4 GB/core, which takes about 30 minutes.
Be aware that some rules include threads and resources specifications that are used by Snakemake cluster
profiles. Please adjust accordingly (e.g., not all cluster configurations interpret the mem_mb parameter
as per core)!
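
The difference between the two interpretations is easy to work out for the txcutr step described above. This sketch only does the arithmetic; the variable names are illustrative, and 4 GB is taken as 4000 MB for simplicity.

```shell
# Figures from the txcutr step: 20 cores at 4 GB per core.
cores=20
mem_per_core_mb=4000   # assumption: 4 GB expressed as 4000 MB

# Total memory the job needs across all of its cores.
total_mem_mb=$((cores * mem_per_core_mb))

echo "If mem_mb is read per core, request ${mem_per_core_mb} MB"
echo "If mem_mb is read per job,  request ${total_mem_mb} MB"
```

On a cluster that treats mem_mb as a per-job total, leaving the per-core value in place would request 4 GB for a job that needs roughly 80 GB.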