Overview

This repository provides a Snakemake pipeline for generating the target files for use in the scUTRquant pipeline. This is provided as a record of how we generated truncated transcriptomes for the scUTRquant manuscript and an example of how to use the Bioconductor package txcutr.

The accompanying manuscript is openly available at:

Fansler, M.M., Mitschka, S. & Mayr, C. Quantifying 3′UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun 15, 4050 (2024). https://doi.org/10.1038/s41467-024-48254-9

Please note that, while the pipeline does provide some flexibility, it was implemented with the limited scope of mm10 and hg38 annotations from Ensembl and GENCODE. For example, it must be modified in order to generate correct FASTA files for other genome builds, such as mm39.

Setup

Prerequisites

Snakemake >= 5.11
Conda/Mamba
(optional) CellRanger

This should be compatible with Linux and macOS systems. If Conda is not already installed, we recommend installing Miniforge.

Installation

git clone https://github.com/Mayrlab/txcutr-db.git

Configuration

Please edit the config.yaml file to provide a tmpdir specific to your system. If you wish to use the GTF filtering provided by CellRanger, also specify the path to CellRanger for your system.
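As a minimal sketch of preparing that setting, the snippet below creates a scratch directory and prints a line that could be pasted into config.yaml. The directory name is hypothetical, and the `tmpdir: "<path>"` form is an assumption based only on the key name mentioned above; check the shipped config.yaml for the exact layout.

```shell
# Assumption: any writable scratch location works; adjust for your system.
scratch="${TMPDIR:-/tmp}/txcutr-db-tmp"
mkdir -p "${scratch}"

# Print a line in the form we assume config.yaml expects (key name 'tmpdir').
config_line="tmpdir: \"${scratch}\""
echo "${config_line}"
```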

Usage

The rule all: in the Snakefile contains specifications for several variants that were used in the scUTRquant manuscript. One likely does not want to generate all of these. Instead, a single variant can be requested at the command line. Since the kallisto index (.kdx file) is the last output, that is the target to specify:

snakemake --use-conda homo_sapiens/gencode.v38.annotation.pc.txcutr.w500.kdx

This would use the GENCODE v38 annotation, filtered for only protein-coding transcripts (.pc) with validated 3' ends, and truncated to 500 nts (.w500). The default merge table (TSV) will use a 200 nt merge distance.
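As a rough sketch of how the requested file name maps onto those choices, the snippet below assembles the target from its parts and prints a dry-run command. The variable names are hypothetical, and the naming pattern is inferred only from the single example above, so check the Snakefile for the combinations it actually supports. The --dry-run flag asks Snakemake to list the jobs it would run without executing them.

```shell
# Hypothetical variables; the pattern is inferred from the example target above.
species="homo_sapiens"
annotation="gencode.v38.annotation"
filter="pc"        # protein-coding transcripts with validated 3' ends
width=500          # truncation length in nucleotides

target="${species}/${annotation}.${filter}.txcutr.w${width}.kdx"

# Print (rather than run) a dry-run command for inspection.
cmd="snakemake --use-conda --dry-run ${target}"
echo "${cmd}"
```

Printing the command first makes it easy to confirm the target path before committing to the full run.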

Notes

The txcutr step is computationally demanding. For example, in an HPC setting, we have it configured to run with 20 cores and 4 GB/core, which takes about 30 minutes.

Be aware that some rules include threads and resources specifications that are used by Snakemake cluster profiles. Please adjust these for your system (e.g., not all cluster configurations interpret the mem_mb parameter as per core)!
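To make the per-core versus per-job distinction concrete, here is a small arithmetic sketch using the 20-core, 4 GB/core figures quoted above. The variable names are hypothetical, and the 1024 MB per GB conversion is an assumption; some schedulers count 1000 MB per GB.

```shell
threads=20         # cores quoted above for the txcutr step
gb_per_core=4      # memory per core quoted above

# Assumption: 1 GB is counted as 1024 MB; some schedulers use 1000.
mem_mb_per_core=$((gb_per_core * 1024))
mem_mb_total=$((threads * mem_mb_per_core))

echo "per core: ${mem_mb_per_core} MB"
echo "total:    ${mem_mb_total} MB"
```

If your cluster profile treats mem_mb as a per-job total, the larger figure is the one to request; if it treats it as per core, the smaller one applies.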
