Transcription Run-On Grants Detection Of Regulatory elements (TROGDOR).
https://www.youtube.com/watch?v=90X5NJleYJQ
TROGDOR identifies transcription initiation regions (TIRs) from stranded nascent RNA sequencing data (GRO-seq, PRO-seq, ChRO-seq, mNET-seq, etc.). It uses a 1D U-Net model and a tiled image segmentation approach to achieve state-of-the-art performance in predicting TIRs while remaining computationally efficient.
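The tiling idea can be illustrated with a generic sketch. This is not TROGDOR's implementation: the tile length, overlap, and stitching rule below (`tiled_predict`, `tile`, `overlap`) are hypothetical, and `model` stands in for any function that returns one value per input position, such as a 1D U-Net applied to a single chunk. Each tile carries some flanking context, and only its central "core" predictions are kept, so edge effects at tile boundaries are discarded.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a per-chunk segmentation model (e.g. a 1D U-Net):
# it must return one prediction per input position.
Model = Callable[[Sequence[float]], List[float]]


def tiled_predict(
    signal: Sequence[float], model: Model, tile: int = 8, overlap: int = 2
) -> List[float]:
    """Predict over a long track with overlapping tiles, keeping tile centers."""
    if tile <= 2 * overlap:
        raise ValueError("tile must be larger than 2 * overlap")
    n = len(signal)
    core = tile - 2 * overlap  # positions each tile is responsible for
    out: List[float] = [0.0] * n
    for core_start in range(0, n, core):
        core_end = min(core_start + core, n)
        # Add `overlap` positions of context on each side, clipped to the track.
        win_start = max(core_start - overlap, 0)
        win_end = min(core_end + overlap, n)
        pred = model(signal[win_start:win_end])
        # Keep only the core; predictions near the tile edges are discarded.
        offset = core_start - win_start
        out[core_start:core_end] = pred[offset : offset + (core_end - core_start)]
    return out
```

With an element-wise stand-in model (e.g. `lambda xs: [2 * x for x in xs]`), the stitched output matches applying the model to the whole track at once, which is a quick sanity check for the stitching logic. A real implementation would typically also pad edge tiles to a fixed length so every batch has the same shape.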
We recommend installing inside an isolated Python environment. With uv (fastest):
```bash
uv tool install trogdor
```

Or with pip inside a conda/venv environment:

```bash
pip install trogdor
```

Or install the latest development version directly from GitHub:

```bash
pip install git+https://github.com/adamyhe/TROGDOR.git
# or with uv:
uv tool install git+https://github.com/adamyhe/TROGDOR.git
```

Run the full pipeline with a single command:

```bash
trogdor pipeline -p plus.bw -m minus.bw -o mysample.peaks.bed.gz
```

This writes one output file:


| File | Description |
|---|---|
| `mysample.peaks.bed.gz` | Called TIR peak regions (bgzipped BED) |

The intermediate probability bigWig is written to a temporary file and deleted automatically, unless you save it with `-b`/`--save_bigwig`.
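If you want to inspect the peaks from Python, a minimal sketch using only the standard library is below. It assumes a tab-separated BED file whose optional fourth column is a numeric score (as in the refined output described later); `parse_bed_lines` and `read_peaks` are hypothetical helper names, not part of the TROGDOR package.

```python
import gzip
from typing import Iterable, Iterator, Optional, Tuple

# (chrom, start, end, score); score is None when the file has only 3 columns.
Peak = Tuple[str, int, int, Optional[float]]


def parse_bed_lines(lines: Iterable[str]) -> Iterator[Peak]:
    """Parse BED text lines, skipping blanks, comments, and track/browser lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        score = float(fields[3]) if len(fields) > 3 else None
        yield fields[0], int(fields[1]), int(fields[2]), score


def read_peaks(path: str) -> Iterator[Peak]:
    """Read a gzip- or bgzip-compressed BED file.

    BGZF (bgzip) files are a series of standard gzip members, so Python's
    gzip module can read them sequentially.
    """
    with gzip.open(path, "rt") as fh:
        yield from parse_bed_lines(fh)
```

For example, `peaks = list(read_peaks("mysample.peaks.bed.gz"))` loads the whole file; if your file's fourth column is a name rather than a score, adjust `parse_bed_lines` accordingly.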
Inputs: plus- and minus-strand bigWig files from a nascent RNA sequencing experiment. TROGDOR was trained on PRO/GRO-seq data and has been vetted on data from GRO/PRO/ChRO/mNET-seq experiments. These files should contain coverage of the 3' ends of reads/fragments (that is, the most recent nucleotide added by the polymerase), ideally as raw counts. Minus-strand values may be stored as either positive or negative numbers.
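For readers preparing their own bigWigs, the sketch below shows what "3' end coverage" means for fragments whose strand already matches the RNA strand, using 0-based, half-open coordinates. It is an illustration, not part of TROGDOR: the input tuple format and the helpers `three_prime_counts` and `normalize_minus` are hypothetical. Library orientation is protocol-specific (for example, in many single-end PRO-seq libraries the read is the reverse complement of the RNA, so its strand must be flipped first). Taking magnitudes is one simple way to treat positive and negative minus-strand conventions identically; whether TROGDOR does exactly this internally is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (chrom, start, end, strand), 0-based half-open coordinates,
# with strand already set to the strand of the nascent RNA.
Fragment = Tuple[str, int, int, str]


def three_prime_counts(fragments: Iterable[Fragment]) -> Dict[str, Counter]:
    """Count fragments by the position of their RNA 3' end, per strand."""
    counts: Dict[str, Counter] = {"+": Counter(), "-": Counter()}
    for chrom, start, end, strand in fragments:
        if strand == "+":
            pos = end - 1  # plus-strand RNA grows toward higher coordinates
        elif strand == "-":
            pos = start  # minus-strand RNA grows toward lower coordinates
        else:
            raise ValueError(f"unexpected strand {strand!r}")
        counts[strand][(chrom, pos)] += 1
    return counts


def normalize_minus(values: Iterable[float]) -> List[float]:
    """Map a minus-strand track stored as negative or positive counts to magnitudes."""
    return [abs(v) for v in values]
```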
GPU: scoring uses a 1D U-Net model implemented in plain PyTorch and thus can be greatly accelerated by running on a CUDA-capable GPU (particularly Ampere or newer architectures that support bf16). Apple Silicon MPS (-d mps) should also work but has not been tested. Pass -d cpu to run on CPU (much slower). If CUDA is unavailable, the tool automatically falls back to MPS (if detected) or CPU. Inference uses a streaming pipeline: bigWig IO for the next chromosome runs in a background thread while the GPU processes the current one, and chunks are fed to the GPU via a DataLoader with pin_memory for async CPU→GPU transfer.
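The prefetching idea (load the next chromosome while the current one is being scored) can be sketched with the standard library alone. This is a simplified stand-in rather than TROGDOR's code: `prefetch_pipeline`, `load`, and `process` are hypothetical names, where `load` represents bigWig IO and `process` represents GPU scoring. The real tool additionally feeds chunks through a PyTorch `DataLoader` with `pin_memory`, which is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch_pipeline(
    chroms: List[str], load: Callable[[str], T], process: Callable[[T], R]
) -> Iterator[Tuple[str, R]]:
    """Yield (chrom, process(load(chrom))) while loading the next chrom in the background."""
    if not chroms:
        return
    # One worker thread is enough: at most one chromosome is prefetched at a time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, chroms[0])
        for i, chrom in enumerate(chroms):
            data = pending.result()  # block until this chromosome's IO is done
            if i + 1 < len(chroms):
                # Kick off IO for the next chromosome before computing on this one.
                pending = pool.submit(load, chroms[i + 1])
            yield chrom, process(data)
```

Keeping a single prefetch in flight bounds memory to roughly two chromosomes' worth of data while still overlapping IO with compute.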
Pretrained model: downloaded automatically from the Hugging Face Hub on first run and cached locally. To use a custom model, pass `-M /path/to/model.torch`.

| Flag | Default | Description |
|---|---|---|
| `-d` / `--device` | `cuda` | PyTorch device (`cuda`, `cpu`, `cuda:1`, …) |
| `-s` / `--min_score` | `0.95` | Minimum score threshold; bins below this are not reported |
| `-b` / `--save_bigwig` | off | Save the intermediate probability bigWig to this path (`pipeline` only) |
| `--chroms` | all | Score only specific chromosomes (e.g. `--chroms chr1 chr2`) |
| `--num_workers` | `0` | DataLoader workers for chunk preprocessing (set to 1–4 on Linux/CUDA) |
| `-v` / `--verbose` | off | Print progress messages |

The pipeline can also be run as two separate steps — useful if you want to call peaks at multiple thresholds without re-scoring:
```bash
# Step 1: score (GPU recommended)
trogdor score -p plus.bw -m minus.bw -o mysample.prob0.9.bw -s 0.9
# Step 2: call peaks at different thresholds (CPU, fast)
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.9.bed.gz -s 0.9
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.95.bed.gz -s 0.95
trogdor peaks -i mysample.prob0.9.bw -o mysample.peaks0.99.bed.gz -s 0.99
```

The default `peaks` command preserves the original threshold-and-merge caller.
An experimental refined caller is available for benchmarking post-processing
choices without changing the model or default behavior:
```bash
trogdor peaks -i mysample.prob.bw -o mysample.refined.bed \
  --mode refined --min_score 0.95 --max_gap 32 --min_width 32
```

Refined output includes BED columns for the merged peak and the max-score
summit bin: chrom, start, end, score, summit_start, summit_end,
summit_score. --min_support_signal can optionally require raw plus/minus
coverage support when --support_plus_bigwig and --support_minus_bigwig are
provided.
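To make the two callers concrete, here is a minimal sketch of threshold-and-merge peak calling on a binned score track, extended with gap merging, a width filter, and a summit bin in the spirit of the refined mode. The function name `call_peaks`, the tuple layout (it mirrors the refined columns after `chrom`), and the choices that a peak's score is its maximum bin score and that ties keep the first summit are assumptions for illustration; the `--min_support_signal` filter is not modeled.

```python
from typing import List, Optional, Sequence, Tuple

# (start, end, score, summit_start, summit_end, summit_score) in bp;
# mirrors the refined BED columns after `chrom` (assumed layout).
Peak = Tuple[int, int, float, int, int, float]


def call_peaks(
    scores: Sequence[float],
    bin_size: int,
    min_score: float = 0.95,
    max_gap: int = 0,
    min_width: int = 0,
) -> List[Peak]:
    """Threshold binned scores, merge passing bins, and report each peak's summit.

    max_gap=0 and min_width=0 give plain threshold-and-merge of touching bins.
    Peak score = max bin score (assumed); ties keep the first summit bin.
    """
    peaks: List[Peak] = []
    cur: Optional[List[float]] = None  # [start, end, best_score, summit_start]

    def flush() -> None:
        # Emit the open peak if it is wide enough.
        if cur is not None and cur[1] - cur[0] >= min_width:
            start, end, best, summit = int(cur[0]), int(cur[1]), cur[2], int(cur[3])
            peaks.append((start, end, best, summit, summit + bin_size, best))

    for i, s in enumerate(scores):
        if s < min_score:
            continue  # bins below the threshold never join a peak
        start, end = i * bin_size, (i + 1) * bin_size
        if cur is not None and start - cur[1] <= max_gap:
            cur[1] = end  # extend the open peak across a gap of <= max_gap bp
            if s > cur[2]:
                cur[2], cur[3] = s, start
        else:
            flush()
            cur = [start, end, s, start]
    flush()
    return peaks
```

Because peak calling only needs the stored per-bin scores, one scored track can be re-called at several `min_score` values without re-running the model, which is what the two-step workflow exploits.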
The `fdr` subcommand estimates the score threshold corresponding to a target empirical FDR, given a probability bigWig and a ground-truth peak set (e.g. ENCODE PLS/ELS or PRO-cap peaks for your cell type of interest). This can help you decide which `--min_score` threshold to use (although the default of 0.95 has worked well for me).
```bash
# Step 1: generate a dense score bigWig (report ALL values)
trogdor score -p plus.bw -m minus.bw -o mysample.prob.bw --min_score 0
# Step 2: calculate empirical FDR against a candidate set of "ground truth" peaks
trogdor fdr -b mysample.prob.bw -t candidate_peaks.bed.gz --fdr_target 0.05
```

Strategy: each candidate peak is summarised by its max (or mean) bigWig score. A null distribution is built by shuffling peak positions uniformly within chromosome bounds (preserving widths). FDR at threshold t is estimated as min(1, N_null(t) / N_real(t)), averaged over `--n_shuffle` independent shuffles, where N_real(t) and N_null(t) count the real and shuffled peaks whose summary score passes t. The score threshold at the target FDR is printed to stdout.


| Flag | Default | Description |
|---|---|---|
| `-b` / `--bigwig` | none | Probability bigWig (required) |
| `-t` / `--peaks` | none | Candidate peak BED (required) |
| `--stat` | `max` | Summary statistic per peak (`max` or `mean`) |
| `--n_shuffle` | `1` | Independent genome shuffles to average the null over |
| `--fdr_target` | `0.05` | Target FDR for reporting the score threshold |
| `--output` | off | Write a TSV table of threshold/FDR/N_real/N_null to this path |
| `--figure` | off | Save an FDR-vs-threshold plot to this path |
| `--chroms` | all | Restrict to specific chromosomes |

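The FDR strategy can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the `fdr` subcommand's code: `score_fn` stands in for the per-peak `--stat` summary over the bigWig, each shuffled peak is assumed to stay on its own chromosome, a peak is assumed to pass t when its score is ≥ t, and the reported threshold is assumed to be the lowest one whose averaged FDR meets the target. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def shuffle_peaks(
    peaks: Sequence[Interval], chrom_sizes: Dict[str, int], rng: random.Random
) -> List[Interval]:
    """Move each peak to a uniform random position on its chromosome, keeping its width."""
    out: List[Interval] = []
    for chrom, start, end in peaks:
        width = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - width)  # inclusive bounds
        out.append((chrom, new_start, new_start + width))
    return out


def empirical_fdr(
    peaks: Sequence[Interval],
    chrom_sizes: Dict[str, int],
    score_fn: Callable[[Interval], float],
    thresholds: Sequence[float],
    n_shuffle: int = 1,
    seed: int = 0,
) -> Dict[float, float]:
    """Estimate FDR(t) = min(1, N_null(t) / N_real(t)), averaged over shuffles."""
    rng = random.Random(seed)
    real = [score_fn(p) for p in peaks]
    totals = {t: 0.0 for t in thresholds}
    for _ in range(n_shuffle):
        null = [score_fn(p) for p in shuffle_peaks(peaks, chrom_sizes, rng)]
        for t in thresholds:
            n_real = sum(s >= t for s in real)
            n_null = sum(s >= t for s in null)
            # With no real peaks passing t, treat the FDR as 1 (nothing to trust).
            totals[t] += min(1.0, n_null / n_real) if n_real else 1.0
    return {t: total / n_shuffle for t, total in totals.items()}


def threshold_at(fdr: Dict[float, float], target: float) -> Optional[float]:
    """Lowest threshold whose estimated FDR is at or below `target`, if any."""
    passing = [t for t, f in fdr.items() if f <= target]
    return min(passing) if passing else None
```

Because the estimated FDR need not be monotone in t, a real tool may pick the reported threshold differently (for example, by enforcing monotonicity first).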
Install needed UCSC tools:
```bash
mamba create -n trogdor python
mamba activate trogdor
mamba install -c bioconda ucsc-liftover
```

Install TROGDOR with dev dependencies:

```bash
git clone git@github.com:adamyhe/TROGDOR.git
cd TROGDOR
pip install -e ".[dev]"
```

Most users do not need to retrain: a pretrained model is provided and used automatically by the CLI. See `scripts/README.md` for data download, training, and benchmarking instructions for the original TROGDOR model.
I haven't included general scripts for retraining on custom datasets, but these should
be a useful starting point.