feat(rpbp): add 7 rpbp modules + fasta_gtf_bam_rpbp subworkflow by pinin4fjords · Pull Request #11695 · nf-core/modules

pinin4fjords · 2026-05-19T12:29:15Z

Adds 7 rpbp/* modules and the fasta_gtf_bam_rpbp subworkflow that wraps Rp-Bp's translated-ORF caller chain (Malone et al. 2017, doi:10.1093/nar/gkw1141). Ported from nf-core/riboseq#174.

Design — bypass the umbrella

Rp-Bp's prepare-rpbp-genome bundles three independent steps: bowtie2-build (rRNA index), STAR --runMode genomeGenerate (alignment index), and get_orfs (the BED prep used by all downstream tools). The 6 per-sample tools we wrap don't consume the bowtie2 or STAR indices, because the upstream alignment is supplied as the BAM. rpbp/preparegenome therefore invokes get_orfs directly via a small Python wrapper and skips the unused index builds entirely. Real runs take ~3 minutes on chr20, with none of the multi-minute STAR build the umbrella triggers.
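A minimal sketch of the kind of in-memory config such a wrapper might assemble from the fasta and gtf inputs before handing off to get_orfs. The helper name and the key names (`fasta`, `gtf`, `genome_base_path`, `genome_name`) are assumptions loosely modelled on Rp-Bp's YAML config and should be checked against rpbp 4.0.1; the get_orfs call itself is left out because its signature isn't shown here. No STAR or bowtie2 index entries appear, since those builds are skipped.

```python
from pathlib import Path


def build_orf_config(fasta: str, gtf: str, out_dir: str, genome_name: str) -> dict:
    """Assemble a minimal, hypothetical Rp-Bp-style config for ORF/BED prep.

    Key names are assumptions; index keys are deliberately absent because
    the wrapper never builds the rRNA or STAR indices.
    """
    for label, path in (("fasta", fasta), ("gtf", gtf)):
        if not path:
            raise ValueError(f"missing required input: {label}")
    return {
        "fasta": str(Path(fasta)),
        "gtf": str(Path(gtf)),
        "genome_base_path": str(Path(out_dir)),
        "genome_name": genome_name,
    }


config = build_orf_config("chr20.fa", "chr20.gtf", "rpbp_genome", "chr20")
print(sorted(config))
```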

The per-sample chain similarly avoids predict-translated-orfs and create-orf-profiles, both of which internally call flexbar + bowtie2 + STAR on raw FASTQs (re-doing the upstream alignment). Splitting the 6 internal steps into separate modules also gives independent caching on resume.
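A toy illustration of why separate modules cache independently: each step's cache key hashes its own inputs plus the upstream key, so changing only a late step's arguments leaves earlier keys untouched. This is a simplified model, not Nextflow's actual task-hashing scheme; the helper name, step labels and empty "default" args are hypothetical.

```python
import hashlib


def step_keys(steps: list[tuple[str, str]], seed: str) -> list[str]:
    """Return one cache key per step; each key chains the previous key."""
    keys = []
    prev = seed
    for name, args in steps:
        digest = hashlib.sha256(f"{prev}|{name}|{args}".encode()).hexdigest()[:12]
        keys.append(digest)
        prev = digest
    return keys


chain = [
    ("extract-metagene", ""),
    ("metagene-bf", ""),
    ("select-offsets", ""),
    ("extract-orf-profiles", ""),  # "" stands in for the default thresholds
    ("orf-bf", ""),
    ("select-final", ""),
]
before = step_keys(chain, seed="sample1.bam")

tweaked = list(chain)
tweaked[3] = ("extract-orf-profiles", "10 1 None 0.0")  # only this step changes
after = step_keys(tweaked, seed="sample1.bam")

reused = sum(a == b for a, b in zip(before, after))
print(f"{reused} of {len(chain)} steps reuse cached results")  # 3 of 6
```

With a single umbrella task, the same tweak would invalidate the whole chain.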

Modules

rpbp/preparegenome — chains gtf-to-bed12 → extract-bed-sequences → extract-orf-coordinates → split-bed12-blocks → label-orfs via the internal get_orfs function.
rpbp/extractmetageneprofiles — per-read-length metagene profiles around annotated start codons.
rpbp/estimatemetagenebayesfactors — Bayes factors comparing periodic vs non-periodic models per read length.
rpbp/selectperiodicoffsets — pick one P-site offset per high-quality read length.
rpbp/extractorfprofiles — applies the metagene-length filter (replicating ribo_utils.utils.get_periodic_lengths_and_offsets) and runs extract-orf-profiles. Filter thresholds are set via ext.args2 (four space-separated tokens; defaults mirror rpbp.defaults.metagene_options).
rpbp/estimateorfbayesfactors — Bayesian translated-vs-untranslated model per ORF.
rpbp/selectfinalpredictionset — apply BF/length/overlap rules and emit the final predicted-ORF BED, DNA FASTA, protein FASTA.
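The metagene-length filter that rpbp/extractorfprofiles replicates can be sketched roughly as below. This is a simplified re-implementation under stated assumptions, not rpbp's code: it assumes one record per read length holding a profile count, Bayes-factor mean and variance, a periodicity likelihood and the selected P-site offset, and it assumes the four thresholds are minimum count, minimum BF mean, maximum BF variance (`None` means no cap) and minimum likelihood. The class, function and field names are hypothetical, and the comparison operators may differ from get_periodic_lengths_and_offsets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthStats:
    # Hypothetical per-read-length record; field names are illustrative.
    length: int
    offset: int
    profile_count: int
    bf_mean: float
    bf_var: float
    bf_likelihood: float


def periodic_lengths_and_offsets(
    stats: list[LengthStats],
    min_count: int,
    min_bf_mean: float,
    max_bf_var: Optional[float],
    min_likelihood: float,
) -> tuple[list[int], list[int]]:
    """Keep read lengths passing every threshold (simplified sketch)."""
    kept = [
        s for s in stats
        if s.profile_count >= min_count
        and s.bf_mean >= min_bf_mean
        and (max_bf_var is None or s.bf_var <= max_bf_var)
        and s.bf_likelihood >= min_likelihood
    ]
    kept.sort(key=lambda s: s.length)
    return [s.length for s in kept], [s.offset for s in kept]


stats = [
    LengthStats(28, 12, 40, 3.2, 0.5, 0.9),
    LengthStats(29, 12, 8, 4.0, 0.4, 0.8),   # too few reads for min_count=10
    LengthStats(30, 13, 25, 0.6, 0.2, 0.7),  # BF mean below 1
]
lengths, offsets = periodic_lengths_and_offsets(stats, 10, 1.0, None, 0.0)
print("--lengths", *lengths, "--offsets", *offsets)  # --lengths 28 --offsets 12
```

The surviving lengths and offsets are what would then be passed to extract-orf-profiles as --lengths/--offsets.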

Subworkflow

fasta_gtf_bam_rpbp — end-to-end run: rpbp/preparegenome (once per invocation) followed by the per-sample 6-step chain (extract-metagene → metagene BF → select offsets → extract-orf-profiles → orf BF → select final).
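The run order follows from the data dependencies between modules. A sketch using Python's stdlib graphlib, where the edges are inferred from the module descriptions (an assumption, possibly incomplete): preparegenome's BEDs feed the metagene and ORF-profile steps, and extractorfprofiles also reads the metagene Bayes factors and selected offsets for its length filter.

```python
from graphlib import TopologicalSorter

# Each module maps to the modules whose outputs it consumes (inferred edges).
deps = {
    "extractmetageneprofiles": {"preparegenome"},
    "estimatemetagenebayesfactors": {"extractmetageneprofiles"},
    "selectperiodicoffsets": {"estimatemetagenebayesfactors"},
    "extractorfprofiles": {
        "preparegenome",
        "estimatemetagenebayesfactors",
        "selectperiodicoffsets",
    },
    "estimateorfbayesfactors": {"extractorfprofiles"},
    "selectfinalpredictionset": {"estimateorfbayesfactors"},
}

order = list(TopologicalSorter(deps).static_order())
print(" -> ".join(order))
```

Because the per-sample steps form a single chain, this is the only valid order; in the subworkflow, preparegenome runs once while the remaining six steps repeat per sample.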

Container

All rpbp/* modules share a Wave-built container co-installing bioconda::rpbp=4.0.1 and bioconda::star=2.7.11b:

docker: community.wave.seqera.io/library/rpbp_star:247a8ae84a6babfb
singularity: https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/3a/3a8aa95ce76934f6269b2d8cbdd3d57c13db029c704152975b2315e35b7a2154/data

Versions are emitted via topic: versions.

Test plan

Real + stub tests pass for each of the 7 modules and the subworkflow under nf-core modules test --profile docker / nf-core subworkflows test --profile docker. Test fixtures (chr20) live in nf-core/test-datasets:modules under genomics/homo_sapiens/riboseq_expression/ — same chr20 FASTA / GTF / BAMs the other riboseq modules use.

The 3 modules whose chain extends through extract-orf-profiles (extractorfprofiles, estimateorfbayesfactors, selectfinalpredictionset) and the subworkflow itself carry a tests/nextflow.config setting ext.args2 = '10 1 None 0.0' so chr20's modest per-length read counts survive the metagene filter and produce non-empty profiles.
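A sketch of how the four ext.args2 tokens might be parsed, including the literal `None` in '10 1 None 0.0'. The token order (min profile count, min BF mean, max BF variance, min BF likelihood), the helper names and the result keys are assumptions; defaults are passed in rather than hardcoded, since they should come from rpbp.defaults.metagene_options.

```python
def _threshold(token: str, cast):
    # The literal string "None" disables a threshold.
    return None if token == "None" else cast(token)


def parse_args2(args2: str, defaults: dict) -> dict:
    """Parse '<min_count> <min_bf_mean> <max_bf_var> <min_likelihood>' (assumed order)."""
    tokens = args2.split()
    if not tokens:
        return dict(defaults)  # empty ext.args2: fall back to rpbp's defaults
    if len(tokens) != 4:
        raise ValueError(f"ext.args2 expects 4 tokens, got {len(tokens)}: {args2!r}")
    names = ("min_count", "min_bf_mean", "max_bf_var", "min_likelihood")
    casts = (int, float, float, float)
    return {n: _threshold(t, c) for n, t, c in zip(names, tokens, casts)}


print(parse_args2("10 1 None 0.0", defaults={}))
```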

Source: nf-core/riboseq#174.

…_rpbp subworkflows

Adds the Rp-Bp Ribo-seq ORF caller (Malone et al. 2017, doi:10.1093/nar/gkw1141) as 8 split modules plus 2 orchestration subworkflows ported from nf-core/riboseq#174. Splitting per-tool (rather than wrapping rpbp's `predict-translated-orfs` umbrella command) gives independent caching on resume and lets the pipeline's own STAR alignment run instead of being re-done inside rpbp.

Modules:
- rpbp/buildconfig: render the Rp-Bp YAML config from pipeline-supplied fasta+gtf.
- rpbp/preparegenome: build the Rp-Bp genome index (STAR index, ribosomal index, ORF BEDs).
- rpbp/extractmetageneprofiles: per-read-length metagene profiles around starts.
- rpbp/estimatemetagenebayesfactors: periodicity Bayes factors per read length.
- rpbp/selectperiodicoffsets: pick a single P-site offset per high-quality length.
- rpbp/extractorfprofiles: per-ORF P-site profile matrix using selected offsets.
- rpbp/estimateorfbayesfactors: Bayesian translated-vs-untranslated model.
- rpbp/selectfinalpredictionset: filter to the final predicted-ORF BED + FASTAs.

Subworkflows:
- bam_rpbp_predictorfs: per-sample 6-step chain on a Ribo-seq BAM with the cohort-shared annotation outputs supplied by the caller.
- fasta_gtf_bam_rpbp: top-level end-to-end run; renders config, prepares the index once, then runs the per-sample chain.

Containers: Wave-built `community.wave.seqera.io/library/rpbp_star:247a8ae84a6babfb` (co-installs `bioconda::rpbp=4.0.1` + `bioconda::star=2.7.11b`) for all rpbp tools; `quay.io/biocontainers/coreutils:9.5` for rpbp/buildconfig (config templating only). Versions are emitted via the topic-channel pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…across the board

Bypass `prepare-rpbp-genome` umbrella; call rpbp's internal `get_orfs` function directly via a small inline Python wrapper. Skips the umbrella script's bowtie2-build (rRNA index) and STAR genome-generate steps entirely - none of the downstream Rp-Bp tools we wrap consume those indices, and upstream alignment is supplied as the BAM.

rpbp/buildconfig removed - the config dict is now built in-memory inside rpbp/preparegenome from the input fasta + gtf paths. fasta_gtf_bam_rpbp loses the corresponding setup step.

extractorfprofiles replicates rpbp's metagene-length filter (`ribo_utils.utils.get_periodic_lengths_and_offsets`) inline before calling extract-orf-profiles, so the upstream select-periodic-offsets output can drive --lengths/--offsets without going through rpbp's filename-driven config plumbing. Filter thresholds are exposed via ext.args2 (4 space-separated tokens, defaults match rpbp.defaults.metagene_options).

All 7 inner-module tests now run the upstream chain on real chr20 data (no stub-only tests). Subworkflow tests cover bam_rpbp_predictorfs and fasta_gtf_bam_rpbp end-to-end. The 3 inner modules that need the lower threshold (extractorfprofiles / estimateorfbayesfactors / selectfinalpredictionset) carry tests/nextflow.config setting ext.args2 = '10 1 None 0.0' so the chr20 BAM produces non-empty profiles.

Wave container unchanged (community.wave.seqera.io/library/rpbp_star:247a8ae84a6babfb).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The per-sample 6-step chain was only ever wrapped by fasta_gtf_bam_rpbp; keeping it as a standalone subworkflow added a thin layer no realistic caller needs (rpbp's BED outputs only come from preparegenome). Inlined into fasta_gtf_bam_rpbp — now one subworkflow with 7 process invocations end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…m/nf-core/modules into rpbp-add-modules-and-subworkflows

…nnel: column

Cleans up the file:
- Drops the multi-line header narrative recapping design choices and step ordering.
- Drops the numbered "1. ... 2. ... 3. ..." inline narrative blocks.
- Aligns the `// channel:` annotations in the emit block (12 of 13 lines were one column off the longest emit's reference position).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
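The emit-block cleanup amounts to padding each line's code so every `// channel:` comment starts in the same column. A small sketch of that alignment; the helper name and the sample emit lines (including the PREP/FINAL process names) are hypothetical.

```python
def align_comments(lines: list[str], marker: str = "// channel:") -> list[str]:
    """Pad the code before `marker` so every marker starts in the same column."""
    parts = []
    for line in lines:
        code, sep, rest = line.partition(marker)
        # Lines without the marker pass through with trailing space trimmed.
        parts.append((code.rstrip(), sep + rest if sep else ""))
    width = max((len(code) for code, note in parts if note), default=0)
    return [f"{code.ljust(width)} {note}" if note else code for code, note in parts]


emit = [
    "    emit:",
    "    orfs_bed = PREP.out.orfs // channel: [ val(meta), path(bed) ]",
    "    predicted_bed = FINAL.out.bed  // channel: [ val(meta), path(bed) ]",
]
aligned = align_comments(emit)
print("\n".join(aligned))
```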

github-actions Bot added the size/xl label May 19, 2026

pinin4fjords changed the title ~~feat(rpbp): add 8 rpbp modules + bam_rpbp_predictorfs / fasta_gtf_bam_rpbp subworkflows~~ feat(rpbp): add 7 rpbp modules + bam_rpbp_predictorfs / fasta_gtf_bam_rpbp subworkflows May 19, 2026

pinin4fjords and others added 3 commits May 19, 2026 17:57

Merge branch 'master' into rpbp-add-modules-and-subworkflows

cce7bc2

Merge branch 'rpbp-add-modules-and-subworkflows' of https://github.co…

60401cd

…m/nf-core/modules into rpbp-add-modules-and-subworkflows

pinin4fjords changed the title ~~feat(rpbp): add 7 rpbp modules + bam_rpbp_predictorfs / fasta_gtf_bam_rpbp subworkflows~~ feat(rpbp): add 7 rpbp modules + fasta_gtf_bam_rpbp subworkflow May 19, 2026

pinin4fjords wants to merge 6 commits into master from rpbp-add-modules-and-subworkflows
