docs: Refactor usage docs into user-centric multi-page structure #1742

Draft
adamrtalbot wants to merge 5 commits into nf-core:dev from adamrtalbot:docs/use-focused-documentation-refa

@adamrtalbot
Contributor

Summary

Restructures the monolithic docs/usage.md (831 lines) into a Quick Start landing page with 6 focused subpages, organised by user intent rather than pipeline code structure.

Closes #1737

Changes

New structure

docs/
├── usage.md                              → Quick Start (minimal working example, signposts)
├── usage/
│   ├── samplesheet.md                    → Samplesheet format (examples-first)
│   ├── reference-genomes.md              → Genome configuration
│   ├── alignment-and-quantification.md   → Analysis strategy (decision tree)
│   ├── preprocessing.md                  → Trimming, rRNA removal, contamination
│   ├── advanced-features.md              → UMIs, prokaryotic, 3'DGE, GPU
│   ├── configuration.md                  → Nextflow profiles, resources, custom config
│   └── differential_expression_analysis/
│       ├── introduction.md               → Brief intro + prerequisites
│       ├── running-the-pipeline.md       → Run rnaseq for DE (replaces theory.md + rnaseq.md)
│       └── de-analysis-in-r.md           → Hands-on DESeq2 (replaces de_rstudio.md + interpretation.md)
├── output.md                             → UNCHANGED
└── README.md                             → Updated index

Key design principles

  • Examples before definitions — every page opens with a working example
  • Decision trees over lists — "Which aligner?" not "Here are 5 aligners"
  • Progressive disclosure — Quick Start → common config → advanced → reference
  • One concern per page — each page answers one user question

DE tutorial refactoring

  • Consolidated from 5 pages to 3
  • Removed Gitpod dependency (works with any R/RStudio setup)
  • Removed iGenomes recommendation (aligned with main docs guidance)
  • Updated to generic paths instead of /workspace/gitpod/...
  • Preserved all R code and analysis steps

Writing style

Applied nf-core writing style throughout:

  • British English
  • Active voice, no gerunds in headings
  • Removed all "please", "e.g.", "i.e.", "etc."
  • Sentence case headings

Content verification

  • All 35+ original sections mapped to new files (nothing lost)
  • 33 internal cross-references verified
  • output.md unchanged
  • Net reduction: 1456 insertions, 1815 deletions

nf-core format compatibility

Additional pages in docs/usage/ are explicitly supported by nf-core guidelines:

"Additional pages (e.g. tutorials, FAQs) can be added and will be automatically rendered on the nf-core website pipeline page."

This pattern is already used by nf-core/sarek (docs/usage/variantcalling/), nf-core/taxprofiler (docs/usage/tutorials.md), and this pipeline's existing DE tutorial.

Restructure the monolithic usage.md (831 lines) into a Quick Start
landing page with 6 focused subpages organised by user intent:

- samplesheet.md: input format, examples-first
- reference-genomes.md: genome configuration
- alignment-and-quantification.md: analysis strategy with decision tree
- preprocessing.md: trimming, rRNA removal, contamination screening
- advanced-features.md: UMIs, prokaryotic, 3'DGE, GPU acceleration
- configuration.md: Nextflow profiles, resources, custom config

Also refactors the DE tutorial from 5 Gitpod-dependent pages into 3
self-guided pages (introduction, running the pipeline, DE in R).

Applies nf-core writing style throughout: British English, active voice,
no gerunds in headings, no please/e.g./i.e.

output.md is unchanged.

Closes nf-core#1737

Generated by Claude Code
@github-actions

github-actions bot commented Mar 4, 2026

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@adamrtalbot adamrtalbot changed the base branch from master to dev March 4, 2026 15:50
- usage.md: use descriptive filenames, add reference genome guidance, add check results section
- output.md: add key outputs summary table at top, fix style violations (please, e.g., i.e.)
- samplesheet.md: fix e.g. to such as, fix behavior to behaviour (British English)
- configuration.md: fix 2 broken anchor links, fix params.yaml iGenomes example, rewrite iGenomes note
- alignment-and-quantification.md: remove duplicated HISAT2 content from quantification section
- de-analysis-in-r.md: fix DESeq2 accessor bugs (dds$counts, res$gene, resSig$gene)

Generated by Claude Code
Fix 27 issues across 7 documentation files:
- Fix 4 broken links (RSeQC, TOC, cross-page anchor, internal anchor)
- Fix 9 typos in output.md (gauge, transcripts, abundances, etc.)
- Convert GitHub-flavored alerts to nf-core admonitions
- Convert legacy > **NB:** notes to :::note (outside details blocks)
- Move :::tip outside <details> block for correct rendering
- Fix hardcoded column name in DE tutorial R code
- Fix hedging language and capitalisation per nf-core style

Generated by Claude Code
The customised docs/README.md lists the new multi-page usage
documentation structure and intentionally differs from the template.

Generated by Claude Code
Revert customised README.md to the template version. The nf-core
website auto-discovers subpages via frontmatter order, so the index
page only needs to link usage.md and output.md.

Generated by Claude Code
Comment on lines +81 to +129
## Strandedness prediction

If you set the strandedness value to `auto`, the pipeline sub-samples the input FASTQ files to 1 million reads, uses Salmon Quant to infer the strandedness automatically, and then propagates this information through the rest of the pipeline. This behaviour is controlled by the `--stranded_threshold` and `--unstranded_threshold` parameters, which are set to 0.8 and 0.1 by default, respectively. This means:

- **Forward stranded:** At least 80% of the fragments are in the 'forward' orientation.
- **Reverse stranded:** At least 80% of the fragments are in the 'reverse' orientation.
- **Unstranded:** The forward and reverse fractions differ by less than 10%.
- **Undetermined:** Samples that do not meet any of these criteria, possibly indicating issues such as genomic DNA contamination.

:::note
These thresholds apply to both the strandedness inferred from Salmon outputs for input to the pipeline and how strandedness is inferred from RSeQC results using pipeline outputs.
:::

### Usage examples

1. **Forward stranded sample:**
   - Forward fraction: 0.85
   - Reverse fraction: 0.15
   - **Classification:** Forward stranded

2. **Reverse stranded sample:**
   - Forward fraction: 0.1
   - Reverse fraction: 0.9
   - **Classification:** Reverse stranded

3. **Unstranded sample:**
   - Forward fraction: 0.45
   - Reverse fraction: 0.55
   - **Classification:** Unstranded

4. **Undetermined sample:**
   - Forward fraction: 0.6
   - Reverse fraction: 0.4
   - **Classification:** Undetermined

You can control the stringency of this behaviour with `--stranded_threshold` and `--unstranded_threshold`.
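
The threshold logic described above can be sketched in Python. This is an illustrative approximation only, not the pipeline's own code: the function name `classify_strandedness` and the `tol` argument are hypothetical. The text says the fractions must differ by "less than 10%", yet the unstranded example sits exactly on that boundary, so the sketch assumes the boundary is inclusive and adds a small tolerance for floating-point error.

```python
def classify_strandedness(forward, reverse,
                          stranded_threshold=0.8,
                          unstranded_threshold=0.1,
                          tol=1e-9):
    """Classify a library from its forward and reverse fragment fractions.

    Hypothetical helper mirroring the documented rules; the defaults match
    --stranded_threshold and --unstranded_threshold.
    """
    if forward >= stranded_threshold:
        return "forward"
    if reverse >= stranded_threshold:
        return "reverse"
    # Assumption: the boundary is inclusive. `tol` absorbs floating-point
    # error, since 0.55 - 0.45 evaluates to slightly more than 0.1.
    if abs(forward - reverse) <= unstranded_threshold + tol:
        return "unstranded"
    return "undetermined"


# The four examples from this section
for fwd, rev in [(0.85, 0.15), (0.1, 0.9), (0.45, 0.55), (0.6, 0.4)]:
    print(f"forward={fwd}, reverse={rev}: {classify_strandedness(fwd, rev)}")
```

Raising `stranded_threshold` or lowering `unstranded_threshold` in this sketch moves more samples into the undetermined class, which is the sense in which the two parameters control stringency.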

### Errors and reporting

The results of strandedness inference are displayed in the MultiQC report under 'Strandedness Checks'. This shows any provided strandedness and the results inferred by both Salmon (when strandedness is set to 'auto') and RSeQC. Mismatches between input strandedness (explicitly provided by the user or inferred by Salmon) and output strandedness from RSeQC are marked as fails. For example, if a user specifies 'forward' as strandedness for a library that is actually reverse stranded, this is marked as a fail.

![MultiQC - Strand check table](../images/mqc_strand_check.png)

Be sure to check the strandedness report when reviewing the QC for your samples.
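
The pass/fail rule in the strandedness check can be summarised with a short Python sketch. It is an assumption-laden illustration, not the MultiQC or pipeline implementation: the function name `strand_check_status` and its arguments are hypothetical, and it does not model how undetermined results are reported.

```python
def strand_check_status(provided, rseqc_inferred, salmon_inferred=None):
    """Return 'pass' or 'fail' for one sample (hypothetical helper).

    The input strandedness is the user-provided value, unless it is
    'auto', in which case the Salmon inference is used instead.
    """
    expected = salmon_inferred if provided == "auto" else provided
    # A mismatch between input strandedness and the RSeQC result is a fail
    return "pass" if expected == rseqc_inferred else "fail"


# User specified 'forward', but RSeQC reports a reverse stranded library
print(strand_check_status("forward", "reverse"))
# Strandedness set to 'auto'; Salmon and RSeQC agree
print(strand_check_status("auto", "reverse", salmon_inferred="reverse"))
```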

## Linting

By default, the pipeline will run [fq lint](https://github.com/stjude-rust-labs/fq) on all input FASTQ files, both at the start of preprocessing and after each preprocessing step that manipulates FASTQ files. If errors are found, an error will be reported and the workflow will stop.

Use the `extra_fqlint_args` parameter to disable [any validator](https://github.com/stjude-rust-labs/fq?tab=readme-ov-file#validators) from `fq`. For example, we have found that checks on the names of paired reads are prone to failure, so the pipeline disables that check by default (by setting `extra_fqlint_args` to `--disable-validator P001`).
Contributor Author

This should be migrated to a specific section away from the general usage docs.

Development

Successfully merging this pull request may close these issues.

Reorganise usage docs with a user-centric structure