Dev improve docs #239

Merged
merged 7 commits into from
Jun 6, 2025
6 changes: 4 additions & 2 deletions README.md
@@ -202,9 +202,11 @@ The `conda` environments are expected by the `conda_local` profile of the pipeli

```sh
$ conda env create -n magma-env-1 --file magma-env-1.yml

$ conda env create -n magma-env-2 --file magma-env-2.yml
$ conda env create -n magma-tbprofiler-env --file magma-tbprofiler-env.yaml
$ conda env create -n magma-ntmprofiler-env --file magma-ntmprofiler-env.yaml
```
Alternatively, you can run `bash ./conda_envs/setup_conda_envs.sh` to build all the necessary conda environments.

Once the environments are created, you can make use of the pipeline parameter `conda_envs_location` to inform the pipeline of the names and location of the conda envs.
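
Before pointing `conda_envs_location` at a directory, it can help to confirm that all four environments are actually there. The following is a minimal sketch, assuming the environments were created as directories under one common location (as `conda env create -p` does in the setup script). The helper name `check_magma_envs` is hypothetical and not part of MAGMA.

```bash
# Sketch: confirm that the four environments named in the README exist
# under a single directory before pointing the pipeline at it.
# check_magma_envs is a hypothetical helper, not part of MAGMA.
check_magma_envs() {
    local envs_dir="$1"
    local missing=0
    local env_name
    for env_name in magma-env-1 magma-env-2 magma-tbprofiler-env magma-ntmprofiler-env; do
        if [ ! -d "${envs_dir}/${env_name}" ]; then
            echo "missing environment: ${envs_dir}/${env_name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_magma_envs /path/to/conda_envs && echo "all environments found"` prints the message only when every directory is present.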

@@ -248,7 +250,7 @@ Success, would look like this

We provide [two docker containers](https://github.com/orgs/TORCH-Consortium/packages?repo_name=MAGMA) with the pipeline so that you could just download and run the pipeline with them. There is **NO** need to create any docker containers, just download and enable the `docker` profile.

> 🚧 **Container build script**: The script used to build these containers is provided [here](./containers/build.sh).
> 🚧 **Container build script**: The script used to build these containers is provided [here](https://github.com/TORCH-Consortium/MAGMA/tree/master/containers).

You don't need to pull the containers manually, but should you need to, you can use the following commands to pull the pre-built containers

4 changes: 3 additions & 1 deletion conda_envs/setup_conda_envs.sh
@@ -12,10 +12,12 @@ resolverCondaBinary="conda" # pick either conda OR mamba

$resolverCondaBinary env create -p magma-env-1 --file magma-env-1.yml

$resolverCondaBinary env create -p magma-env-1 --file magma-env-1.yml
$resolverCondaBinary env create -p magma-env-2 --file magma-env-2.yml

$resolverCondaBinary env create -p magma-ntmprofiler-env --file magma-ntmprofiler-env.yml

$resolverCondaBinary env create -p magma-tbprofiler-env --file magma-tbprofiler-env.yml

#===========================================================

#NOTE: Setup the tbprofiler env with WHO v2 Database
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
/.quarto/
23 changes: 23 additions & 0 deletions docs/_quarto.yml
@@ -0,0 +1,23 @@
project:
  type: website

website:
  title: "MAGMA"

  sidebar:
    style: "docked"
    search: true
    contents:
      - text: "Usage"
        href: usage.qmd
      - text: "Customizable Parameters"
        href: customizable-parameters.qmd
      - text: "Output"
        href: output.qmd

format:
  html:
    theme: cosmo
    css: styles.css
    toc: true

Empty file added docs/cloud_batch_execution.qmd
Empty file.
88 changes: 88 additions & 0 deletions docs/customizable-parameters.qmd
@@ -0,0 +1,88 @@
# MAGMA Customizable Parameters


This document provides an overview of the customizable parameters for the MAGMA pipeline. Each parameter is listed with its default value and a description.

> 💡 **Hint**: You can check the full parameter [reference file](https://github.com/TORCH-Consortium/MAGMA/blob/master/params/params.yaml).

---

## Common Parameters

### Input Samplesheet
| Parameter | Default Value | Description |
|-----------------------|----------------------------|-------------------------------------------------------------------------------------------------|
| `input_samplesheet` | `"samplesheet.magma.csv"` | The input CSV file containing sample information. The study ID cannot start with `XBS_REF_`. |

> 💡 **Hint**: The samplesheet should include the fields `[Sample, R1, R2]`. Optionally, you can add `[study, library, attempt, flowcell, lane, index_sequence]`.
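
A quick header check can catch a malformed samplesheet before the pipeline starts. This is a minimal sketch, assuming a comma-separated file with the column names on the first line. The helper name `check_samplesheet_header` is hypothetical and not part of MAGMA, and the check covers only the three required fields named above.

```bash
# Sketch: check that a samplesheet header carries the required columns.
# check_samplesheet_header is a hypothetical helper, not part of MAGMA;
# it assumes a comma-separated header on the first line.
check_samplesheet_header() {
    local samplesheet="$1"
    local header required
    # Drop a trailing carriage return in case the file came from Windows.
    header="$(head -n 1 "$samplesheet" | tr -d '\r')"
    for required in Sample R1 R2; do
        # Wrap in commas so that only whole column names match.
        case ",${header}," in
            *",${required},"*) ;;
            *) echo "missing column: ${required}" >&2; return 1 ;;
        esac
    done
    return 0
}
```

For example, `check_samplesheet_header samplesheet.magma.csv` returns a non-zero status and names the first missing column if the header is incomplete.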

---

### Output Directory
| Parameter | Default Value | Description |
|-------------|-----------------------|-----------------------------------------------------------------------------|
| `outdir` | `"magma-results"` | The directory where all output files will be written. |
| `vcf_name` | `"joint"` | The name of the output folder for results. Used to derive `JOINT_NAME`. |

> 💡 **Note**: The `vcf_name` parameter is critical for naming conventions in downstream processes.

---

## Adding Additional Samples

| Parameter | Default Value | Description |
|-------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| `use_ref_gvcf` | `true` | Whether to use a reference GVCF file to include additional samples. |
| `ref_gvcf` | `"${projectDir}/resources/ref_gvcfs/LineagesAndOutgroupV2.g.vcf.gz"` | Path to the reference GVCF file. |
| `ref_gvcf_tbi` | `"${projectDir}/resources/ref_gvcfs/LineagesAndOutgroupV2.g.vcf.gz.tbi"` | Path to the index file for the reference GVCF. |

> 💡 **Hint**: Use this feature if your dataset has low genetic diversity (e.g., clonal or fewer than 20 samples).

---

## Quality Control Parameters

| Parameter | Default Value | Description |
|-------------------------------|---------------|-------------------------------------------------------------------------------------------------|
| `cutoff_median_coverage` | `10` | The minimal median coverage required to process the sample. |
| `cutoff_breadth_of_coverage` | `0.90` | The minimal breadth of coverage required to process the sample. |
| `cutoff_rel_abundance` | `0.70` | The minimal relative abundance of the majority strain required to process the sample. |
| `cutoff_ntm_fraction` | `0.20` | The maximum fraction of NTM DNA allowed to process the sample. |

> ⚠️ **Attention**: Ensure these values are adjusted based on the quality of your input data to avoid processing errors.
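
To make the meaning of the four cutoffs concrete, the sketch below applies the default values to a single sample's metrics. It only illustrates the thresholds in the table and is not the pipeline's own implementation. The helper name `passes_qc` is hypothetical, and treating values exactly on a boundary as passing is an assumption.

```bash
# Sketch: apply the four default cutoffs to one sample's metrics.
# passes_qc is a hypothetical helper; whether MAGMA treats the boundary
# values as passing is an assumption made here.
passes_qc() {
    local median_coverage="$1" breadth="$2" rel_abundance="$3" ntm_fraction="$4"
    awk -v cov="$median_coverage" -v bre="$breadth" \
        -v abu="$rel_abundance" -v ntm="$ntm_fraction" 'BEGIN {
            # Three minimum cutoffs and one maximum cutoff, as in the table.
            ok = (cov >= 10) && (bre >= 0.90) && (abu >= 0.70) && (ntm <= 0.20)
            exit (ok ? 0 : 1)
        }'
}
```

For example, `passes_qc 45 0.98 0.95 0.01 && echo "sample passes"` prints the message, while a sample with a median coverage of 8 would be rejected.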

---

## Skipping Pipeline Steps

| Parameter | Default Value | Description |
|----------------------------------|---------------|-------------------------------------------------------------------------------------------------|
| `only_validate_fastqs` | `false` | Set to `true` to only validate input FASTQs and check their FASTQC reports. |
| `skip_merge_analysis` | `false` | Skip the final merge analysis step. |
| `skip_variant_recalibration` | `false` | Skip variant quality score recalibration (VQSR). |
| `skip_base_recalibration`       | `true`        | Skip base quality score recalibration (BQSR). BQSR is not suitable for low-coverage Mtb genomes. |
| `skip_minor_variants_gatk` | `true` | Skip minor variants detection with GATK. LoFreq is recommended for most purposes. |
| `skip_phylogeny_and_clustering` | `false` | Disable downstream phylogenetic analysis of merged GVCF. |
| `skip_complex_regions` | `false` | Disable downstream complex region analysis of merged GVCF. |
| `skip_ntmprofiler` | `false` | Disable execution of `ntmprofiler` on FASTQ files. |
| `skip_tbprofiler_fastq` | `true` | Disable `tbprofiler` analysis on FASTQ files. |
| `skip_spotyping` | `false` | Disable spoligotyping analysis. |

> 💡 **Hint**: Use these flags to customize the pipeline execution based on your specific requirements.
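
One way to set these flags is to collect the overrides in a small params file. The sketch below writes such a file with a few of the parameters from the tables above. The file name `custom-params.yaml` and the chosen values are arbitrary, and the commented invocation is an assumption about how the pipeline is launched (`-params-file` and `-profile` are standard Nextflow options).

```bash
# Sketch: write a small params file that overrides a few defaults.
# The file name custom-params.yaml is arbitrary.
cat > custom-params.yaml <<'EOF'
input_samplesheet: "samplesheet.magma.csv"
outdir: "magma-results"
skip_tbprofiler_fastq: false
skip_spotyping: true
EOF

# Assumed invocation (not run here); -params-file is a standard Nextflow option:
# nextflow run TORCH-Consortium/MAGMA -profile docker -params-file custom-params.yaml
```

Keeping overrides in a file rather than on the command line makes a run easier to repeat and to share alongside the results.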

---

## Reference Files

| Parameter | Default Value | Description |
|-------------------------|------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| `ref_fasta_basename` | `"NC-000962-3-H37Rv"` | Basename of the reference FASTA file. |
| `ref_fasta_dir` | `"${projectDir}/resources/genome"` | Directory containing the reference FASTA file. |
| `ref_fasta` | `"${params.ref_fasta_dir}/${params.ref_fasta_basename}.fa"`| Full path to the reference FASTA file. |
| `ref_fasta_dict` | `"${params.ref_fasta_dir}/${params.ref_fasta_basename}.dict"`| Path to the reference FASTA dictionary file. |
| `ref_fasta_gb` | `"${params.ref_fasta_dir}/${params.ref_fasta_basename}.gb"`| Path to the reference GenBank file. |

> ⚠️ **Warning**: It is recommended to use the provided reference files to ensure compatibility with the pipeline.

---

Empty file added docs/hpc_execution.qmd
Empty file.
169 changes: 169 additions & 0 deletions docs/output.qmd
@@ -0,0 +1,169 @@
---
title: "MAGMA"
---

Here we briefly introduce the main outputs of a MAGMA pipeline run. Please note that some outputs are optional and are only generated when specific parameters are set.

- [Interpretation](#interpretation)

# Tutorials and Presentations

Tim Huepink and Lennert Verboven created an in-depth tutorial of the features of the variant calling in MAGMA:

[![Video](https://img.youtube.com/vi/Kic2ItrJHj0/maxresdefault.jpg)](https://www.youtube.com/watch?v=Kic2ItrJHj0)


We have also included a presentation (in PDF format) of the logic and workflow of the MAGMA pipeline, as well as posters that have been presented at conferences. Please refer to the [docs](./docs) folder.

# Interpretation

The results directory produced by MAGMA is as follows:

```bash
/path/to/results_dir/
.
├── QC_statistics
├── analyses
└── vcf_files
```

## QC Statistics Directory

In this directory you will find files related to the quality control carried out by the MAGMA pipeline. The structure is as follows:

```bash
/path/to/results_dir/QC_statistics
├── cohort
│   ├── fastq_validation
│   └── multiqc
│       └── multiqc_data
└── per_sample
    ├── coverage
    ├── fastqc
    └── mapping
```

- **cohort**

Here you will find the `joint.merged_cohort_stats.tsv` file, which contains the QC statistics for all samples in the samplesheet and allows users to determine why certain samples were not incorporated into the cohort analysis steps.

In addition, you'll find the cohort-level MultiQC report generated by `per_sample/fastqc` analysis and the fastq validation report in `json` format.

- **per_sample/coverage**

Contains the GATK WGSMetrics outputs for each of the samples in the samplesheet.

- **per_sample/mapping**

> Contains the FlagStat and samtools stats for each of the samples in the samplesheet
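
To get a first look at the cohort statistics file, it can be useful to list its columns. This is a minimal sketch, assuming the file is tab-separated with a header row, as the `.tsv` extension suggests. The helper name `list_tsv_columns` is hypothetical and not part of MAGMA, and no particular column names are assumed.

```bash
# Sketch: list the column names of a tab-separated file, one per line,
# numbered. list_tsv_columns is a hypothetical helper, not part of MAGMA.
list_tsv_columns() {
    local tsv_file="$1"
    head -n 1 "$tsv_file" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

For example, `list_tsv_columns /path/to/results_dir/QC_statistics/cohort/joint.merged_cohort_stats.tsv` prints each column name next to its position, which helps when selecting fields with `cut` or `awk`.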

## Analysis Directory

```bash
/path/to/results_dir/analysis
├── cluster_analysis
├── drug_resistance
├── non-tuberculous_mycobacteria
├── phylogeny
├── spotyping
└── snp_distances
```

- **Cluster Analysis**

> Contains files related to clustering based on 5-SNP and 12-SNP cutoffs, both including and excluding complex regions
> **.figtree files**: These can be imported directly into Figtree for visualisation

- **Drug Resistance**

Organised based on the different types of variants as well as combined results:

```bash
/path/to/results_dir/analysis/drug_resistance
├── combined_resistance_summaries
├── combined_resistance_summaries_mixed_infection_samples
├── major_variants_xbs
├── minor_variants_lofreq
├── structural_variants_delly
└── tbprofiler_fastq
```

Each of the directories containing results related to the different variants (major | minor | structural) has text files that can be used to annotate the .treefiles produced by MAGMA in iTOL (https://itol.embl.de).

The combined resistance results file contains a per-sample drug resistance summary based on the WHO Catalogue of *Mtb* mutations (https://www.who.int/publications/i/item/9789240082410).

MAGMA also notes the presence of all variants in tier 1 and tier 2 drug resistance genes.

MAGMA will generate mixed infection reports and can optionally run tbprofiler on the FASTQ files for comparison purposes.

- **Non-Tuberculous Mycobacteria (NTM)**

Contains a brief report of NTM presence in the submitted samples, organised into cohort and per_sample directories.

- **Phylogeny**

Contains the outputs of the IQTree phylogenetic tree construction.

> :memo: By default we recommend that you use the **ExDRIncComplex** files as MAGMA was optimized to be able to accurately call positions on the edges of complex regions in the *Mtb* genome

- **SNP distances**

Contains the SNP distance tables in tsv format.

> :memo: By default we recommend that you use the **ExDRIncComplex** files as MAGMA was optimized to be able to accurately call positions on the edges of complex regions in the *Mtb* genome


- **Spotyping**

Contains a spoligotyping pattern prediction using [SpoTyping](https://github.com/xiaeryu/SpoTyping-v2.0/tree/master/SpoTyping-v2.0-commandLine).


## `vcf_files` Directory

```bash
/path/to/results_dir/vcf_files
├── cohort
│   ├── combined_variant_files
│   ├── minor_variants
│   ├── multiple_alignment_files
│   ├── raw_variant_files
│   ├── snp_variant_files
│   └── structural_variants
└── per_sample
    ├── minor_variants
    ├── raw_variant_files
    └── structural_variants
```

- **Combined variant files**

> Contains the cohort gvcfs based on major variants detected by the MAGMA pipeline

- **Minor variants**

> Merged vcfs of all samples, generated by LoFreq

- **Multiple alignment files**

> FASTA files for the generation of phylogenetic trees by IQTree

- **Raw variant files**

> Unfiltered indels and SNPs detected by the MAGMA pipeline

- **SNP variant files**

> Filtered SNPs detected by the MAGMA pipeline

- **Structural variant files**

> Unfiltered structural variants detected by the MAGMA pipeline

## Libraries Directory

> Contains files related to FASTQ validation and FASTQC analysis

## Samples Directory

> Contains vcf files for major|minor|structural variants for each individual sample
1 change: 1 addition & 0 deletions docs/styles.css
@@ -0,0 +1 @@
/* css styles */