Add section on --skip_bracken

BCCDC-PHL · Nov 30, 2022 · cceb828
1 parent 2625ec6
Showing 1 changed file with 21 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -34,6 +34,8 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir </path/to/outdir> 
 ```
 
+### Extracting reads by taxonomic ID
+
 Reads can be binned by taxonomic group, and extracted to separate output files using the `--extract_reads` flag.
 When using this flag, a threshold is applied on the percentage of reads assigned to the taxonomic group, below which
 reads are not extracted. The default threshold is 1%. It can be modified using the `--extract_reads_threshold` flag.
@@ -49,6 +51,24 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir </path/to/outdir> 
 ```
 
+### Skipping Bracken
+
+By default, [bracken](https://github.com/jenniferlu717/Bracken) is used to re-estimate the read abundances for each taxonomic group
+at a specified taxonomic level (genus, species, etc.).
+
+If desired, bracken can be skipped with the `--skip_bracken` flag:
+
+```
+nextflow run BCCDC-PHL/taxon-abundance \
+  --fastq_input <fastq_input_dir> \
+  --skip_bracken \
+  --outdir </path/to/outdir> 
+```
+
+When the `--skip_bracken` flag is used, abundances are calculated directly from the kraken2 report. Note that abundance
+estimates taken directly from kraken2 reports may underestimate the actual abundances, because reads that kraken2 assigns above the requested taxonomic level are not counted toward it. A detailed rationale for including the bracken analysis
+can be found in the [bracken paper](https://peerj.com/articles/cs-104/).
+
 ## Outputs
 
 An output directory will be created for each sample. Within those directories,
@@ -138,4 +158,4 @@ For each pipeline invocation, each sample will produce a `provenance.yml` file w
 - timestamp_analysis_start: 2021-11-25T16:53:10.549863
 ```
 
-The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.
+The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.
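
The under-estimation described in the `--skip_bracken` section can be illustrated with a small sketch. It assumes the standard six-column, tab-separated kraken2 report (percentage, clade reads, directly assigned reads, rank code, taxonomic ID, name). The function name and the example report are hypothetical, and this is not necessarily how the pipeline itself computes abundances.

```python
def abundances_from_kraken2_report(report_text, rank_code="S"):
    """Fraction of all reads in each taxon at one rank (hypothetical helper).

    Assumes the standard six-column kraken2 report layout.
    """
    total_reads = 0
    reads_at_rank = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        name = fields[5].strip()
        # Unclassified ("U") plus root ("R") clade counts cover every read.
        if rank in ("U", "R"):
            total_reads += clade_reads
        if rank == rank_code:
            reads_at_rank[name] = clade_reads
    if total_reads == 0:
        return {}
    return {name: n / total_reads for name, n in reads_at_rank.items()}


# Made-up report: 100 reads, 5 of which are assigned only to the genus level.
EXAMPLE_REPORT = "\n".join([
    " 10.00\t10\t10\tU\t0\tunclassified",
    " 90.00\t90\t0\tR\t1\troot",
    " 60.00\t60\t5\tG\t561\t  Escherichia",
    " 55.00\t55\t55\tS\t562\t    Escherichia coli",
    " 30.00\t30\t30\tS\t1280\t  Staphylococcus aureus",
])

species = abundances_from_kraken2_report(EXAMPLE_REPORT)
for name, fraction in species.items():
    print(f"{name}\t{fraction:.2f}")
```

In this made-up report, the 5 reads assigned only to the genus *Escherichia* are not counted toward any species. Bracken is designed to redistribute reads like these down to the requested level.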