Add section on --skip_bracken
dfornika committed Nov 30, 2022
1 parent 2625ec6 commit cceb828
Showing 1 changed file with 21 additions and 1 deletion.
22 changes: 21 additions & 1 deletion README.md
@@ -34,6 +34,8 @@ nextflow run BCCDC-PHL/taxon-abundance \
--outdir </path/to/outdir>
```

### Extracting reads by taxonomic ID

Reads can be binned by taxonomic group and extracted to separate output files using the `--extract_reads` flag.
When using this flag, a threshold is applied to the percentage of reads assigned to each taxonomic group. Reads from groups
below the threshold are not extracted. The default threshold is 1%. It can be modified using the `--extract_reads_threshold` flag.
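
The thresholding described above can be sketched roughly as follows. This is an illustration only, not the pipeline's code: the function name `taxa_passing_threshold` and the input mapping of taxonomic ID to percentage of reads are hypothetical, and keeping a group that sits exactly at the threshold is an assumption.

```python
def taxa_passing_threshold(percent_by_taxid, threshold=1.0):
    """Return the taxonomic IDs whose share of reads (in percent) meets the threshold."""
    # Assumption: a group exactly at the threshold is kept; only groups below it are dropped.
    return sorted(
        taxid
        for taxid, percent in percent_by_taxid.items()
        if percent >= threshold
    )


# Hypothetical percentages of reads assigned to three taxonomic IDs.
example = {"562": 85.2, "1280": 0.4, "287": 1.0}
selected = taxa_passing_threshold(example)
stricter = taxa_passing_threshold(example, threshold=5.0)
```

With the default of 1%, the group at 0.4% is dropped. Raising the threshold to 5% leaves only the dominant group.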
@@ -49,6 +51,24 @@ nextflow run BCCDC-PHL/taxon-abundance \
--outdir </path/to/outdir>
```

### Skipping Bracken

By default, [Bracken](https://github.com/jenniferlu717/Bracken) is used to re-estimate the read abundances for each taxonomic group
at a specific taxonomic level (genus, species, etc.).

If desired, Bracken can be skipped with the `--skip_bracken` flag:

```
nextflow run BCCDC-PHL/taxon-abundance \
--fastq_input <fastq_input_dir> \
--skip_bracken \
--outdir </path/to/outdir>
```

When the `--skip_bracken` flag is used, abundances are calculated directly from the kraken2 report. Note that abundance
estimates taken directly from kraken2 reports may underestimate the actual abundances, because reads that kraken2 assigns at
higher taxonomic ranks are not redistributed to lower ranks. The rationale for including the Bracken analysis is detailed
in the [Bracken paper](https://peerj.com/articles/cs-104/).
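
As a rough illustration of what calculating abundances directly from a kraken2 report can look like, the sketch below parses the standard six-column, tab-separated report (percentage, clade reads, directly assigned reads, rank code, taxonomic ID, name). The function name `abundances_from_report`, the sample report, and the choice to normalize over reads at the selected rank are assumptions for illustration; the pipeline's own calculation may differ.

```python
def abundances_from_report(report_text, rank_code="S"):
    """Return {taxon name: fraction of reads} for one rank of a kraken2 report."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _percent, clade_reads, _direct_reads, rank, _taxid, name = fields[:6]
        if rank.strip() == rank_code:
            counts[name.strip()] = int(clade_reads)
    total = sum(counts.values())
    # Assumption: fractions are taken over reads classified at the chosen rank only.
    return {name: n / total for name, n in counts.items()} if total else {}


# Made-up report: 15 Escherichia reads and 5 Staphylococcus reads stop at genus level.
report = (
    " 40.00\t40\t40\tU\t0\tunclassified\n"
    " 60.00\t60\t0\tR\t1\troot\n"
    " 45.00\t45\t15\tG\t561\t  Escherichia\n"
    " 30.00\t30\t30\tS\t562\t    Escherichia coli\n"
    " 15.00\t15\t5\tG\t1279\t  Staphylococcus\n"
    " 10.00\t10\t10\tS\t1280\t    Staphylococcus aureus\n"
)
abundances = abundances_from_report(report)
```

In this made-up report, the 20 reads assigned only at genus level never contribute to any species count. That is the kind of gap Bracken's re-estimation is meant to address.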

## Outputs

An output directory will be created for each sample. Within those directories,
@@ -138,4 +158,4 @@ For each pipeline invocation, each sample will produce a `provenance.yml` file w
- timestamp_analysis_start: 2021-11-25T16:53:10.549863
```
The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.
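
A minimal sketch of how a `YYYYMMDDHHMMSS` timestamp keeps filenames unique across re-analyses. The function name `provenance_filename` and the filename layout (sample ID, timestamp, `provenance.yml`) are hypothetical; only the timestamp format comes from the text above.

```python
from datetime import datetime


def provenance_filename(sample_id, when):
    """Build a provenance filename containing a YYYYMMDDHHMMSS timestamp."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    # Assumption: the real pipeline may arrange these parts differently.
    return f"{sample_id}_{timestamp}_provenance.yml"


first_run = provenance_filename("sample-01", datetime(2021, 11, 25, 16, 53, 10))
second_run = provenance_filename("sample-01", datetime(2021, 11, 26, 9, 0, 0))
```

Because the timestamp has one-second resolution, two analyses of the same sample produce different filenames unless they start within the same second.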
