Ilia Popov edited this page May 30, 2025 · 2 revisions

KrakenParser Usage Guide

INSTALLATION

conda create -n krakenparser pip -y
conda activate krakenparser
pip install krakenparser

HELP

! KrakenParser -h
KrakenParser by Ilia V. Popov
usage: KrakenParser [--complete] [--kreport2mpa] [--combine_mpa]
                    [--deconstruct] [--deconstruct_viruses] [--process]
                    [--txt2csv] [--relabund] [-V]

KrakenParser: Convert Kraken2 Reports to CSV.

options:
  --complete            Run the full pipeline automated
  --kreport2mpa         Convert Kraken2 Reports to MPA Format
  --combine_mpa         Combine MPA Files
  --deconstruct         Extract Taxonomic Levels from combined MPA file
  --deconstruct_viruses
                        Extract Taxonomic Levels from combined MPA file using
                        only VIRUSES domain
  --process             Process Extracted Taxonomic Data
  --txt2csv             Convert TXT to CSV
  --relabund            Calculate relative abundance
  -V, --version         show program's version number and exit

COMPLETE PIPELINE USAGE EXAMPLE

First, download the demo data

Input

! wget https://github.com/PopovIILab/KrakenParser/raw/refs/heads/dev/demo_data.zip && \
    unzip demo_data.zip && rm -rf demo_data.zip

Then run KrakenParser

Input

! KrakenParser --complete -i demo_data/kreports/

The -i option takes the path to the input, here the directory containing the Kraken2 report files.

The resulting output files:

demo_data/
├─ kreports/           # Input Kraken2 reports
├─ mpa/                # Converted MPA files
├─ COMBINED.txt        # Merged MPA file
└─ counts/
   ├─ txt/             # Extracted taxonomic levels in TXT
   │  ├─ counts_species.txt
   │  ├─ counts_genus.txt
   │  ├─ counts_family.txt
   │  ├─ counts_order.txt
   │  ├─ counts_class.txt
   │  └─ counts_phylum.txt
   ├─ csv/             # Total abundance CSV output
   │  ├─ counts_species.csv
   │  ├─ counts_genus.csv
   │  ├─ counts_family.csv
   │  ├─ counts_order.csv
   │  ├─ counts_class.csv
   │  └─ counts_phylum.csv
   └─ csv_relabund/    # Relative abundance CSV output
      ├─ counts_species.csv
      ├─ counts_genus.csv
      ├─ counts_family.csv
      ├─ counts_order.csv
      ├─ counts_class.csv
      └─ counts_phylum.csv

Then group low-abundance taxa (below 4.0%, set here with -O 4) at the species level into a single "Other" row:

! KrakenParser --relabund -i demo_data/counts/csv/counts_species.csv -o demo_data/counts/csv_relabund/counts_species_2.csv -O 4
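The idea behind this grouping can be pictured with a short Python sketch. It is not KrakenParser's implementation: the function name group_low_abundance, the dictionary input, and the read counts are all made up for illustration, and placing the "Other" row first followed by taxa in descending order is an assumption that simply mirrors the command's output.

```python
def group_low_abundance(counts, threshold=4.0):
    """Convert one sample's raw counts to percentages and pool rare taxa.

    counts: dict mapping taxon name -> read count (hypothetical input shape).
    threshold: taxa below this percentage are summed into one "Other" row.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    percents = {taxon: 100.0 * n / total for taxon, n in counts.items()}

    # Keep taxa at or above the cutoff; pool everything below it.
    kept = {t: p for t, p in percents.items() if p >= threshold}
    other = sum(p for p in percents.values() if p < threshold)

    rows = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    if other > 0:
        rows.insert(0, (f"Other (<{threshold:.1f}%)", other))
    return rows


# Made-up counts for a single sample, for illustration only.
sample = {
    "Escherichia coli": 500,
    "Salmonella enterica": 300,
    "Klebsiella pneumoniae": 170,
    "Rare taxon A": 20,
    "Rare taxon B": 10,
}
for taxon, pct in group_low_abundance(sample, threshold=4):
    print(f"{taxon},{pct}")
```

Pooling rare taxa keeps each sample's percentages summing to 100 while limiting the number of categories a plot has to show.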

The resulting demo_data/counts/csv_relabund/counts_species_2.csv file:

Sample_id,taxon,rel_abund_perc
X1,Other (<4.0%),42.686249843331495
X1,Escherichia coli,12.75490546506176
X1,Haemophilus ducreyi,10.075525825933164
X1,Salmonella enterica,9.632211973651838
X1,Staphylococcus aureus,8.720517307808358
X1,Klebsiella pneumoniae,7.072132502100518
X1,Bacteroides fragilis,4.549653472470442
X1,Morganella morganii,4.508803609642424
X2,Other (<4.0%),46.04232622442552
X2,Morganella morganii,16.5117268346954
X2,Escherichia coli,15.349350872456869
X2,Salmonella enterica,10.345628717042564
X2,Klebsiella pneumoniae,7.063587503374482
X2,Haemophilus ducreyi,4.68737984800517
...
X8,Other (<4.0%),32.34802433843469
X8,Bartonella krasnovii,25.39596964453076
X8,Pediococcus pentosaceus,9.254716232772273
X8,Latilactobacillus sakei,8.688709539996534
X8,Staphylococcus aureus,7.8557850790470285
X8,Bacteroides fragilis,5.997643719867084
X8,Escherichia coli,5.454406567029118
X8,Haemophilus ducreyi,5.004744878322517
X9,Other (<4.0%),46.9787738378817
X9,Escherichia coli,18.22909641989718
X9,Salmonella enterica,10.944105961952845
X9,Klebsiella pneumoniae,9.848017408277853
X9,Staphylococcus aureus,7.364958937831675
X9,Haemophilus ducreyi,6.635047434158754

The counts_species_2.csv file is used as the input throughout the visualization API documentation.
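Because the file is in long format (one row per sample and taxon), it is easy to sanity-check before plotting. The sketch below uses only the Python standard library and the X1 rows shown above; the helper name sums_per_sample is hypothetical. In a real run, the inline text would be replaced by opening demo_data/counts/csv_relabund/counts_species_2.csv.

```python
import csv
import io
from collections import defaultdict

# The X1 rows copied from the output above; a real script would open the CSV file instead.
EXAMPLE = """Sample_id,taxon,rel_abund_perc
X1,Other (<4.0%),42.686249843331495
X1,Escherichia coli,12.75490546506176
X1,Haemophilus ducreyi,10.075525825933164
X1,Salmonella enterica,9.632211973651838
X1,Staphylococcus aureus,8.720517307808358
X1,Klebsiella pneumoniae,7.072132502100518
X1,Bacteroides fragilis,4.549653472470442
X1,Morganella morganii,4.508803609642424
"""


def sums_per_sample(handle):
    """Sum rel_abund_perc per Sample_id from a long-format relative abundance CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row["Sample_id"]] += float(row["rel_abund_perc"])
    return dict(totals)


totals = sums_per_sample(io.StringIO(EXAMPLE))
for sample_id, total in totals.items():
    # Each sample's percentages should add up to roughly 100.
    print(sample_id, round(total, 6))
```

A total far from 100 for any sample would suggest that rows were lost or duplicated somewhere upstream.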
