minaminii edited this page Jul 1, 2026 · 5 revisions

Welcome to the ViroWatch wiki!

ViroWatch is a Nextflow pipeline for HIV-1 genome surveillance from Oxford Nanopore Technologies (ONT) long reads. It takes per-sample FASTQ files through quality control, optional Kraken2 read-set taxonomy QC, de novo assembly, consensus polishing, drug resistance analysis, and optional BLAST-based subtyping. For each sample it produces an HTML surveillance report and Neo4j-compatible knowledge graph CSVs.

Pipeline at a glance

flowchart TD
    FASTQ([FASTQ]) --> DEDUP["seqkit rmdup\ndeduplication"]
    DEDUP --> CHOP["chopper\nquality & length filter"]

    CHOP --> NS["NanoStat\nread QC stats"]
    NS --> K2["Kraken2\nread-set taxonomy QC"]:::opt
    CHOP --> MM[minimap2]
    MM --> QM["qualimap\nmapping QC"]
    CHOP --> FLYE["Flye --meta\nde novo assembly"]

    FLYE --> RACON["Racon ×3\npolishing"]
    RACON --> MEDAKA["Medaka\npolished consensus"]

    MEDAKA --> QUAST["QUAST\nassembly QC vs reference"]
    MEDAKA --> SP["SierraPy\nStanford HIVDB"]
    MEDAKA --> BLA["BLAST vs LosAlamos\nHIV-1 subtyping"]:::opt
    MEDAKA --> BNT["BLAST vs core_nt\nNCBI taxonomy"]:::opt

    NS & QM & QUAST --> MQC[MultiQC]
    MQC & SP & BLA & BNT --> RPT[/HTML report/]
    MEDAKA & SP & BLA & BNT & K2 --> KG["kg_export.py + meta_kg_export.py\nNeo4j CSVs"]

    classDef opt fill:#f5f5f5,stroke:#aaa,stroke-dasharray:5 5

Key outputs per sample

Results land in <outdir>/<sample_id>/:

| Path | Contents |
| --- | --- |
| `nanostat/` | Read QC statistics |
| `kraken2/` | Kraken2 read-set taxonomy QC report + output (if `--kraken2_db` enabled) |
| `aln.bam` | Reference-mapped reads |
| `qualimap/` | Mapping QC |
| `flye/` | De novo assembly + `assembly_info.txt` |
| `medaka_consensus/consensus.fasta` | Final polished consensus |
| `quast/` | Assembly QC vs reference |
| `sierrapy.json` | Stanford HIVDB drug resistance result |
| `blast/los_alamos.blast.json` | LosAlamos subtype BLAST result (if enabled) |
| `blast/core_nt.blast.json` | NCBI core_nt BLAST result (if enabled) |
| `multiqc/` | Aggregated MultiQC report |
| `<sample_id>_report.html` | Per-sample surveillance report |
| `kg/` | Neo4j-compatible CSVs for knowledge graph import |
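
As a quick sanity check after a run, the layout listed above can be verified with a small script. This is a minimal sketch, not part of ViroWatch: the function name `check_sample_outputs` and the split into required and optional paths are assumptions made for illustration (the optional group holds the entries marked "if enabled" above).

```python
from pathlib import Path

# Paths relative to <outdir>/<sample_id>/, copied from the outputs list above.
# "{sample_id}" is filled in per sample.
REQUIRED = [
    "nanostat",
    "aln.bam",
    "qualimap",
    "flye",
    "medaka_consensus/consensus.fasta",
    "quast",
    "sierrapy.json",
    "multiqc",
    "{sample_id}_report.html",
    "kg",
]

# Outputs that only exist when the corresponding optional step is enabled.
OPTIONAL = [
    "kraken2",
    "blast/los_alamos.blast.json",
    "blast/core_nt.blast.json",
]


def check_sample_outputs(outdir, sample_id):
    """Return the expected outputs that are missing for one sample (hypothetical helper)."""
    sample_dir = Path(outdir) / sample_id

    def missing(paths):
        resolved = [p.format(sample_id=sample_id) for p in paths]
        return [p for p in resolved if not (sample_dir / p).exists()]

    return {
        "missing_required": missing(REQUIRED),
        "missing_optional": missing(OPTIONAL),
    }
```

Calling `check_sample_outputs("results", "sample01")` returns a dict with two lists; an empty `missing_required` list means every always-produced output from the list above is present. The directory and sample names here are placeholders.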

Bundled references

| File | Description |
| --- | --- |
| `assets/refs/CRF01_AE.fa` / `.gff` | CRF01_AE reference (default; Southeast Asia) |
| `assets/refs/HXB2.fa` / `.gff` | HXB2 reference (generic HIV-1 clade B) |
| `assets/test_data/sample_test.fq.gz` | ONT reads for smoke-testing |
| `assets/blast/LosAlamos_db.tar.gz` | Pre-built LosAlamos BLAST DB (66 MB) |
| `assets/blast/LosAlamos_db.gz` | Raw FASTA for rebuilding the DB (20 MB) |
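
The two LosAlamos files above can be handled with the Python standard library plus BLAST+. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the destination paths are placeholders, and the database name and location the pipeline expects are not stated on this page. `makeblastdb` with `-in`, `-dbtype nucl`, and `-out` is the standard BLAST+ command for building a nucleotide database; the command is only constructed here, not executed.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_prebuilt_db(archive, dest):
    """Extract a .tar.gz archive into dest and return the archived member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


def gunzip_fasta(src, dest):
    """Decompress a gzipped FASTA to dest so makeblastdb can read it."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dest)


def makeblastdb_cmd(fasta, out_prefix):
    """Build (but do not run) a BLAST+ command for a nucleotide database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl", "-out", str(out_prefix)]
```

To rebuild the database, the list returned by `makeblastdb_cmd` could be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed. Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.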
