Home
minaminii edited this page Jul 1, 2026 · 5 revisions
ViroWatch is a Nextflow pipeline for HIV-1 genome surveillance from Oxford Nanopore Technologies (ONT) long reads. It takes per-sample FASTQ files through quality control, optional Kraken2 read-set taxonomy QC, de novo assembly, consensus polishing, drug resistance analysis, and optional BLAST-based subtyping. For each sample it produces an HTML surveillance report and Neo4j-compatible knowledge graph CSVs.
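As an illustration of what the optional read-set taxonomy QC can tell you, the sketch below reads a Kraken2 report and computes the fraction of all reads that fall in the HIV-1 clade. It assumes the standard six-column, tab-separated report written by `kraken2 --report` (without `--report-minimizer-data`). The function names are hypothetical, and because the exact report file name under `kraken2/` is not documented here, the parser takes an iterable of lines rather than a path.

```python
"""Summarise a Kraken2 report: fraction of reads in the HIV-1 clade.

Assumes the standard six-column Kraken2 report (kraken2 --report) without
--report-minimizer-data. Function names are hypothetical helpers.
"""
from dataclasses import dataclass
from typing import Iterable, List

HIV1_TAXID = 11676  # NCBI taxid for Human immunodeficiency virus 1


@dataclass
class ReportRow:
    percent: float     # % of reads in the clade rooted at this taxon
    clade_reads: int   # reads in the clade rooted at this taxon
    direct_reads: int  # reads assigned directly to this taxon
    rank: str          # rank code, e.g. U (unclassified), R (root), S (species)
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name with its depth indentation stripped


def parse_kraken2_report(lines: Iterable[str]) -> List[ReportRow]:
    """Parse report lines into rows, skipping blank lines."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
        pct, clade, direct, rank, taxid, name = fields
        rows.append(ReportRow(float(pct), int(clade), int(direct),
                              rank.strip(), int(taxid), name.strip()))
    return rows


def clade_fraction(rows: List[ReportRow], taxid: int) -> float:
    """Fraction of all reads (unclassified + classified) in the clade of `taxid`."""
    # Unclassified (taxid 0) plus root (taxid 1) together cover every read.
    total = sum(r.clade_reads for r in rows if r.taxid in (0, 1))
    hit = next((r for r in rows if r.taxid == taxid), None)
    if hit is None or total == 0:
        return 0.0
    return hit.clade_reads / total
```

Usage with a few made-up report lines (intermediate ranks omitted for brevity): 100 unclassified reads plus 900 under root, 800 of them in the HIV-1 clade, gives `clade_fraction(rows, HIV1_TAXID) == 0.8`. Computing the fraction from the read counts avoids the rounding in the report's two-decimal percentage column.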
```mermaid
flowchart TD
    FASTQ([FASTQ]) --> DEDUP["seqkit rmdup\ndeduplication"]
    DEDUP --> CHOP["chopper\nquality & length filter"]
    CHOP --> NS["NanoStat\nread QC stats"]
    NS --> K2["Kraken2\nread-set taxonomy QC"]:::opt
    CHOP --> MM[minimap2]
    MM --> QM["qualimap\nmapping QC"]
    CHOP --> FLYE["Flye --meta\nde novo assembly"]
    FLYE --> RACON["Racon ×3\npolishing"]
    RACON --> MEDAKA["Medaka\npolished consensus"]
    MEDAKA --> QUAST["QUAST\nassembly QC vs reference"]
    MEDAKA --> SP["SierraPy\nStanford HIVDB"]
    MEDAKA --> BLA["BLAST vs LosAlamos\nHIV-1 subtyping"]:::opt
    MEDAKA --> BNT["BLAST vs core_nt\nNCBI taxonomy"]:::opt
    NS & QM & QUAST --> MQC[MultiQC]
    MQC & SP & BLA & BNT --> RPT[/HTML report/]
    MEDAKA & SP & BLA & BNT & K2 --> KG["kg_export.py + meta_kg_export.py\nNeo4j CSVs"]
    classDef opt fill:#f5f5f5,stroke:#aaa,stroke-dasharray:5 5
```
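The dependencies in the flowchart can be written down as a small directed acyclic graph, which makes it easy to check what each output depends on when optional steps are switched off. The sketch below transcribes the flowchart's node IDs and edges into Python's standard `graphlib` and returns one valid step order. It is illustrative only: the `enabled` set is a hypothetical stand-in for whatever parameters turn the optional steps on (only `--kraken2_db` is documented on this page), and Nextflow does not schedule processes this way.

```python
"""Sketch of the ViroWatch step graph, transcribed from the flowchart.

Illustrative only: node IDs mirror the flowchart, and the `enabled` set is a
hypothetical stand-in for the parameters that switch optional steps on.
"""
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# step -> set of upstream steps it consumes (its predecessors)
DEPENDS_ON: Dict[str, Set[str]] = {
    "DEDUP": {"FASTQ"},
    "CHOP": {"DEDUP"},
    "NS": {"CHOP"},
    "K2": {"NS"},
    "MM": {"CHOP"},
    "QM": {"MM"},
    "FLYE": {"CHOP"},
    "RACON": {"FLYE"},
    "MEDAKA": {"RACON"},
    "QUAST": {"MEDAKA"},
    "SP": {"MEDAKA"},
    "BLA": {"MEDAKA"},
    "BNT": {"MEDAKA"},
    "MQC": {"NS", "QM", "QUAST"},
    "RPT": {"MQC", "SP", "BLA", "BNT"},
    "KG": {"MEDAKA", "SP", "BLA", "BNT", "K2"},
}
OPTIONAL = {"K2", "BLA", "BNT"}  # dashed nodes in the flowchart


def execution_order(enabled: Set[str]) -> List[str]:
    """Return one topological order, dropping optional steps not in `enabled`."""
    skipped = OPTIONAL - enabled
    graph = {
        step: {p for p in preds if p not in skipped}
        for step, preds in DEPENDS_ON.items()
        if step not in skipped
    }
    # static_order() also yields nodes that only appear as predecessors (FASTQ).
    return list(TopologicalSorter(graph).static_order())
```

For example, `execution_order(enabled={"K2"})` omits both BLAST steps while still placing every remaining step after its inputs. Modelling the optional steps as removable nodes shows that the report and knowledge-graph export keep all of their mandatory inputs when Kraken2 or BLAST is off.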
Results are written to `<outdir>/<sample_id>/`:

| Path | Contents |
|---|---|
| `nanostat/` | Read QC statistics |
| `kraken2/` | Kraken2 read-set taxonomy QC report and output (only when `--kraken2_db` is set) |
| `aln.bam` | Reads mapped to the reference |
| `qualimap/` | Mapping QC |
| `flye/` | De novo assembly and `assembly_info.txt` |
| `medaka_consensus/consensus.fasta` | Final polished consensus |
| `quast/` | Assembly QC against the reference |
| `sierrapy.json` | Stanford HIVDB drug resistance result |
| `blast/los_alamos.blast.json` | LosAlamos subtyping BLAST result (if enabled) |
| `blast/core_nt.blast.json` | NCBI core_nt BLAST result (if enabled) |
| `multiqc/` | Aggregated MultiQC report |
| `<sample_id>_report.html` | Per-sample surveillance report |
| `kg/` | Neo4j-compatible CSVs for knowledge graph import |
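The sketch below shows one way to pull a provisional subtype call out of `blast/los_alamos.blast.json`. It rests on two assumptions this page does not confirm: that the file is BLAST+ single-file JSON (`blastn -outfmt 15`), and that subject titles follow the Los Alamos naming convention `subtype.country.year.name.accession` (for example `B.FR.83.HXB2_LAI_IIIB_BRU.K03455`). Depending on how the database was built, the sequence name may appear in the `id` or `accession` field instead of `title`. The function names are hypothetical.

```python
"""Provisional HIV-1 subtype call from blast/los_alamos.blast.json.

Assumptions (not confirmed by the pipeline docs):
  * the file is BLAST+ single-file JSON (blastn -outfmt 15);
  * subject titles use the Los Alamos convention
    subtype.country.year.name.accession, e.g. B.FR.83.HXB2_LAI_IIIB_BRU.K03455.
"""
import json
from typing import List, Optional, Tuple


def top_hits(blast_json: dict, n: int = 5) -> List[Tuple[str, float, float]]:
    """Return (title, bit_score, percent_identity) of each hit's best HSP, best first."""
    out = []
    for entry in blast_json["BlastOutput2"]:  # one entry per query (contig)
        search = entry["report"]["results"]["search"]
        for hit in search.get("hits", []):
            title = hit["description"][0]["title"]
            best = max(hit["hsps"], key=lambda h: h["bit_score"])
            pident = 100.0 * best["identity"] / best["align_len"]
            out.append((title, best["bit_score"], pident))
    out.sort(key=lambda t: t[1], reverse=True)
    return out[:n]


def subtype_from_title(title: str) -> str:
    """The subtype or CRF label is the text before the first dot."""
    return title.split(".", 1)[0]


def call_subtype(path: str) -> Optional[str]:
    """Subtype label of the single best-scoring hit, or None if there are no hits."""
    with open(path) as fh:
        hits = top_hits(json.load(fh), n=1)
    return subtype_from_title(hits[0][0]) if hits else None
```

Taking only the top hit is the simplest possible rule. Inspecting the next few entries from `top_hits` is a cheap way to spot samples where two subtypes score closely, which can point to a recombinant.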
Reference, test, and database files bundled under `assets/`:

| File | Description |
|---|---|
| `assets/refs/CRF01_AE.fa` / `.gff` | CRF01_AE reference (default; CRF01_AE is common in Southeast Asia) |
| `assets/refs/HXB2.fa` / `.gff` | HXB2 reference (the standard HIV-1 subtype B reference strain) |
| `assets/test_data/sample_test.fq.gz` | ONT reads for smoke-testing |
| `assets/blast/LosAlamos_db.tar.gz` | Pre-built LosAlamos BLAST DB (66 MB) |
| `assets/blast/LosAlamos_db.gz` | Compressed raw FASTA for rebuilding the DB (20 MB) |
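If the pre-built archive does not suit your BLAST+ version, the database can in principle be rebuilt from the raw FASTA. The sketch below assumes `assets/blast/LosAlamos_db.gz` is a gzip-compressed nucleotide FASTA and that BLAST+ `makeblastdb` is on `PATH`. The output prefix `LosAlamos` and the helper names are hypothetical; match whatever database name the pipeline expects.

```python
"""Rebuild the LosAlamos BLAST DB from assets/blast/LosAlamos_db.gz.

Assumes the .gz file is a gzip-compressed nucleotide FASTA and that BLAST+
makeblastdb is on PATH. The "LosAlamos" prefix is a hypothetical choice.
"""
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import List


def gunzip(src: Path, dest: Path) -> Path:
    """Stream-decompress src into dest without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def makeblastdb_cmd(fasta: Path, out_prefix: Path) -> List[str]:
    """Argument list for building a nucleotide BLAST database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl",
            "-out", str(out_prefix)]


def rebuild(gz_fasta: Path, workdir: Path, prefix: str = "LosAlamos") -> None:
    """Decompress the FASTA into workdir, then run makeblastdb on it."""
    workdir.mkdir(parents=True, exist_ok=True)
    fasta = gunzip(gz_fasta, workdir / f"{prefix}.fa")
    # check=True raises CalledProcessError if makeblastdb fails
    subprocess.run(makeblastdb_cmd(fasta, workdir / prefix), check=True)
```

Usage: `rebuild(Path("assets/blast/LosAlamos_db.gz"), Path("blastdb"))` writes `blastdb/LosAlamos.fa` and the database files with the `blastdb/LosAlamos` prefix. Building the argument list in a separate function keeps the command easy to inspect before anything runs.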