SVAN is a computational method that annotates sequence-resolved insertions and deletions and classifies them, based on their sequence features, into distinct classes, including Mobile Element Insertions (MEI), processed pseudogene integrations, various forms of duplications, tandem repeat expansions/contractions and nuclear-mitochondrial segments (NUMT). Its primary input is a VCF file containing the insertion or deletion SV calls, and it produces a second VCF with the annotations for each SV. It is compatible with the output of any long-read SV caller, as long as the sequence of each insertion or deletion event is included in the ALT or REF field of the input VCF, respectively.
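As a rough illustration of the input requirement above, the following sketch counts VCF records whose REF or ALT column holds a symbolic allele (for example `<INS>`) instead of an explicit sequence. It is not part of SVAN: the function names and the example records are hypothetical, and it assumes plain tab-separated VCF lines.

```python
import re

# Alleles made only of nucleotide codes; symbolic alleles such as <INS> or
# breakend notation will not match.
_SEQ_ALLELE = re.compile(r"^[ACGTNacgtn]+$")


def is_sequence_resolved(vcf_line):
    """Return True if REF and every ALT allele of a VCF data line are explicit sequences."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    ref, alt = fields[3], fields[4]
    alleles = [ref] + alt.split(",")
    return all(_SEQ_ALLELE.match(a) for a in alleles)


def count_unresolved(vcf_lines):
    """Count data lines (header lines are skipped) that are not sequence-resolved."""
    return sum(
        1
        for line in vcf_lines
        if line.strip() and not line.startswith("#") and not is_sequence_resolved(line)
    )


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\tins1\tA\tACGTTGCA\t.\tPASS\tSVTYPE=INS",
        "chr1\t2000\tdel1\tGTTACA\tG\t.\tPASS\tSVTYPE=DEL",
        "chr1\t3000\tins2\tN\t<INS>\t.\tPASS\tSVTYPE=INS",
    ]
    print(count_unresolved(example))  # prints 1
```

A non-zero count suggests that the caller emitted symbolic alleles for some records, which would leave no sequence to annotate.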
SVAN has been used for the SV characterization of 1,019 samples from the 1000 Genomes Project sequenced with long reads:
Schloissnig et al., “Structural variation in 1,019 diverse humans based on long-read sequencing”, Nature, July 23, 2025, https://doi.org/10.1038/s41586-025-09290-7
SVAN can be obtained in two different ways:
- Go to the releases tab and download the latest release.
- Clone the git repository in case you want the latest version of the code:
# Move into the folder in which you want to clone the repository.
$ cd ~/apps
# Clone it.
$ git clone https://github.com/REPBIO-LAB/SVAN.git
SVAN does not require any further installation step. It is written in Python and can be run as a standalone application on diverse Linux systems.
- Hardware:
  - 64-bit CPU
- Software:
  - 64-bit Linux system
  - Python v3.5.4 or higher
  - paftools v.r755 (https://github.com/lh3/minimap2/tree/master/misc)
  - bwa-mem v0.7.17-r1188 (https://github.com/lh3/bwa)
  - minimap2 v2.10-r764-dirty (https://github.com/lh3/minimap2)
- Python libraries:
  - pysam
  - cigar
  - itertools (part of the Python standard library)
  - Biopython
  - subprocess (part of the Python standard library)
  - pandas
  - scipy
  - numpy
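The requirements above can be checked before a run with a small script such as the sketch below. It is not shipped with SVAN. The import names (`Bio` for Biopython) and the executable names (`minimap2`, `bwa`, `paftools.js`, `trf`) are assumptions about how these tools are typically installed and exposed on `PATH`; adjust them to your environment.

```python
import importlib.util
import shutil

# Import names of the third-party Python libraries (Biopython is imported as "Bio").
PYTHON_MODULES = ["pysam", "cigar", "Bio", "pandas", "scipy", "numpy"]

# Assumed executable names; your installation may expose them differently.
EXECUTABLES = ["minimap2", "bwa", "paftools.js", "trf"]


def missing_modules(names):
    """Return the module names that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_executables(names):
    """Return the executable names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    for label, missing in (
        ("Python modules", missing_modules(PYTHON_MODULES)),
        ("executables", missing_executables(EXECUTABLES)),
    ):
        if missing:
            print("Missing {}: {}".format(label, ", ".join(missing)))
        else:
            print("All {} found".format(label))
```

The script only reports presence; it does not compare installed versions with the ones listed above.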
SVAN takes as input 8 mandatory arguments:
- VCF: Input VCF file containing sequence-resolved insertion or deletion SV calls
- TRF: Output of the Tandem Repeats Finder (TRF) execution on the inserted or deleted sequence for each SV in the input VCF
- VNTR: BED file containing VNTR annotation on the reference
- EXONS: BED file containing EXON annotation on the reference
- REPEATS: BED file containing REPEATS annotated with RepeatMasker on the reference
- CONSENSUS: FASTA file with CONSENSUS sequences for mobile elements in human
- REFERENCE: FASTA file for the reference human sequence
- SAMPLEID: Sample identifier for naming the output VCF file
- BED files for VNTR, EXONS and REPEATS annotations can be downloaded from Zenodo for hg38 (https://zenodo.org/records/15229020/files/hg38.tar.gz) and chm13 (https://zenodo.org/records/15229020/files/chm13.tar.gz).
- FASTA containing consensus sequences for retroelements can be downloaded from Zenodo.
- TRF output can be generated as described below.
- Produce FASTA with inserted sequences:
python scripts/ins2fasta.py ins.vcf outDir
- Execute TRF:
trf insertions_seq.fa 2 7 7 80 10 10 500 -h -d -ngs 1> ins_trf.out
- Produce FASTA with deleted sequences:
python scripts/del2fasta.py del.vcf outDir
- Execute TRF:
trf deletions_seq.fa 2 7 7 80 10 10 500 -h -d -ngs 1> del_trf.out
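For scripted pipelines, the TRF step can be wrapped in Python as in the sketch below. It reproduces the parameters and flags of the commands above and redirects standard output to a file, as `1>` does in the shell. The helper names are hypothetical, and the `trf` executable is assumed to be on `PATH`.

```python
import subprocess

# Parameters and flags copied from the TRF commands shown above.
TRF_PARAMS = ["2", "7", "7", "80", "10", "10", "500"]
TRF_FLAGS = ["-h", "-d", "-ngs"]


def trf_command(fasta, trf_bin="trf"):
    """Build the TRF argument list for one FASTA file."""
    return [trf_bin, fasta] + TRF_PARAMS + TRF_FLAGS


def run_trf(fasta, out_path, trf_bin="trf"):
    """Run TRF and write its standard output to out_path.

    The exit status is returned to the caller instead of being enforced,
    because what counts as success may depend on the TRF version.
    """
    with open(out_path, "w") as out:
        return subprocess.call(trf_command(fasta, trf_bin), stdout=out)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without TRF.
    print(" ".join(trf_command("insertions_seq.fa")))
```

Calling `run_trf("insertions_seq.fa", "ins_trf.out")` would then correspond to the first TRF command above.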
python SVAN-INS.py ins.vcf ins_trf.out VNTR_chm13.bed EXONS_chm13.bed REPEATS_chm13.bed CONSENSUS.fa chm13.fa SAMPLEID
python SVAN-DEL.py del.vcf del_trf.out VNTR_chm13.bed EXONS_chm13.bed REPEATS_chm13.bed CONSENSUS.fa chm13.fa SAMPLEID
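Both commands above pass the same eight positional arguments in the same order. When SVAN is launched from another Python script, a thin wrapper like the following sketch can assemble the argument list and flag input files that do not exist before the run starts. The helper names are hypothetical and not part of SVAN.

```python
import os


def build_svan_command(script, vcf, trf, vntr, exons, repeats, consensus,
                       reference, sample_id, python="python"):
    """Assemble the argument list in the positional order used by the commands above."""
    return [python, script, vcf, trf, vntr, exons, repeats, consensus,
            reference, sample_id]


def missing_inputs(command):
    """Return the input paths in a command that are not existing files.

    The interpreter (first item) and the sample identifier (last item) are skipped.
    """
    return [path for path in command[1:-1] if not os.path.isfile(path)]


if __name__ == "__main__":
    cmd = build_svan_command(
        "SVAN-INS.py", "ins.vcf", "ins_trf.out", "VNTR_chm13.bed",
        "EXONS_chm13.bed", "REPEATS_chm13.bed", "CONSENSUS.fa", "chm13.fa",
        "SAMPLEID",
    )
    absent = missing_inputs(cmd)
    if absent:
        print("Missing input files: " + ", ".join(absent))
    else:
        print(" ".join(cmd))
        # To launch the run: import subprocess; subprocess.check_call(cmd)
```

Checking the paths up front avoids starting an annotation run that would fail partway on a missing annotation file.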
SVAN produces as output a standard VCF file with the SV annotations incorporated into the INFO field of each variant.
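Since the annotations live in the INFO field, they can be read back with a few lines of Python. The sketch below parses INFO into a dictionary using only the standard library. The record and its INFO keys (`KEY_A`, `KEY_B`, `FLAG_C`) are placeholders, not the keys SVAN actually writes, which are defined in the header of the output VCF.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'A=1;B=x,y;FLAG' into a dict.

    Keys without a value (flags) are mapped to True.
    """
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def iter_records(vcf_lines):
    """Yield (chrom, pos, id, info_dict) for each data line, skipping headers."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[2], parse_info(fields[7])


if __name__ == "__main__":
    # Placeholder record; real SVAN output uses its own INFO keys.
    lines = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\tins1\tA\tACGTTGCA\t.\tPASS\tKEY_A=value;KEY_B=1,2;FLAG_C",
    ]
    for chrom, pos, sv_id, info in iter_records(lines):
        print(chrom, pos, sv_id, sorted(info))
```

For larger files or compressed input, pysam (already listed among the requirements) offers a more robust VCF reader.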
SVAN was initially developed by Bernardo Rodríguez Martín in the Korbel Group (https://www.embl.org/groups/korbel/) at the European Molecular Biology Laboratory (EMBL) (2022–2023). Since 2024, it has been maintained and further developed by Emiliano Sotelo Fonseca and Bernardo Rodríguez Martín in the Repetitive DNA Biology Lab (https://www.crg.eu/en/content/research/independent-fellow/bernardo-rodriguez-martin) at the Centre for Genomic Regulation (CRG).
SVAN is distributed under the GPL-3.0 License.
If you encounter a problem, please open an issue on the GitHub page.