Paula Ruiz-Rodriguez1
and Mireia Coscolla1
1. Institute for Integrative Systems Biology, I2SysBio, University of Valencia-CSIC, Valencia, Spain
get_MNV is a tool that identifies Multi-Nucleotide Variants (MNVs) within the same codon in genomic sequences. An MNV occurs when multiple Single Nucleotide Variants (SNVs) fall within the same codon, so the codon can translate to an amino acid different from the one predicted for each SNV on its own. Annotation programs such as ANNOVAR or SnpEff are primarily designed to annotate individual SNVs and may therefore miss the actual amino acid change produced by an MNV.
get_MNV addresses this limitation, making the interpretation of genetic variants more complete.
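As a minimal illustration of why per-SNV annotation can mislead, the sketch below translates a codon with each SNV applied alone and with both applied together. The reference codon `CGT`, the codon number 33, and the helper names `mutate` and `aa_change` are assumptions chosen for illustration; they are not taken from get_MNV's source. The result mirrors the `Arg33Ser` versus `Arg33Cys; Arg33Pro` pattern shown in the example output later in this document.

```python
from itertools import product

# Standard genetic code in compact form: bases vary in T, C, A, G order,
# with the third codon position changing fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes ("Ter" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def mutate(codon, changes):
    """Return the codon with {offset: base} substitutions applied (offsets 0-2)."""
    bases = list(codon)
    for offset, base in changes.items():
        bases[offset] = base
    return "".join(bases)


def aa_change(ref_codon, alt_codon, aa_pos):
    """Format an amino acid change such as 'Arg33Ser'."""
    ref = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt = THREE_LETTER[CODON_TABLE[alt_codon]]
    return f"{ref}{aa_pos}{alt}"


# Hypothetical reference codon and SNVs, chosen for illustration only.
ref_codon = "CGT"        # Arg
snvs = {0: "T", 1: "C"}  # two SNVs hitting the same codon

# Each SNV annotated on its own, as a per-SNV annotator would do.
per_snv = [aa_change(ref_codon, mutate(ref_codon, {o: b}), 33) for o, b in snvs.items()]
# Both SNVs applied together, as they occur on the same read.
combined = aa_change(ref_codon, mutate(ref_codon, snvs), 33)

print(per_snv)   # ['Arg33Cys', 'Arg33Pro']
print(combined)  # Arg33Ser
```

Neither per-SNV prediction matches the amino acid actually encoded once both substitutions are present, which is the case get_MNV is meant to catch.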
IMPORTANT: this script works only with SNVs called against a reference; insertions and deletions that modify the reading frame are not currently supported.
- MNV Identification: Detects SNVs occurring within the same codon and reclassifies them as MNVs.
- Accurate Amino Acid Change Calculation: Computes the resulting amino acid changes based on genomic reads.
- Integration with BAM and VCF Files: Supports input from VCF files for variants and optional BAM files for aligned reads.
- Quality Analysis: Allows setting a minimum Phred quality threshold to filter out low-quality reads.
You can install get_MNV via conda or mamba (Unix/macOS), or by downloading the binary file (Unix):
conda install -c bioconda get_mnv
mamba install -c bioconda get_mnv
wget https://github.com/PathoGenOmics-Lab/get_MNV/releases/download/1.0.0/get_mnv
get_mnv [OPTIONS] --vcf <VCF_FILE> --fasta <FASTA_FILE> --genes <GENES_FILE>
- -v, --vcf <VCF_FILE>: VCF file containing the SNVs. (Required)
- -b, --bam <BAM_FILE>: BAM file with aligned reads. (Optional)
- -f, --fasta <FASTA_FILE>: FASTA file with the reference sequence. (Required)
- -g, --genes <GENES_FILE>: File containing gene information. (Required)
- -q, --quality: Minimum Phred quality score. (Optional, default: 20)
get_mnv \
--vcf variants.vcf \
--bam reads.bam \
--fasta reference.fasta \
--genes genes.txt \
--quality 30
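The example above sets `--quality 30`. For orientation when choosing a threshold, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short sketch below only illustrates that standard conversion; the function name is hypothetical, and nothing here describes how get_MNV applies the threshold internally.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


for q in (20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

Under this scale, the default of 20 tolerates roughly one error in 100, while 30 tolerates roughly one in 1,000.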
- VCF File: Should contain the identified SNVs.
- BAM File: (Optional) Genomic reads aligned to the reference sequence.
- FASTA File: Reference genomic sequence.
- Gene File: A tab-delimited text file with the following structure per line (GeneName,GeneStart,GeneEnd,Strand):
Rv0007_Rv0007 9914 10828 +
ileT_Rvnt01 10887 10960 +
alaT_Rvnt02 11112 11184 +
Rv0008c_Rv0008c 11874 12311 -
ppiA_Rv0009 12468 13016 +
Rv0010c_Rv0010c 13133 13558 -
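To make the gene file layout concrete, the following sketch parses tab-delimited lines and works out which codon of a gene a genomic position falls in, then groups positions that share a codon (the situation get_MNV reports as an MNV). It assumes 1-based, inclusive coordinates and that minus-strand genes are read from `GeneEnd` downwards. The function names and the SNV positions are hypothetical, and this is not get_MNV's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str


def parse_genes(text):
    """Parse lines of GeneName<TAB>GeneStart<TAB>GeneEnd<TAB>Strand."""
    genes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, start, end, strand = line.split("\t")
        genes.append(Gene(name, int(start), int(end), strand))
    return genes


def codon_number(gene: Gene, pos: int) -> Optional[int]:
    """Return the 1-based codon number of pos within gene, or None if outside."""
    if not gene.start <= pos <= gene.end:
        return None
    # Assumption: on the minus strand the first codon starts at GeneEnd.
    offset = pos - gene.start if gene.strand == "+" else gene.end - pos
    return offset // 3 + 1


def group_by_codon(genes, positions):
    """Map (gene name, codon number) to the SNV positions that fall in it."""
    groups = defaultdict(list)
    for pos in positions:
        for gene in genes:
            codon = codon_number(gene, pos)
            if codon is not None:
                groups[(gene.name, codon)].append(pos)
    return dict(groups)


# Two gene lines from the example above, written with real tab characters.
GENES_TEXT = "ppiA_Rv0009\t12468\t13016\t+\nRv0010c_Rv0010c\t13133\t13558\t-\n"
genes = parse_genes(GENES_TEXT)

# Hypothetical SNV positions, for illustration only.
groups = group_by_codon(genes, [12471, 12472, 13000, 13557, 13558])

for (gene_name, codon), positions in groups.items():
    kind = "MNV candidate" if len(positions) > 1 else "single SNV"
    print(gene_name, codon, positions, kind)
```

Positions 12471 and 12472 both land in codon 2 of `ppiA_Rv0009`, and 13557 and 13558 both land in codon 1 of the minus-strand gene `Rv0010c_Rv0010c`, so each pair would need to be evaluated jointly rather than as independent SNVs.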
The program generates a TSV file named <vcf_filename>.MNV.tsv containing the following information:
- Gene: Name of the gene.
- Positions: Positions of the variants.
- Base Changes: Nucleotide base changes.
- AA Changes: Resulting amino acid changes.
- SNP AA Changes: Amino acid changes if considering individual SNVs.
- Variant Type: Type of variant (SNP, MNV, or SNP/MNV).
- Change Type: Type of change at the protein level (Synonymous, Non-synonymous, Stop gained).
- SNP Reads: (If BAM provided) Count of reads supporting each SNP.
- MNV Reads: (If BAM provided) Count of reads supporting the MNV.
Example:
| Gene | Positions | Base Changes | AA Changes | SNP AA Changes | Variant Type | Change Type | SNP Reads | MNV Reads |
|---|---|---|---|---|---|---|---|---|
| Rv0095c_Rv0095c | 104838 | T | Asp126Glu | Asp126Glu | SNP | Non-synonymous | 0 | 16 |
| Rv0095c_Rv0095c | 104941,104942 | T,G | Gly92Gln | Gly92Glu; Gly92Arg | MNV | Non-synonymous | 0,0 | 25 |
| esxL_Rv1198 | 1341044 | C | His13His | His13His | SNP | Synonymous | 0 | 41 |
| esxL_Rv1198 | 1341083 | G | Ala26Ala | Ala26Ala | SNP | Synonymous | 0 | 12 |
| esxL_Rv1198 | 1341102,1341103 | T,C | Arg33Ser | Arg33Cys; Arg33Pro | MNV | Non-synonymous | 0,0 | 11 |
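Because the output is a plain TSV, it can be post-processed with standard tooling. The sketch below selects MNV rows whose combined amino acid change differs from every per-SNV prediction. It assumes the header names match the column list above and that `SNP AA Changes` is separated by `; ` as in the example; the helper name `discordant_mnvs` is hypothetical, and the sample rows are copied from the example output.

```python
import csv
import io

# Two rows from the example output, written with real tab characters.
SAMPLE_TSV = (
    "Gene\tPositions\tBase Changes\tAA Changes\tSNP AA Changes\t"
    "Variant Type\tChange Type\tSNP Reads\tMNV Reads\n"
    "Rv0095c_Rv0095c\t104941,104942\tT,G\tGly92Gln\tGly92Glu; Gly92Arg\t"
    "MNV\tNon-synonymous\t0,0\t25\n"
    "esxL_Rv1198\t1341044\tC\tHis13His\tHis13His\tSNP\tSynonymous\t0\t41\n"
)


def discordant_mnvs(handle):
    """Return MNV rows whose AA change matches none of the per-SNV AA changes."""
    selected = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Variant Type"] != "MNV":
            continue
        per_snv = [change.strip() for change in row["SNP AA Changes"].split(";")]
        if row["AA Changes"] not in per_snv:
            selected.append(row)
    return selected


hits = discordant_mnvs(io.StringIO(SAMPLE_TSV))
for row in hits:
    print(row["Gene"], row["Positions"], row["AA Changes"], "vs", row["SNP AA Changes"])
```

To run this on real output, pass an open file handle for the generated `<vcf_filename>.MNV.tsv` instead of the in-memory sample.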
- The script currently works only with SNVs compared against a reference sequence.
- Insertions and deletions that modify the reading frame are not supported in this version.
Paula Ruiz-Rodriguez 💻 🔬 🤔 🔣 🎨 🔧
Mireia Coscolla 🔍 🤔 🧑‍🏫 🔬 📓
This project follows the all-contributors specification (emoji key).