An AI-based Metascore for Robust and Scalable Prioritization of Non-Coding Variants in Whole-Genome Sequencing Data
MobiDeep is a metascore for non-coding variants (SNVs only). It is based on a multilayer perceptron that combines five features: ReMM v0.4, CADD v1.7, GPN-MSA, and two conservation scores (Cactus 241-way vertebrates and PhyloP primates) that capture different evolutionary depths.
🌐 Try MobiDeep Online: MobiDeep is now integrated into the MobiDetails web application for easy variant analysis without any setup required.
- Raw score: 0 = benign, 1 = maximum pathogenicity
- Log score: logarithmic transformation using the formula:
log_score = -log10(1 - raw_score)
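As a quick check of the transformation, the following Python sketch converts between the two scales. The function names `log_score` and `raw_from_log` are illustrative and not part of MobiDeep. Returning infinity for a raw score of exactly 1 is an assumption made here, because the formula is undefined at that point.

```python
import math

def log_score(raw_score):
    """Apply log_score = -log10(1 - raw_score) to a raw score in [0, 1]."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw_score must lie between 0 and 1")
    if raw_score == 1.0:
        return math.inf  # assumption: log10(0) is undefined, so report infinity
    return -math.log10(1.0 - raw_score)

def raw_from_log(score):
    """Invert the transformation: raw_score = 1 - 10 ** (-log_score)."""
    return 1.0 - 10.0 ** (-score)

print(round(log_score(0.6), 4))     # 0.3979
print(round(raw_from_log(1.5), 4))  # 0.9684
```

The two printed values match the 0.6 raw-score threshold and the 1.5 log-score threshold used for classification.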
| Classification | Raw score | Log score |
|---|---|---|
| Neutral | < 0.6 | < 0.3979 |
| Likely deleterious | > 0.6 | > 0.3979 |
| High confidence deleteriousness | > 0.9684 | > 1.5 |
Usage recommendation: Use the Likely deleterious threshold (0.6) for general pathogenicity prediction and the high confidence threshold (0.9684) if you require a high positive predictive value. Region-specific thresholds are available in the MobiDetails web interface.
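The three tiers can be applied to raw scores with a few lines of Python. This is a minimal sketch, not the classification code used by MobiDeep or MobiDetails: the function name `classify` is hypothetical, and the handling of a score exactly equal to a threshold (treated here as falling in the lower tier, following the strict ">" in the table) is an assumption.

```python
# Genome-wide thresholds on the raw score, taken from the table above.
LIKELY_DELETERIOUS = 0.6
HIGH_CONFIDENCE = 0.9684

def classify(raw_score):
    """Map a raw MobiDeep score to one of the three tiers.

    Assumption: a score exactly equal to a threshold stays in the lower tier.
    """
    if raw_score > HIGH_CONFIDENCE:
        return "High confidence deleteriousness"
    if raw_score > LIKELY_DELETERIOUS:
        return "Likely deleterious"
    return "Neutral"

for score in (0.0210, 0.9543, 0.99):
    print(score, classify(score))
```

Region-specific thresholds, where available, would replace the two constants.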
Visit MobiDetails to:
- Score individual variants instantly
- Access region-specific thresholds with radar view visualization
- View comprehensive variant annotations including MobiDeep scores
Download the model file, mobideep_20250520.joblib, from the MobiDetails website.
Process VCF files locally using our Apptainer container for:
- Batch analysis of multiple variants
- Integration into bioinformatics pipelines
- Offline processing
Download a pre-computed dataset of MobiDeep scores for 8.773 billion single-nucleotide variants, covering 94.7% of all genomic positions in the GRCh38.p14 reference genome, through our download portal: https://mobidetails.chu-montpellier.fr/about
- Apptainer (or Singularity version 3.5+) installed. See the Apptainer Installation Guide.
- A Linux/macOS environment
- Annotation data files (see Data Setup section below)
Download the ready-to-use container file, mobideep.sif, from the MobiDetails website.
Alternatively, you can rebuild it from the mobideep.def Apptainer definition file:

apptainer build mobideep.sif mobideep.def

MobiDeep requires several large annotation data files. You can download them from the websites of the respective databases.
| Database | Expected Filename | Version |
|---|---|---|
| CADD (SNVs) | whole_genome_SNVs.tsv.gz | v1.7 |
| CADD (Indels) | gnomad.genomes.r4.0.indel.tsv.gz | v1.7 |
| GPN-MSA | scores.tsv.bgz | - |
| ReMM | ReMM_v0.4.hg38.tsv.gz | v0.4 |
| PhyloP (241-way) | cactus241way.phyloP.bw | - |
| PhyloP (Primates) | phyloPPrimates.bigWig | - |
Directory Structure:
/path/to/your/annotation_data/
├── whole_genome_SNVs.tsv.gz
├── whole_genome_SNVs.tsv.gz.tbi
├── gnomad.genomes.r4.0.indel.tsv.gz
├── gnomad.genomes.r4.0.indel.tsv.gz.tbi
├── scores.tsv.bgz
├── scores.tsv.bgz.tbi
├── ReMM_v0.4.hg38.noheader.tsv.gz
├── ReMM_v0.4.hg38.noheader.tsv.gz.tbi
├── cactus241way.phyloP.bw
└── phyloPPrimates.bigWig

Important: Ensure that all gzipped files (.gz, .bgz) are indexed with Tabix. The index files (.tbi) must be in the same directory as the files they index.
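Before launching a run, it can help to confirm that every compressed annotation file has its index next to it. The Python sketch below only checks for missing `.tbi` files and does not call Tabix. The function name `missing_tabix_indexes` is hypothetical and not part of MobiDeep.

```python
from pathlib import Path

def missing_tabix_indexes(data_dir):
    """Return the names of .gz/.bgz files in data_dir that lack a sibling .tbi index."""
    missing = []
    for path in sorted(Path(data_dir).iterdir()):
        # Index files end in .tbi, so they are skipped by this suffix test.
        if path.suffix in {".gz", ".bgz"}:
            index = path.with_name(path.name + ".tbi")
            if not index.exists():
                missing.append(path.name)
    return missing

# Example call; replace the path with your own annotation directory:
# print(missing_tabix_indexes("/data/annotations"))
```

Any file reported by this check needs to be indexed with Tabix using the options appropriate to its column layout.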
| Argument | Description | Required |
|---|---|---|
| --vcf | Path to the input VCF file to be scored (hg38/GRCh38) | Yes |
| --data_dir | Path to the directory containing all annotation data files | Yes |
| --output_file | Path where the output TSV file will be saved | Yes |
| --threshold | The probability score threshold to classify a variant as "Pathogenic" | No (Default: 0.6) |
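Test the container:

# Define paths for clarity
ANNOTATION_DIR="/data/annotations" # replace with your own path
# Only the annotation directory is bound here, so the input and output use
# relative paths in the current working directory, which Apptainer mounts
# in its default configuration.
apptainer run \
--bind ${ANNOTATION_DIR}:/annotations \
mobideep.sif \
--vcf test_variants.vcf \
--data_dir /annotations \
--output_file results.tsv

Basic usage with default threshold (0.6):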
# Define paths for clarity
ANNOTATION_DIR="/data/annotations" # replace with your own path
VARIANT_DIR="/data/variants" # replace with your own path
# Run the container
apptainer run \
--bind ${ANNOTATION_DIR}:/annotations \
--bind ${VARIANT_DIR}:/variants \
mobideep.sif \
--vcf /variants/input.vcf \
--data_dir /annotations \
--output_file /variants/results.tsv

High confidence analysis (threshold 0.9684):
apptainer run \
--bind ${ANNOTATION_DIR}:/annotations \
--bind ${VARIANT_DIR}:/variants \
mobideep.sif \
--vcf /variants/input.vcf \
--data_dir /annotations \
--output_file /variants/results_high_confidence.tsv \
--threshold 0.9684

The --bind flag makes directories from your computer (the "host") visible inside the container:
- --bind ${ANNOTATION_DIR}:/annotations: Mounts your annotation data at the /annotations path inside the container
- --bind ${VARIANT_DIR}:/variants: Mounts your VCF directory at /variants (also where the output will be written)
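When the container is called from a pipeline, the same command can be assembled programmatically. The Python sketch below builds the argument list from the flags documented above. The helper name `build_mobideep_command` and its parameters are hypothetical, and the example paths are the placeholders used in this README.

```python
import shlex

def build_mobideep_command(annotation_dir, variant_dir, vcf_name, output_name,
                           threshold=None, image="mobideep.sif"):
    """Build the apptainer argument list for one MobiDeep run.

    The input VCF is read from, and the output written to, variant_dir on the host.
    """
    cmd = [
        "apptainer", "run",
        "--bind", f"{annotation_dir}:/annotations",
        "--bind", f"{variant_dir}:/variants",
        image,
        "--vcf", f"/variants/{vcf_name}",
        "--data_dir", "/annotations",
        "--output_file", f"/variants/{output_name}",
    ]
    if threshold is not None:
        # Omitting --threshold leaves the documented default of 0.6 in place.
        cmd += ["--threshold", str(threshold)]
    return cmd

cmd = build_mobideep_command("/data/annotations", "/data/variants",
                             "input.vcf", "results_high_confidence.tsv",
                             threshold=0.9684)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.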
The output is a tab-separated file (.tsv) containing the original variant information along with annotation scores and MobiDeep predictions:
#CHROM POS ID REF ALT CADD_PHRED ... MobiDeep_Score MobiDeep_Class
1 55040253 rs12345 C T 14.8900 ... 0.9543 Pathogenic
10 114221763 . A G 5.4321 ... 0.0210 Neutral
- MobiDeep_Score: The raw probability score from the MLP model (0 to 1)
- MobiDeep_Class: "Pathogenic" or "Neutral", depending on whether the MobiDeep_Score is above or below the specified --threshold
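Because the output is a plain tab-separated file, it can be filtered again after the run without rescoring. The Python sketch below uses only the standard library. The function name `rows_above` is hypothetical, and the sample text reuses the two example rows above with the elided columns left out.

```python
import csv
import io

# Two rows from the example output, with the elided columns ("...") left out.
SAMPLE = (
    "#CHROM\tPOS\tID\tREF\tALT\tCADD_PHRED\tMobiDeep_Score\tMobiDeep_Class\n"
    "1\t55040253\trs12345\tC\tT\t14.8900\t0.9543\tPathogenic\n"
    "10\t114221763\t.\tA\tG\t5.4321\t0.0210\tNeutral\n"
)

def rows_above(handle, threshold):
    """Return the rows whose MobiDeep_Score is strictly above threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["MobiDeep_Score"]) > threshold]

likely = rows_above(io.StringIO(SAMPLE), 0.6)
print([(r["#CHROM"], r["POS"], r["MobiDeep_Score"]) for r in likely])
# [('1', '55040253', '0.9543')]
print(len(rows_above(io.StringIO(SAMPLE), 0.9684)))  # 0
```

For a real results file, replace the `io.StringIO` object with `open("results.tsv", newline="")`.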
- Standard VCF Input: Directly processes standard VCF files (hg38/GRCh38)
- Comprehensive Annotation: Enriches variants with scores from top-performing predictors:
  - CADD (v1.7)
  - GPN-MSA
  - ReMM (v0.4)
  - phyloP (cactus241way)
  - phyloP (phyloP Primates)
  - MobiDeep pathogenicity score
- Flexible Thresholds: Multiple classification thresholds for different confidence levels
- Portable & Reproducible: Distributed as a single Apptainer (.sif) file
- Web Integration: Also available through the MobiDetails web interface
If you use MobiDeep in your research, please cite: (BOUAZZAOUI ET AL. - Citation details to be updated)
This project is licensed under the GPL v3 License. See the LICENSE file for details.
For questions or issues:
- Web application support: Visit MobiDetails
- Command-line tool issues: Open an issue in this repository
