This is the official code repository for: "Variant effect prediction with reliability estimation across priority viruses" from the Marks Lab.
Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growing availability of sequence and structure data offer promise for large-scale mutation effect prediction, viruses present unique biological and informational constraints that may challenge these models. To quantify this, we introduce EVEREST (a framework for Evolutionary Variant Effect prediction with Reliability ESTimation), which assesses model performance on mutation effect prediction using a curated benchmark of 45 viral deep mutational scanning datasets (over 340 thousand variants), and we develop reliability metrics to quantify model uncertainty in the absence of experimental data. This large-scale evaluation revealed wide differences in prediction accuracy across models and viral families. Contrary to findings on non-viral proteins, we find that protein language models trained on diverse sequence corpora under-perform on viral proteins compared to alignment-based models trained on a much smaller set of homologous sequences. We apply this framework across 40 WHO-prioritized pandemic-threat viruses (over 400 thousand variants across 16 viral families) and find that state-of-the-art models fail to reliably predict mutation effects in over half of these viruses. Our findings uncover key factors leading to under-performance, offer actionable recommendations for improving viral mutation effect prediction, and provide an objective framework for analyzing dual-use biosecurity risk.
The viral DMS substitutions folder contains 45 curated and standardized viral deep mutational scans (DMS), listed in the reference file. The viral DMS structures folder contains AlphaFold structures of all base sequences. These sequences and structures are used as inputs to the models below.
To model the 40 priority and prototype RNA viral pathogens from the WHO, the sequences and folded structures of their antigens are also collected.
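The helper below is a minimal sketch that turns a DMS variant into the full sequence a model would score. It assumes, without confirmation from this repository, that the DMS files follow the ProteinGym convention for the `mutant` column: wild-type residue, 1-based position, substituted residue (e.g. `A23G`), with multiple substitutions joined by `:`. The function name `apply_mutant` is hypothetical.

```python
import re

# Assumed ProteinGym-style substitution: wild-type residue, 1-based position, new residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutant(base_sequence: str, mutant: str) -> str:
    """Hypothetical helper: apply a mutant string such as "K2R:Y5F" to a base sequence."""
    residues = list(base_sequence)
    for substitution in mutant.split(":"):
        match = MUTATION_PATTERN.match(substitution)
        if match is None:
            raise ValueError(f"Unrecognized substitution: {substitution!r}")
        wild_type, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # convert the 1-based position to a 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"Position {position} is outside the base sequence")
        # Guard against a mismatch between the DMS file and the base sequence.
        if residues[index] != wild_type:
            raise ValueError(
                f"Expected {wild_type} at position {position}, found {residues[index]}"
            )
        residues[index] = new_residue
    return "".join(residues)


print(apply_mutant("MKTAYIAK", "K2R:Y5F"))  # MRTAFIAK
```

Checking the wild-type residue before substituting is a cheap way to catch an off-by-one offset between a DMS file and its base sequence.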
Our analysis includes models from the following papers.
Alignment-based Models:
| Model name | Input modalities | Training Database | Reference | GitHub |
|---|---|---|---|---|
| Site Independent | MSA | Uniref90, Uniref100 or Uniref100+BFD+MGnify | Hopf, T.A., Ingraham, J., Poelwijk, F.J., Schärfe, C.P., Springer, M., Sander, C., & Marks, D.S. (2017). Mutation effects predicted from sequence co-variation. Nature Biotechnology, 35, 128-135. | EVcouplings |
| EVmutation | MSA | Uniref90, Uniref100 or Uniref100+BFD+MGnify | Hopf, T.A., Ingraham, J., Poelwijk, F.J., Schärfe, C.P., Springer, M., Sander, C., & Marks, D.S. (2017). Mutation effects predicted from sequence co-variation. Nature Biotechnology, 35, 128-135. | EVcouplings |
| EVE | MSA | Uniref90, Uniref100 or Uniref100+BFD+MGnify | Frazer, J., Notin, P., Dias, M., Gomez, A.N., Min, J.K., Brock, K.P., Gal, Y., & Marks, D.S. (2021). Disease variant prediction with deep generative models of evolutionary data. Nature. | EVE |
Protein Language Models:
| Model name | Input modalities | Training Database | Reference | GitHub |
|---|---|---|---|---|
| ESM-1v (ensemble) | Single sequence | Uniref90 | Meier, J., Rao, R., Verkuil, R., Liu, J., Sercu, T., & Rives, A. (2021). Language models enable zero-shot prediction of the effects of mutations on protein function. NeurIPS. | ESM |
| Tranception (without retrieval) | Single sequence | Uniref100 | Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A.N., Marks, D.S., & Gal, Y. (2022). Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. ICML. | Tranception |
| SaProt (AF2 and PDB 650M) | Single sequence & structural tokens (Foldseek) | AF2DB or AF2DB+PDB | Su, J., Han, C., Zhou, Y., Shan, J., Zhou, X., & Yuan, F. (2024). SaProt: Protein Language Modeling with Structure-aware Vocabulary. ICLR. | SaProt |
We also report a new hybrid model, SaProt-EVE, that combines the alignment-based model EVE with the structure-aware PLM SaProt and produces reliability estimates, and we compare it to existing hybrid models.
Hybrid Models:
| Model name | Input modalities | Training Database | Reference | GitHub |
|---|---|---|---|---|
| VESPA | Single sequence | BFD+Uniref50 | Marquet, C., Heinzinger, M., Olenyi, T., Dallago, C., Bernhofer, M., Erckert, K., & Rost, B. (2021). Embeddings from protein language models predict conservation and variant effects. Human Genetics, 141, 1629-1647. | VESPA |
| Tranception (with MSA retrieval) | MSA | Uniref100 | Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A.N., Marks, D.S., & Gal, Y. (2022). Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. ICML. | Tranception |
| TranceptEVE | MSA | Uniref100 | Notin, P., Van Niekerk, L., Kollasch, A., Ritter, D., Gal, Y., & Marks, D.S. (2022). TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction. NeurIPS, LMRL workshop. | TranceptEVE |
| SaProt-EVE | MSA and structural tokens (Foldseek) | Uniref90, Uniref100 or Uniref100+BFD+MGnify and AF2DB+PDB | This work | This work |
The results folder contains mutation effect scores from each alignment-based and protein language model across all viral DMS assays, as well as the Spearman correlations between model predictions and experimental measurements. Confidence metrics are also reported for the alignment-based models and SaProt. Mutation effect predictions from the new hybrid model SaProt-EVE are provided for the antigens of each WHO priority virus.
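For readers unfamiliar with the reported metric, the sketch below shows what a Spearman correlation between model scores and DMS measurements computes: the Pearson correlation of the two rank vectors, with tied values given their average rank (the same tie handling as the default in `scipy.stats.spearmanr`). It is written in plain Python so it runs without dependencies; it does not read the results files, whose column names are not assumed here, and the example scores are illustrative values, not data from this work.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their 1-based positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant


# Illustrative values only: higher model score should mean higher measured fitness.
model_scores = [-3.1, -0.4, -2.2, 0.5, -1.0]
dms_scores = [0.1, 0.9, 0.3, 1.2, 0.8]
print(spearman(model_scores, dms_scores))  # 1.0
```

Because only ranks matter, scores on different scales (log-likelihood ratios, ELBO differences, etc.) can be compared against the same assay; a model whose scores run in the opposite direction to the assay would simply show a negative correlation.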
The code for training these models and for mutation effect scoring is available through ProteinGym.
Special thanks to the teams of experimentalists who developed and performed the viral DMS assays on which this work is built. If you use these assays in your work, please cite the corresponding papers. To facilitate this, details of each paper are included in the DMS reference file.
This project is available under the MIT license.
Sarah Gurev*, Noor Youssef*, Navami Jain, Debora S. Marks. Variant effect prediction with reliability estimation across priority viruses. bioRxiv, 2025.
(* equal contribution)
