This repository is currently in pre-release mode. The official release of v0.1.0 will be announced soon.
MLIPAudit is a Python library and app for benchmarking and validating Machine Learning Interatomic Potential (MLIP) models, in particular those based on the mlip library. It aims to cover a wide range of use cases and difficulty levels, providing users with a comprehensive overview of the performance of their models. It can also benchmark models of any origin (e.g., PyTorch-based models) via the ASE calculator interface.
MLIPAudit can be installed via pip:
pip install mlipaudit

However, this command only installs the regular CPU version of JAX. If benchmarking native JAX models, we recommend installing the core library along with the GPU dependencies (jax[cuda12] and jaxlib) with the following command:

pip install "mlipaudit[cuda]"

The detailed code documentation, which also contains descriptions of each benchmark and tutorials on how to use MLIPAudit as an applied user, can be found here.
MLIPAudit can be used via its CLI tool mlipaudit, which carries out two main tasks:
the benchmarking task and a graphical UI app for visualization of results. Furthermore,
advanced users who want to add their own benchmarks or create their own app with
our existing benchmark classes can also use MLIPAudit as a library.
After installation via pip, the mlipaudit command is available in your terminal.
Run the following to obtain an overview of the two main tasks, benchmark and gui:
mlipaudit -h

The -h flag prints the help message with information on how to use the tool.
See below for details on the two available tasks.
The first task is benchmark. It executes a benchmark run and can be configured
via some command line arguments. To print the help message for this specific task,
run:
mlipaudit benchmark -h

For example, to launch a full benchmark for a model located at /path/to/model.zip, you can run:

mlipaudit benchmark -m /path/to/model.zip -o /path/to/output

In this case, benchmark results are written to the directory /path/to/output. In this
output directory, there will be subdirectories for the benchmarked models and for the
benchmarks. Each benchmark directory will contain a result.json file with the results.
The results can contain multiple metrics; however, they will always include a
single score that rates a model's performance on a benchmark on a scale from 0 to 1.
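As a rough illustration, results can be collected programmatically along these lines. This is a minimal sketch: the exact directory layout and JSON schema (including the "score" key) are assumptions you should verify against a result.json from your own run.

```python
import json
from pathlib import Path

output_dir = Path("/path/to/output")

# Assumed layout: <output>/<model_name>/<benchmark_name>/result.json
for result_file in sorted(output_dir.glob("*/*/result.json")):
    model_name = result_file.parent.parent.name
    benchmark_name = result_file.parent.name
    with result_file.open() as f:
        result = json.load(f)
    # The "score" key is an assumption; inspect your own result.json files
    # to confirm the exact key names used by your MLIPAudit version.
    print(f"{model_name} / {benchmark_name}: score = {result.get('score')}")
```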
For a tutorial on how to run models that are not native to the mlip library, see this section of our documentation.
To visualize the detailed results (potentially of multiple models), the gui task can
be run. To get more information, run:
mlipaudit gui -h

For example, to display the results stored at /path/to/output, execute:

mlipaudit gui /path/to/output

This should automatically open a webpage in your browser with a graphical user interface that lets you explore the benchmark results visually. This interface was created using streamlit.
Note: The zip archives for the models must follow the convention that the model name
(one of mace, visnet, nequip as of mlip v0.1.3) must be part of the zip file
name, such that our app knows which model architecture to load the model into. For
example, the aforementioned model.zip file name would not work, but instead
model_mace.zip or visnet_model.zip would be possible.
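As a quick illustration of this naming convention, the helper below (purely illustrative, not part of MLIPAudit) infers the architecture from a zip file name, which can be handy as a pre-flight check before launching a long benchmark run:

```python
from pathlib import Path

# Architectures supported as of mlip v0.1.3, as listed above.
KNOWN_ARCHITECTURES = ("mace", "visnet", "nequip")


def architecture_from_zip_name(zip_path: str) -> str:
    """Return the architecture encoded in the zip file name, or raise if absent."""
    stem = Path(zip_path).stem.lower()
    for architecture in KNOWN_ARCHITECTURES:
        if architecture in stem:
            return architecture
    raise ValueError(
        f"'{zip_path}' does not follow the naming convention; "
        f"the file name must contain one of {KNOWN_ARCHITECTURES}."
    )


print(architecture_from_zip_name("model_mace.zip"))    # -> mace
print(architecture_from_zip_name("visnet_model.zip"))  # -> visnet
```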
Benchmarks can also be run on external models, provided either via the ASE calculator
interface or the ForceField API for the mlip
library. For more details, see our documentation
here.
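For orientation, the ASE route only requires that your model is exposed through the standard ase.calculators.calculator.Calculator interface. The sketch below is purely illustrative (placeholder energies and forces, no MLIPAudit API involved); how to hand such a calculator to MLIPAudit is covered in the documentation linked above.

```python
import numpy as np
from ase.build import molecule
from ase.calculators.calculator import Calculator, all_changes


class ExternalModelCalculator(Calculator):
    """Illustrative ASE wrapper around an external model (placeholder physics)."""

    implemented_properties = ["energy", "forces"]

    def calculate(self, atoms=None, properties=("energy",), system_changes=all_changes):
        super().calculate(atoms, properties, system_changes)
        # In a real wrapper, call into your model (e.g. a PyTorch module) here.
        # The zeros below are placeholders that keep this sketch self-contained.
        self.results["energy"] = 0.0
        self.results["forces"] = np.zeros((len(self.atoms), 3))


# Example usage with a simple ASE Atoms object:
atoms = molecule("H2O")
atoms.calc = ExternalModelCalculator()
print(atoms.get_potential_energy(), atoms.get_forces().shape)
```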
As described in more detail in the code documentation, the benchmark classes can also be easily imported into your own Python code base. In particular, check out the API reference of our documentation for details on the available functions.
You can use these functions to build your own benchmarking script and GUI pages for our
app. For inspiration, we recommend taking a look at the main script located
at src/mlipaudit/main.py and the implementation of the GUI located at
src/mlipaudit/app.py.
The data for the benchmarks is located on HuggingFace
in this space. The
benchmark classes will automatically download the data into a local ./data directory
when needed but won't re-download it if it already exists.
A public leaderboard of models can be found here. It is based on the same graphical interface as the UI app provided with this library.
To work directly in this repository, run

uv sync --extra cuda

to set up the environment, as this repo uses uv for package and dependency management.
This command installs the main and dev dependency groups. We recommend checking out
the pyproject.toml file for more information. Furthermore,
the cuda extra installs the GPU-ready version of JAX, which is strongly recommended.
If you do not want to install the cuda extra (for example, because you are
on macOS, which does not support this standard installation), you can omit the
--extra cuda option and simply run uv sync.
When adding new benchmarks, make sure that the following key pieces are added for each one:
- The benchmark implementation (with unit tests)
- The benchmark UI page (add to existing generic unit test for UI pages)
- The benchmark documentation
More information on adding new benchmarks can be found here in our documentation.
To build a version of the code documentation locally to view your changes, you can run:
uv run sphinx-build -b html docs/source docs/build/html
The documentation will be built in the docs/build/html directory.
You can then open the index.html file in your browser to view the documentation.
We would like to acknowledge beta testers for this library: Marco Carobene, Massimo Bortone, Jack Sawdon, Olivier Peltre and Alex Laterre.
We kindly request that you cite our white paper when using this library:
L. Wehrhan, L. Walewski, M. Bluntzer, H. Chomet, C. Brunken, J. Tilly and S. Acosta-Gutiérrez, MLIPAudit: A benchmarking tool for Machine Learned Interatomic Potentials, soon on arXiv.