Model evaluation on small clusters

### Summary

Hello everyone and thanks for all the work you guys did on this benchmark suite!

Results from https://doi.org/10.1063/5.0303302 suggest that small clusters would be an important test for aspiring universal MLIPs. This benchmark would evaluate model accuracy on isolated clusters of atoms (dimers, trimers, tetramers and so on).

### Interactive features

No interactive features besides modifying the weight of clusters of different sizes.
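
To make the weighting idea concrete, here is a minimal sketch of how per-size errors could be combined into a single score. The function name `weighted_cluster_score` and the numbers in the usage lines are hypothetical placeholders, not part of ML-PEG and not actual results.

```python
def weighted_cluster_score(mae_by_size, weights=None):
    """Combine per-size errors into one number using user-chosen weights.

    mae_by_size: dict mapping cluster size (number of atoms) to an error value.
    weights:     dict mapping cluster size to a non-negative weight;
                 sizes missing from the dict get weight 0, and None means equal weights.
    """
    if weights is None:
        weights = {size: 1.0 for size in mae_by_size}
    total_weight = sum(weights.get(size, 0.0) for size in mae_by_size)
    if total_weight <= 0:
        raise ValueError("at least one cluster size needs a positive weight")
    weighted_sum = sum(weights.get(size, 0.0) * mae for size, mae in mae_by_size.items())
    return weighted_sum / total_weight


# Illustrative placeholder errors only (not benchmark results).
example_mae = {3: 0.10, 4: 0.20, 5: 0.30}
print(weighted_cluster_score(example_mae))                      # equal weights
print(weighted_cluster_score(example_mae, {3: 3.0, 4: 1.0}))    # emphasise trimers
```

A weighted mean keeps the score on the same scale as the per-size errors, so changing the weights in the interface would shift the ranking without changing the units.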

### Category

This benchmark would most likely define its own category ("clusters"), but including it in "physicality" is also a possibility.

### Data availability

This is a short description of the data we have been able to generate so far for this benchmark.

Cluster generation
We generated clusters containing between 3 and 8 atoms by randomly placing the atoms into a structure whilst ensuring that all interatomic distances were kept between 2.0 and 2.5 Å. We avoided monomers and dimers because we understand that these are tested separately in a different "dimer curve" ML-PEG benchmark, but it would be straightforward and inexpensive to add those too. Element identities were chosen at random from the first three rows of the periodic table. We believe that light metals and main-group elements should nearly always be within the application domain of general-purpose models, including those that primarily focus on organic chemistry. This choice has the further advantage that all-electron DFT calculations for these clusters are inexpensive, so the dataset can easily be reproduced, improved and extended by the community. We included 10,000 clusters for each size, from 3 to 8 atoms inclusive.
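
For readers who want to reproduce something similar, here is a minimal sketch of one possible random-placement procedure. It is not the script used for the dataset. It assumes the distance window applies to nearest neighbours (no pair closer than 2.0 Å, every atom within 2.5 Å of at least one other atom), and it assumes the noble gases are included in the element pool; the names `ELEMENTS` and `random_cluster` are hypothetical.

```python
import numpy as np

# Assumption: "first three rows" means H through Ar, noble gases included.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
            "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar"]


def random_cluster(n_atoms, rng, d_min=2.0, d_max=2.5, max_tries=10_000):
    """Grow a cluster atom by atom with rejection sampling (distances in Å).

    Each new atom is placed d_min..d_max away from a randomly chosen existing
    atom and rejected if it ends up closer than d_min to any other atom.
    """
    positions = [np.zeros(3)]
    tries = 0
    while len(positions) < n_atoms:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all atoms; try again")
        anchor = positions[rng.integers(len(positions))]
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)  # uniform random unit vector
        candidate = anchor + rng.uniform(d_min, d_max) * direction
        distances = np.linalg.norm(np.array(positions) - candidate, axis=1)
        if distances.min() >= d_min:
            positions.append(candidate)
    symbols = rng.choice(ELEMENTS, size=n_atoms).tolist()
    return symbols, np.array(positions)


rng = np.random.default_rng(seed=0)
symbols, positions = random_cluster(5, rng)
print(symbols)
print(positions.round(3))
```

Seeding the generator makes every cluster reproducible, which would help anyone extending the dataset.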

DFT calculations
We used the ORCA and FHI-aims packages to perform DFT calculations that are fully consistent with the OMol25 (ωB97M-V/def2-TZVPD) and MAD-1.5 (r2SCAN) datasets, respectively. OMol-style calculations were performed with a total charge of zero and spin multiplicities of 1 (for an even total number of electrons in the cluster) or 2 (for an odd number of electrons), as these are the most common and well-represented use cases in the chemistry literature. The convergence rate is very high for both levels of theory; in the rare cases where a DFT calculation did not converge, we set its energies and forces to NaN.
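
The multiplicity rule and the NaN convention can be written down in a few lines. The sketch below is illustrative only: `spin_multiplicity` and `masked_mae` are hypothetical helper names, and skipping NaN references when computing an error is an assumption about how the benchmark would treat non-converged entries.

```python
import numpy as np

# Atomic numbers for the first three rows of the periodic table.
ATOMIC_NUMBERS = {"H": 1, "He": 2, "Li": 3, "Be": 4, "B": 5, "C": 6, "N": 7,
                  "O": 8, "F": 9, "Ne": 10, "Na": 11, "Mg": 12, "Al": 13,
                  "Si": 14, "P": 15, "S": 16, "Cl": 17, "Ar": 18}


def spin_multiplicity(symbols, charge=0):
    """Return 1 for an even electron count and 2 for an odd one."""
    n_electrons = sum(ATOMIC_NUMBERS[s] for s in symbols) - charge
    return 1 if n_electrons % 2 == 0 else 2


def masked_mae(predicted, reference):
    """Mean absolute error over entries whose reference value is not NaN."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    converged = ~np.isnan(reference)  # NaN marks a non-converged DFT run
    if not converged.any():
        return float("nan")
    return float(np.mean(np.abs(predicted[converged] - reference[converged])))


print(spin_multiplicity(["H", "H", "O"]))   # 10 electrons, closed shell
print(spin_multiplicity(["Li", "H", "H"]))  # 5 electrons, one unpaired
```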

### Computational cost

The computational cost depends entirely on the availability and efficiency of batching, as the clusters are very small. In the best-case scenario (optimal batching), the total compute required by this benchmark would be equivalent to evaluating around 50 structures of a few thousand atoms each, so the run would be relatively short.

Serial evaluation would instead be dominated by the per-call overhead of running the model on approximately 60k clusters one at a time.
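
As a rough check of these numbers, the sketch below counts the atoms in the dataset (6 sizes × 10,000 clusters) and groups clusters greedily into batches under an atom budget. The helper `batch_by_atom_budget` and the budget of 6,600 atoms per batch are assumptions chosen for illustration (330,000 atoms / 50 batches); how a given model actually batches structures will differ.

```python
def batch_by_atom_budget(cluster_sizes, max_atoms_per_batch):
    """Greedily group cluster indices so that no batch exceeds the atom budget."""
    batches, current, current_atoms = [], [], 0
    for index, n_atoms in enumerate(cluster_sizes):
        if n_atoms > max_atoms_per_batch:
            raise ValueError("a single cluster exceeds the batch budget")
        # Start a new batch when adding this cluster would overflow the budget.
        if current and current_atoms + n_atoms > max_atoms_per_batch:
            batches.append(current)
            current, current_atoms = [], 0
        current.append(index)
        current_atoms += n_atoms
    if current:
        batches.append(current)
    return batches


# 10,000 clusters for each size from 3 to 8 atoms.
sizes = [n for n in range(3, 9) for _ in range(10_000)]
total_atoms = sum(sizes)
batches = batch_by_atom_budget(sizes, max_atoms_per_batch=6_600)
print(len(sizes), "clusters,", total_atoms, "atoms,", len(batches), "batches")
```

With batching, the model is called once per batch rather than once per cluster, which is where the difference between the two scenarios above comes from.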

### Additional context

_No response_
