Summary
Hello everyone and thanks for all the work you guys did on this benchmark suite!
Results from https://doi.org/10.1063/5.0303302 suggest that small clusters would be an important test for aspiring universal MLIPs. This benchmark would evaluate model accuracy on isolated clusters of atoms (dimers, trimers, tetramers and so on).
Interactive features
No interactive features besides adjusting the weight given to clusters of each size.
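As a rough illustration of what the size weighting could look like, here is a minimal sketch. The function name `weighted_score`, the use of a per-size error as input, and the weighted-mean aggregation are all assumptions made for illustration; the actual scoring scheme would be up to the benchmark maintainers.

```python
def weighted_score(errors_by_size, weights):
    """Weighted mean of per-size errors (hypothetical aggregation).

    errors_by_size: dict mapping cluster size -> error for that size
    weights:        dict mapping cluster size -> user-chosen weight
    """
    total_weight = sum(weights[n] for n in errors_by_size)
    if total_weight == 0:
        raise ValueError("at least one cluster size needs a non-zero weight")
    return sum(weights[n] * errors_by_size[n] for n in errors_by_size) / total_weight


# Example: up-weighting trimers relative to tetramers.
example = weighted_score({3: 1.0, 4: 3.0}, {3: 3.0, 4: 1.0})
```

Setting a weight to zero simply removes that cluster size from the score.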
Category
This benchmark would most likely define its own category ("clusters"), but including it in "physicality" is also a possibility.
Data availability
This is a short description of the data we have been able to generate so far for this benchmark.
Cluster generation
We generated clusters containing between 3 and 8 atoms by randomly placing the atoms into a structure whilst ensuring that all interatomic distances were kept between 2.0 and 2.5 Å. We left out monomers and dimers because we understand that these are tested separately in a different "dimer curve" ML-PEG benchmark, but it would be straightforward and inexpensive to add those too. Element identities were chosen at random from the first three rows of the periodic table. We believe that light metals and main-group elements should nearly always be within the application domain of general-purpose models, including those that primarily focus on organic chemistry. This choice has the further advantage that all-electron DFT calculations for these clusters are inexpensive, so the dataset can easily be reproduced, improved and extended by the community. We included 10,000 clusters for each size, from 3 to 8 atoms inclusive.
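For anyone who wants to generate similar structures, here is a minimal sketch of one possible procedure. It is not the script used for the dataset. It assumes that the distance constraint means no pair of atoms is closer than 2.0 Å and every atom lies within 2.5 Å of at least one other atom, and that the element pool is Z = 1 to 18; the function name `random_cluster` and its parameters are hypothetical.

```python
import numpy as np

# Assumed element pool: Z = 1..18 (H to Ar), i.e. the first three rows.
ELEMENTS = np.arange(1, 19)


def random_cluster(n_atoms, rng, d_min=2.0, d_max=2.5, max_tries=1000):
    """Grow a cluster atom by atom (illustrative procedure only).

    Each new atom is placed at a random distance in [d_min, d_max] from a
    randomly chosen existing atom, and rejected if it ends up closer than
    d_min to any atom already in the cluster.
    """
    positions = [np.zeros(3)]
    while len(positions) < n_atoms:
        for _ in range(max_tries):
            anchor = positions[rng.integers(len(positions))]
            direction = rng.normal(size=3)
            direction /= np.linalg.norm(direction)
            candidate = anchor + rng.uniform(d_min, d_max) * direction
            dists = np.linalg.norm(np.array(positions) - candidate, axis=1)
            if dists.min() >= d_min:
                positions.append(candidate)
                break
        else:
            raise RuntimeError("could not place an atom; try more attempts")
    # Element identities are drawn uniformly at random, with replacement.
    numbers = rng.choice(ELEMENTS, size=n_atoms)
    return numbers, np.array(positions)


rng = np.random.default_rng(0)
numbers, positions = random_cluster(8, rng)
```

The returned atomic numbers and positions (in Å) can then be passed to whichever structure container the DFT or MLIP workflow uses.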
DFT calculations
We used the ORCA and FHI-AIMS packages to perform DFT calculations that are fully consistent with the OMol25 (ωB97M-V/def2-TZVPD) and MAD-1.5 (r2SCAN) datasets, respectively. OMol-style calculations were performed with a total charge of zero and spin multiplicities of 1 (for an even total number of electrons in the cluster) or 2 (for an odd number of electrons), as these are the most common and well-represented use cases in the chemistry literature. The convergence rate is very high for both levels of theory; we set energies and forces to NaN in the rare cases where the DFT calculations did not converge.
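The multiplicity rule and the NaN convention above can be sketched as follows. The helper names `spin_multiplicity` and `masked_mae` are hypothetical, and the mean absolute error is only an assumed example of a metric; the point is that reference entries set to NaN need to be masked out before any error is computed.

```python
import numpy as np


def spin_multiplicity(atomic_numbers, charge=0):
    """Return 1 for an even electron count and 2 for an odd one."""
    n_electrons = int(np.sum(atomic_numbers)) - charge
    return 1 if n_electrons % 2 == 0 else 2


def masked_mae(predicted, reference):
    """Mean absolute error over entries whose reference value is not NaN."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mask = ~np.isnan(reference)  # skip non-converged DFT references
    return float(np.mean(np.abs(predicted[mask] - reference[mask])))


# H2O has 10 electrons (singlet); H3 has 3 electrons (doublet).
water = spin_multiplicity([8, 1, 1])
h3 = spin_multiplicity([1, 1, 1])
```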
Computational cost
The computational cost depends entirely on the availability and efficiency of batching, as the clusters are very small. In the best-case scenario (optimal batching), the total compute required by this benchmark would be equivalent to evaluating around 50 structures of a few thousand atoms each, so the run would be relatively short.
Serial evaluation would instead be dominated by the per-call overhead of running the model sequentially through approximately 60k clusters.
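The numbers behind this estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes that, with ideal batching, cost scales with the total number of atoms, which will not hold exactly for every model.

```python
sizes = range(3, 9)            # cluster sizes 3..8, inclusive
clusters_per_size = 10_000

n_clusters = clusters_per_size * len(sizes)   # total model calls if run serially
n_atoms = clusters_per_size * sum(sizes)      # total atoms across all clusters

# Spread over 50 large structures, this gives the equivalent size of each.
atoms_per_structure = n_atoms / 50

print(n_clusters, n_atoms, atoms_per_structure)
```

This gives 60,000 clusters and 330,000 atoms in total, or about 6,600 atoms per structure when spread over 50 structures.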
Additional context
No response