Can I use datasets that contain multiple different molecules (i.e. different number of atoms)? #200
I would like to use NequIP to predict the energy (and maybe forces) in monolayers of various molecules. The molecules are arranged in several different unit cells. As a result, the number of atoms is different for each point in my data set. After studying the examples in the repo and the paper, I am under the impression that this is not possible. Yet in this question it seems a user has applied NequIP to a data set with different numbers of iron atoms per unit cell. So does it work after all, and did I just misinterpret the examples/documentation? Thanks!
Replies: 2 comments 1 reply
(Just a note on what you are trying to do: even if you are only interested in energy predictions, if you have force data in your training data you should definitely train with both energies and forces. The training will be much faster, and likely more accurate. If you only need energy predictions without the overhead of force computation, you can then later disable the force computation during inference if performance is important for you.)
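To make the reasoning behind that note concrete, here is a rough sketch of a combined energy and force objective. It assumes the model predicts a total energy and obtains forces as its negative gradient with respect to atomic positions; the weights $\lambda_E$ and $\lambda_F$ are illustrative names, not taken from the NequIP configuration.

```latex
\hat{F}_{i,\alpha} = -\frac{\partial \hat{E}}{\partial r_{i,\alpha}},
\qquad
\mathcal{L} = \lambda_E \,\bigl(\hat{E} - E\bigr)^2
  + \lambda_F \,\frac{1}{3N} \sum_{i=1}^{N} \sum_{\alpha \in \{x,y,z\}}
    \bigl(\hat{F}_{i,\alpha} - F_{i,\alpha}\bigr)^2
```

A structure with $N$ atoms contributes a single energy label but $3N$ force components, so the force term supplies far more training signal per structure than the energy term alone.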
Hi @jcartus,
This is definitely possible, just not with the `NpzDataset`, which we use in `example.yaml` and `minimal.yaml` because the data is already in that format. We recommend converting your dataset into the `extxyz` format and using the ASE dataset feature.
You can also use any other format that ASE can read and change `format` accordingly.
Please see `full.yaml` for more details: https://github.com/mir-group/nequip/blob/main/configs/full.yaml#L69-L95
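As a rough illustration of the conversion step, the sketch below writes two structures with different atom counts into one extended XYZ file using only the Python standard library. The `Frame` class, the helper names, the file name, and all numeric values are hypothetical placeholders. In practice, `ase.io.write(path, list_of_atoms, format="extxyz")` produces this kind of file for you. The `dataset_config` keys at the end are an assumption about the ASE dataset options; the linked `full.yaml` is the authoritative reference.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One labelled structure (hypothetical container for this sketch)."""
    symbols: list[str]
    positions: list[tuple[float, float, float]]  # Angstrom
    forces: list[tuple[float, float, float]]     # eV/Angstrom
    cell: tuple[float, ...]                      # 9 numbers: lattice vectors, row by row
    energy: float                                # eV


def format_frame(frame: Frame) -> str:
    """Render one frame as an extended XYZ block."""
    lattice = " ".join(f"{x:.8f}" for x in frame.cell)
    header = (
        f'Lattice="{lattice}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f'energy={frame.energy:.8f} pbc="T T T"'
    )
    # Line 1 is the atom count, so every frame may have a different size.
    lines = [str(len(frame.symbols)), header]
    for sym, pos, frc in zip(frame.symbols, frame.positions, frame.forces):
        nums = " ".join(f"{v:16.8f}" for v in (*pos, *frc))
        lines.append(f"{sym:<2} {nums}")
    return "\n".join(lines) + "\n"


def write_extxyz(path: str, frames: list[Frame]) -> None:
    """Concatenate all frames into a single multi-frame extxyz file."""
    with open(path, "w") as fh:
        for frame in frames:
            fh.write(format_frame(frame))


# Placeholder data: a 2-atom and a 3-atom structure in different cells.
cubic = lambda a: (a, 0.0, 0.0, 0.0, a, 0.0, 0.0, 0.0, a)
frames = [
    Frame(["H", "H"],
          [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)],
          [(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)],
          cubic(10.0), -6.7),
    Frame(["O", "H", "H"],
          [(0.0, 0.0, 0.0), (0.76, 0.59, 0.0), (-0.76, 0.59, 0.0)],
          [(0.0, -0.2, 0.0), (0.1, 0.1, 0.0), (-0.1, 0.1, 0.0)],
          cubic(12.0), -14.2),
]

# Assumed config keys for the ASE dataset route (check full.yaml).
dataset_config = {
    "dataset": "ase",
    "dataset_file_name": "monolayers.extxyz",  # hypothetical file name
    "ase_args": {"format": "extxyz"},
}

if __name__ == "__main__":
    write_extxyz(dataset_config["dataset_file_name"], frames)
```

Because each frame carries its own atom count and lattice in the header, nothing in the file requires the structures to be the same size.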