Training a model with xyz dataset

Hi,
Since the dataset contains cells of different sizes (mainly 1-, 54- and 128-atom cells), the energies are very different.
Hi @peppe69, thanks for reaching out. One thing that leaps out immediately is that you aren't using rescaling:
Unless your data is already normalized (and even in that case), this is not going to work out well. We strongly recommend the default setup:
with default settings (please make sure that you are using the latest stable NequIP). The default settings will give you a model whose predicted energy is size-extensive, which will be very important for your variable-sized data. Second (while not necessary), we have found that training jointly on forces and energies is very helpful even if you don't need the forces from your model. Since you seem to have force data in your training set, you might consider this. I'm also cc'ing my colleague @simonbatzner, who is most familiar with hyperparameters for these models.
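To illustrate why size-extensivity matters for cells of 1, 54 and 128 atoms, here is a minimal plain-Python sketch. It is not NequIP code: the per-atom energy is an illustrative value (roughly the dataset mean quoted later in this thread), and the variable names are made up for the example.

```python
# Illustrative per-atom energy; an assumption, roughly the dataset mean
# quoted later in this thread.
PER_ATOM_ENERGY = -3460.83

# Cell sizes mentioned in the original question.
cell_sizes = [1, 54, 128]

# A size-extensive total energy grows linearly with the number of atoms,
# so raw totals for different cells live on very different scales.
total_energies = {n: n * PER_ATOM_ENERGY for n in cell_sizes}

# Dividing by the atom count puts every cell back on one common scale.
per_atom_energies = {n: e / n for n, e in total_energies.items()}

for n in cell_sizes:
    print(f"{n:>3} atoms: total = {total_energies[n]:>12.2f}, "
          f"per atom = {per_atom_energies[n]:.2f}")
```

The totals differ by a factor of up to 128 between cells, while the per-atom values coincide, which is why statistics for rescaling are usually taken per atom on variable-sized data.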
Hi @peppe69, following up on Alby's reply. A few things:
You seem to have changed a number of the other hyperparameters as well in ways that I think are suboptimal. I will list them here:
Note that the
to get up to
If you want to have a more accurate
I've attached a suggested energy config below for you to try (I kept your small
Hi,
We finally succeeded in training a NequIP model on our data, using both energies and forces.
However, we may have found a bug in the NequIP code, so please check the following carefully.
In detail, the per-atom energy statistics for the whole dataset are: mean=-3460.8266742392325; std=0.16236037667479927. The same statistics as evaluated by NequIP are: dataset_per_atom_total_energy_mean=-22883.892153712808, dataset_per_atom_total_energy_std=62120.49343479028.
So we debugged the code and found this: in nequip/data/dataset.py, line 540, the per-atom energies are computed as arr / N.
Since the two tensors have different shapes (arr: [n_samples, 1]; N: [n_samples]), element-wise division is NOT performed: …
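The shape mismatch described above can be reproduced in isolation. The following is a minimal sketch using NumPy rather than the actual NequIP code (PyTorch tensors follow the same broadcasting rules); the toy values and the names `wrong` and `right` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy data: three structures with 1, 54 and 128 atoms.
N = np.array([1.0, 54.0, 128.0])        # atom counts, shape [n_samples]
per_atom = -3460.83                     # illustrative per-atom energy
arr = (N * per_atom).reshape(-1, 1)     # total energies, shape [n_samples, 1]

# Broadcasting [3, 1] against [3] produces a [3, 3] matrix:
# every total energy is divided by every atom count.
wrong = arr / N

# Giving N the same [n_samples, 1] shape restores element-wise division.
right = arr / N.reshape(-1, 1)

print("wrong shape:", wrong.shape)      # (3, 3)
print("right shape:", right.shape)      # (3, 1)
print("wrong mean/std:", wrong.mean(), wrong.std())
print("right mean/std:", right.mean(), right.std())
```

In this sketch only the diagonal of `wrong` holds true per-atom energies; the off-diagonal entries mix energies and atom counts from different structures, which inflates the standard deviation in the same way as the statistics reported above.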