Description
Bug summary
I was trying out the DPLR training example in examples/water/dplr/train when I ran into the following problem. I followed the instructions in the online documentation, i.e., first trained the Deep Wannier network with the dp train dw.json && dp freeze -o dw.pb command, then trained the energy network with the dp train ener.json command.
Training the Deep Wannier network works well. However, when training the energy model with ener.json, the force RMSE keeps rising, as the following lcurve.out shows:
# step rmse_trn rmse_e_trn rmse_f_trn lr
0 2.58e+01 1.53e-01 8.15e-01 1.0e-03
100 1.49e+01 3.80e-02 6.27e-01 5.6e-04
200 2.28e+01 5.10e-02 1.28e+00 3.2e-04
300 2.99e+01 1.12e-01 2.23e+00 1.8e-04
400 2.66e+01 6.64e-02 2.65e+00 1.0e-04
500 2.52e+01 6.81e-04 3.34e+00 5.6e-05
600 1.69e+01 9.14e-03 2.95e+00 3.2e-05
700 2.06e+01 4.41e-02 4.76e+00 1.8e-05
800 1.39e+01 3.82e-02 4.20e+00 1.0e-05
900 1.20e+01 5.22e-02 4.64e+00 5.6e-06
1000 1.61e+01 4.17e-02 7.86e+00 3.2e-06
1100 1.48e+01 4.30e-02 8.87e+00 1.8e-06
1200 8.84e+00 3.74e-02 6.23e+00 1.0e-06
1300 9.89e+00 6.62e-03 7.91e+00 5.6e-07
1400 1.32e+01 7.61e-02 1.14e+01 3.2e-07
1500 1.00e+01 6.95e-03 9.21e+00 1.8e-07
1600 1.10e+01 5.02e-02 1.05e+01 1.0e-07
1700 9.08e+00 4.07e-03 8.83e+00 5.6e-08
1800 1.48e+01 1.25e-01 1.44e+01 3.2e-08
1900 1.46e+01 1.59e-01 1.41e+01 1.8e-08
2000 1.28e+01 9.54e-02 1.26e+01 1.0e-08
I also ran the same example with v2.2.0, v2.1.5, and v2.1.0. v2.2.0 gives the same result as above, while v2.1.5 and v2.1.0 give the following result, which looks more reasonable:
# step rmse_trn rmse_e_trn rmse_f_trn lr
0 2.58e+01 1.53e-01 8.15e-01 1.0e-03
100 1.39e+01 1.48e-01 5.79e-01 5.6e-04
200 8.65e+00 6.18e-02 4.82e-01 3.2e-04
300 5.54e+00 4.16e-04 4.14e-01 1.8e-04
400 3.78e+00 2.64e-02 3.73e-01 1.0e-04
500 2.77e+00 2.12e-03 3.66e-01 5.6e-05
600 2.19e+00 8.75e-03 3.82e-01 3.2e-05
700 1.66e+00 5.43e-03 3.83e-01 1.8e-05
800 1.37e+00 8.95e-03 4.11e-01 1.0e-05
900 1.19e+00 1.16e-02 4.54e-01 5.6e-06
1000 8.69e-01 4.41e-03 4.24e-01 3.2e-06
1100 7.20e-01 2.93e-03 4.31e-01 1.8e-06
1200 5.34e-01 2.68e-03 3.76e-01 1.0e-06
1300 4.79e-01 4.74e-03 3.76e-01 5.6e-07
1400 5.11e-01 1.14e-02 4.01e-01 3.2e-07
1500 4.73e-01 3.63e-03 4.31e-01 1.8e-07
1600 4.71e-01 3.75e-03 4.44e-01 1.0e-07
1700 3.86e-01 9.01e-03 3.34e-01 5.6e-08
1800 3.83e-01 9.08e-03 3.34e-01 3.2e-08
1900 3.74e-01 9.85e-04 3.70e-01 1.8e-08
2000 3.86e-01 3.99e-03 3.76e-01 1.0e-08
The two results differ substantially in the force RMSE (rmse_f_trn), so I suspect that a bug was introduced between v2.1.5 and v2.2.0. If that is not the case, I would like to know why the force RMSE differs so much between the newer and older versions.
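To put a number on the difference, here is a minimal sketch that compares the first and last rmse_f_trn values of the two logs above. It assumes the five whitespace-separated columns shown in the lcurve.out headers; the helper names (parse_lcurve, force_rmse_ratio) are hypothetical and not part of DeePMD-kit, and only three rows per log are copied in to keep it short.

```python
def parse_lcurve(text):
    """Parse lcurve.out-style text into rows of floats, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(field) for field in line.split()])
    return rows


def force_rmse_ratio(rows, force_col=3):
    """Return last/first force RMSE; column 3 is rmse_f_trn in the header above."""
    return rows[-1][force_col] / rows[0][force_col]


# Rows copied from the first log above (steps 0, 1000, 2000)
NEW_LOG = """# step rmse_trn rmse_e_trn rmse_f_trn lr
0 2.58e+01 1.53e-01 8.15e-01 1.0e-03
1000 1.61e+01 4.17e-02 7.86e+00 3.2e-06
2000 1.28e+01 9.54e-02 1.26e+01 1.0e-08
"""

# Rows copied from the v2.1.5 / v2.1.0 log above (steps 0, 1000, 2000)
OLD_LOG = """# step rmse_trn rmse_e_trn rmse_f_trn lr
0 2.58e+01 1.53e-01 8.15e-01 1.0e-03
1000 8.69e-01 4.41e-03 4.24e-01 3.2e-06
2000 3.86e-01 3.99e-03 3.76e-01 1.0e-08
"""

ratio_new = force_rmse_ratio(parse_lcurve(NEW_LOG))
ratio_old = force_rmse_ratio(parse_lcurve(OLD_LOG))

# A ratio above 1 means the force RMSE grew during training
print(f"first log:        final/initial rmse_f_trn = {ratio_new:.2f}")
print(f"v2.1.5 / v2.1.0:  final/initial rmse_f_trn = {ratio_old:.2f}")
```

To run the same comparison on complete logs, read each lcurve.out file into a string and pass it to parse_lcurve instead of the abbreviated samples.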
DeePMD-kit Version
2.2.4, 2.2.0, 2.1.5, 2.1.0
TensorFlow Version
Default version in the offline packages
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc.
As described above.
Steps to Reproduce
Run the examples/water/dplr/train case as directed in the documentation.
Further Information, Files, and Links
No response