Labels
bug, critical (Critical bugs that may break the results without messages), upstream
Description
Bug summary
Using the same DeePMD-kit code, TF v2.12.0 works fine, but TF v2.13.0 gives wrong GPU results for forces.
DeePMD-kit Version
v2.2.3.dev55+g37fd8d19
TensorFlow Version
2.13.0
How did you download the software?
Built from source
Input Files, Running Commands, Error Log, etc.
Train on examples/water/se_e2_a and compare the resulting lcurve.out across TensorFlow versions and devices.
TF v2.12.0 + GPU:
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 2.60e+01 2.61e+01 6.76e-01 6.76e-01 8.20e-01 8.23e-01 1.0e-03
100 1.18e+01 1.11e+01 1.90e-01 1.81e-01 3.73e-01 3.50e-01 1.0e-03
200 7.50e+00 7.34e+00 5.96e-02 5.33e-02 2.37e-01 2.32e-01 1.0e-03
TF v2.13.0 + GPU:
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 nan 1.08e+03 5.27e+01 6.76e-01 1.28e+06 3.41e+01 1.0e-03
100 3.20e+02 2.50e+02 5.24e-01 5.15e-01 1.01e+01 7.92e+00 1.0e-03
200 4.55e+03 5.28e+02 3.60e+01 2.73e-01 1.44e+02 1.67e+01 1.0e-03
TF v2.13.0 + CPU:
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 2.60e+01 2.61e+01 6.76e-01 6.76e-01 8.20e-01 8.23e-01 1.0e-03
100 1.18e+01 1.11e+01 1.90e-01 1.81e-01 3.73e-01 3.50e-01 1.0e-03
200 7.50e+00 7.34e+00 5.96e-02 5.33e-02 2.37e-01 2.32e-01 1.0e-03
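TF v2.12.0 + GPU and TF v2.13.0 + CPU give identical results. With TF v2.13.0 + GPU, rmse_e_trn at step 0 still matches (6.76e-01), but the force errors are wrong: rmse_f_trn is 3.41e+01 instead of 8.23e-01, rmse_f_val is 1.28e+06, and rmse_val is nan. The cause needs to be investigated.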
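The comparison above was done by eye. A minimal sketch of how it could be automated is below, assuming lcurve.out keeps the layout shown here (one `#` header line with column names, then whitespace-separated numeric rows). The helper names `parse_lcurve` and `diverging_columns` and the 1% tolerance are hypothetical choices for illustration, not part of DeePMD-kit.

```python
import math


def parse_lcurve(text):
    """Parse lcurve.out-style text into (column_names, rows_of_floats)."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: "# step rmse_val rmse_trn ..."
            names = line.lstrip("#").split()
            continue
        # float() also accepts "nan", which appears in the failing run
        rows.append([float(tok) for tok in line.split()])
    return names, rows


def diverging_columns(ref_text, test_text, rel_tol=1e-2):
    """Return the column names whose values differ between two runs.

    A column is reported if any common row differs by more than rel_tol
    (relative) or contains nan. Rows are compared pairwise in order; if
    the two runs have different lengths, the extra rows are ignored.
    """
    ref_names, ref_rows = parse_lcurve(ref_text)
    test_names, test_rows = parse_lcurve(test_text)
    if ref_names != test_names:
        raise ValueError("the two files have different columns")
    bad = []
    for idx, name in enumerate(ref_names):
        for ref_row, test_row in zip(ref_rows, test_rows):
            # math.isclose returns False whenever either value is nan
            if not math.isclose(ref_row[idx], test_row[idx], rel_tol=rel_tol):
                bad.append(name)
                break
    return bad
```

To use it, read the two files (for example with `open(path).read()`, where the paths point at the lcurve.out of a known-good run and of the suspect run) and pass their contents to `diverging_columns`. An empty list means the runs agree within the tolerance.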
Steps to Reproduce
Install:
pip install tensorflow==2.13.0
pip install -v .
Run examples:
cd examples/water/se_e2_a
dp train input.json
Further Information, Files, and Links
No response