Closed
Labels
bug; critical (Critical bugs that may break the results without messages); reproduced (This bug has been reproduced by developers); upstream
Description
I trained a model on the water data in the examples folder. When I use this model for MD inference in LAMMPS, the results are abnormal. This problem did not occur in the previous beta version: the same model gives different results in the release version and in the previous beta version. Below are my training input file, the dp test output, and the LAMMPS output.
training input.json
{
    "_comment": " model parameters",
    "model": {
        "type_map": ["O", "H"],
        "descriptor" :{
            "type": "se_e2_a",
            "sel": [46, 92],
            "rcut_smth": 0.50,
            "rcut": 6.00,
            "neuron": [32, 64, 128],
            "resnet_dt": false,
            "axis_neuron": 16,
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net" : {
            "neuron": [240, 240, 240],
            "resnet_dt": true,
            "seed": 1,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },
    "learning_rate" :{
        "type": "exp",
        "decay_steps": 5000,
        "start_lr": 0.001,
        "stop_lr": 3.51e-8,
        "_comment": "that's all"
    },
    "loss" :{
        "type": "ener",
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "_comment": " that's all"
    },
    "training" : {
        "training_data": {
            "systems": ["../data/data_0/", "../data/data_1/", "../data/data_2/"],
            "batch_size": "auto",
            "_comment": "that's all"
        },
        "validation_data":{
            "systems": ["../data/data_3"],
            "batch_size": 1,
            "numb_btch": 3,
            "_comment": "that's all"
        },
        "numb_steps": 100000,
        "seed": 10,
        "disp_file": "lcurve.out",
        "disp_freq": 100,
        "save_freq": 1000,
        "_comment": "that's all"
    },
    "_comment": "that's all"
}
test result
DEEPMD INFO # number of test data : 1
DEEPMD INFO Energy RMSE : 7.472860e-02 eV
DEEPMD INFO Energy RMSE/Natoms : 3.892115e-04 eV
DEEPMD INFO Force RMSE : 5.845049e-02 eV/A
DEEPMD INFO Virial RMSE : 6.561497e+00 eV
DEEPMD INFO Virial RMSE/Natoms : 3.417446e-02 eV
lammps result
LAMMPS (30 Jul 2021)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
# bulk water
units metal
boundary p p p
atom_style atomic
neighbor 2.0 bin
neigh_modify every 50 delay 0 check no
read_data ../lmp/water.lmp
Reading data file ...
triclinic box = (0.0000000 0.0000000 0.0000000) to (12.444700 12.444700 12.444700) with tilt (0.0000000 0.0000000 0.0000000)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.001 seconds
mass 1 16
mass 2 2
replicate 1 1 1
Replicating atoms ...
triclinic box = (0.0000000 0.0000000 0.0000000) to (12.444700 12.444700 12.444700) with tilt (0.0000000 0.0000000 0.0000000)
1 by 1 by 1 MPI processor grid
192 atoms
replicate CPU = 0.001 seconds
# load the plugin at <install_prefix>/lib/libdeepmd_lmp.so
plugin load ../../../dp/lib/libdeepmd_lmp.so
2021-09-06 05:07:15.880999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Loading plugin: deepmd pair style v2.0 by Han Wang
Loading plugin: compute deeptensor/atom v2.0 by Han Wang
Loading plugin: fix dplr v2.0 by Han Wang
pair_style deepmd ../../../model/water/graph-original.pb
Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /home/user/deepmd-kit/dp
source: v2.0.0
source branch: HEAD
source commit: 1a25414
source commit at: 2021-08-28 08:15:38 +0800
surpport model ver.:1.0
build float prec: double
build with tf inc: /home/user/software/tensorflow-gpu-2.4/include;/home/user/software/tensorflow-gpu-2.4/include
build with tf lib: /home/user/software/tensorflow-gpu-2.4/lib/libtensorflow_cc.so;/home/user/software/tensorflow-gpu-2.4/lib/libtensorflow_framework.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
>>> Info of lammps module:
use deepmd-kit at: /home/user/deepmd-kit/dp
source: v2.0.0
source branch: HEAD
source commit: 1a25414
source commit at: 2021-08-28 08:15:38 +0800
build float prec: double
build with tf inc: /home/user/software/tensorflow-gpu-2.4/include;/home/user/software/tensorflow-gpu-2.4/include
build with tf lib: /home/user/software/tensorflow-gpu-2.4/lib/libtensorflow_cc.so;/home/user/software/tensorflow-gpu-2.4/lib/libtensorflow_framework.so
2021-09-06 05:07:15.944785: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-06 05:07:16.074100: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-06 05:07:16.083678: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-06 05:07:16.083750: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-09-06 05:07:16.086385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2021-09-06 05:07:16.086427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-06 05:07:16.093307: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-06 05:07:16.093349: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-09-06 05:07:16.094544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-06 05:07:16.094890: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-06 05:07:16.095777: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-06 05:07:16.096861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-06 05:07:16.097030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-06 05:07:16.101095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-09-06 05:07:17.603292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-06 05:07:17.603340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-09-06 05:07:17.603350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-09-06 05:07:17.609755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 36482 MB memory) -> physical GPU (device: 0, name: A100-PCIE-40GB, pci bus id: 0000:3b:00.0, compute capability: 8.0)
2021-09-06 05:07:17.661374: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3000000000 Hz
>>> Info of model(s):
using 1 model(s): ../../../model/water/graph-original.pb
rcut in model: 6
ntypes in model: 2
pair_coeff * *
velocity all create 330.0 23456789
fix 1 all nvt temp 330.0 330.0 0.5
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# dump 1 all custom 100 water.dump id type x y z
run 99
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Neighbor list info ...
update every 50 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 8
ghost atom cutoff = 8
binsize = 4, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: , newton on
pair build: full/bin/atomonly
stencil: half/bin/3d/tri
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
2021-09-06 05:07:18.718134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-06 05:07:21.396912: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
Per MPI rank memory allocation (min/avg/max) = 2.650 | 2.650 | 2.650 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
0 -29915.087 8.1472669 -29906.94 330 3350.9587 1927.3176
20 -29917.854 10.912417 -29906.942 442.00069 -10514.011 1927.3176
40 -29921.478 14.535179 -29906.943 588.73844 -6586.5916 1927.3176
60 -29896.586 22.764435 -29873.822 922.05934 5369.5048 1927.3176
80 -29903.143 29.254954 -29873.889 1184.9538 5065.801 1927.3176
99 -29907.677 33.63073 -29874.046 1362.1919 6240.4274 1927.3176
Loop time of 0.893113 on 1 procs for 99 steps with 192 atoms
Performance: 4.789 ns/day, 5.012 hours/ns, 110.848 timesteps/s
96.5% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.88632 | 0.88632 | 0.88632 | 0.0 | 99.24
Neigh | 0.0024072 | 0.0024072 | 0.0024072 | 0.0 | 0.27
Comm | 0.0020397 | 0.0020397 | 0.0020397 | 0.0 | 0.23
Output | 0.00038211 | 0.00038211 | 0.00038211 | 0.0 | 0.04
Modify | 0.0014147 | 0.0014147 | 0.0014147 | 0.0 | 0.16
Other | | 0.0005449 | | | 0.06
Nlocal: 192.000 ave 192 max 192 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 2094.00 ave 2094 max 2094 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0.00000 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 0
Ave neighs/atom = 0.0000000
Neighbor list builds = 1
Dangerous builds not checked
Total wall time: 0:00:06
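To make the abnormal behaviour in the thermo output above concrete, here is a minimal Python sketch that parses the six thermo rows from this log and reports two things: how far the temperature drifts from the 330 K thermostat target, and where the total energy changes most between consecutive thermo outputs. The function names (`parse_thermo`, `max_temp_ratio`, `largest_energy_jump`) are hypothetical helpers written for this illustration; they are not part of LAMMPS or DeePMD-kit, and the sketch only describes the symptom, it does not identify the cause.

```python
# Thermo rows copied verbatim from the LAMMPS log in this report.
# Columns: Step PotEng KinEng TotEng Temp Press Volume
THERMO = """\
0 -29915.087 8.1472669 -29906.94 330 3350.9587 1927.3176
20 -29917.854 10.912417 -29906.942 442.00069 -10514.011 1927.3176
40 -29921.478 14.535179 -29906.943 588.73844 -6586.5916 1927.3176
60 -29896.586 22.764435 -29873.822 922.05934 5369.5048 1927.3176
80 -29903.143 29.254954 -29873.889 1184.9538 5065.801 1927.3176
99 -29907.677 33.63073 -29874.046 1362.1919 6240.4274 1927.3176
"""

# Thermostat target in K, taken from "fix 1 all nvt temp 330.0 330.0 0.5".
TARGET_TEMP = 330.0


def parse_thermo(text):
    """Turn whitespace-separated thermo rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        step, pe, ke, etotal, temp, press, vol = line.split()
        rows.append({
            "step": int(step),
            "pe": float(pe),
            "ke": float(ke),
            "etotal": float(etotal),
            "temp": float(temp),
            "press": float(press),
            "vol": float(vol),
        })
    return rows


def max_temp_ratio(rows, target):
    """Highest reported temperature divided by the thermostat target."""
    return max(r["temp"] for r in rows) / target


def largest_energy_jump(rows):
    """Return (|dE|, step_before, step_after) for the largest TotEng change
    between two consecutive thermo outputs."""
    jumps = [
        (abs(b["etotal"] - a["etotal"]), a["step"], b["step"])
        for a, b in zip(rows, rows[1:])
    ]
    return max(jumps)


if __name__ == "__main__":
    rows = parse_thermo(THERMO)
    ratio = max_temp_ratio(rows, TARGET_TEMP)
    d_e, s0, s1 = largest_energy_jump(rows)
    print(f"max T / target T = {ratio:.2f}")
    print(f"largest TotEng change: {d_e:.3f} eV between steps {s0} and {s1}")
```

On these rows the temperature reaches roughly four times the 330 K target within 99 steps, and almost all of the total-energy change (about 33 eV) occurs between the thermo outputs at steps 40 and 60. That window contains step 50, the rebuild point set by `neigh_modify every 50`; this is only an observation from the log, not an established cause.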
njzjz