Description
I've been trying to figure out how to write a benchmarking script for SchNet. Here's what I have so far with SchNetPack. It loads a PDB file and computes the energy 1000 times with one of the pre-trained QM9 models. I haven't figured out yet how to get it to compute forces, so any advice on that would be appreciated. There are probably other ways this could be improved too.
import sys
import time

import ase.io
import torch
import schnetpack as spk
import schnetpack.md.calculators

device = torch.device('cuda')

# Load one of the pre-trained QM9 models.
model = torch.load("trained_schnet_models/qm9_energy_U0/best_model", map_location=device)

# Read the molecule and wrap it in a single-replica MD system.
atoms = ase.io.read(sys.argv[1])
system = spk.md.System(1, device=device)
system.load_molecules([atoms])

calculator = spk.md.calculators.SchnetPackCalculator(
    model,
    required_properties=['energy_U0'],
    force_handle=spk.Properties.forces,
    position_conversion='A',
    force_conversion='kcal/mol/A'
)
inputs = calculator._generate_input(system)

# Warm-up call so one-time CUDA initialization is not included in the timing.
model(inputs)

t1 = time.time()
for i in range(1000):
    results = model(inputs)
print(results)
print(time.time() - t1)
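On the forces question: since the model is an ordinary PyTorch module, one common approach is to mark the positions tensor as requiring gradients and take the negative gradient of the energy with respect to it via autograd. Here is a minimal, self-contained sketch of that pattern using a toy pairwise-harmonic "energy" as a stand-in for the SchNet model; with the real model the same idea would apply to the positions tensor inside `inputs` (the exact dictionary key, e.g. something like spk.Properties.R, is an assumption and depends on the SchNetPack version).

```python
import torch

# Toy stand-in for an energy model: E = 0.5 * sum over all ordered
# pairs (i, j) of |r_i - r_j|^2. A real SchNet model would play this role.
def energy(positions):
    diffs = positions.unsqueeze(0) - positions.unsqueeze(1)  # (N, N, 3)
    return 0.5 * (diffs ** 2).sum()

# Positions must require gradients BEFORE the forward pass.
positions = torch.randn(5, 3, requires_grad=True)
e = energy(positions)

# Forces are the negative gradient of the energy w.r.t. positions.
forces = -torch.autograd.grad(e, positions)[0]  # shape (5, 3)
```

With SchNetPack the analogous step would be enabling `requires_grad` on the positions entry of the input dictionary before calling the model, then differentiating the returned energy. Note that computing forces roughly doubles the cost of an evaluation, since it adds a backward pass.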
Testing a 60-atom system on a Titan V, it takes about 3.6 ms per energy evaluation. Testing a 2269-atom system, it runs out of GPU memory and crashes.
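One caveat about the timing numbers: CUDA kernel launches are asynchronous, so wall-clock time around the launch loop can undercount (or oddly distribute) the real GPU time unless the device is synchronized before reading the clock. A small helper along these lines (a sketch, not from the script above) makes the measurement more robust while still working on CPU-only machines:

```python
import time
import torch

def timed(fn, iters=1000):
    """Average wall-clock seconds per call to fn(), synchronizing the
    CUDA device before and after so queued asynchronous kernels are
    fully counted in the measurement."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.time() - t0) / iters

# With the benchmark script above you would pass lambda: model(inputs);
# here a cheap tensor op stands in so the sketch is self-contained.
per_call = timed(lambda: torch.ones(8).sum(), iters=10)
```

In this particular script the print of the results dictionary forces a device-to-host copy anyway, so the totals are probably in the right ballpark, but explicit synchronization removes the ambiguity.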
While the test is running, nvidia-smi shows that the GPU is only 28% busy. nvvp shows a lot of short kernels with larger gaps between them. The two most significant kernels are volta_sgemm_32x128_tn (19.8% of GPU time) and volta_sgemm_32x32_sliced1x4_tn (16% of GPU time). It then gets into a whole lot of kernels with uninformative mangled names like _ZN2at6native6legacy18elementwise_kernelILi128ELi4EZNS0_15gpu_kernel_implIZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_EEvS5_RKT_EUliE2_EEviT1_.