Dear DiffDock developers,
currently, every run on the Hugging Face web server fails with a CUDA out-of-memory error. Could you please have a look?
Below is a typical error report.
Best regards,
Juergen
Standard out:
Generating ESM language model embeddings
Processing 1 of 1 batches (1 sequences)
Standard error:
libgomp: Invalid value for environment variable OMP_NUM_THREADS
/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
warnings.warn(
[2025-Sep-01 14:57:08 UTC] [inference.py:153] INFO - DiffDock will run on cuda
[2025-Sep-01 14:57:19 UTC] [inference.py:184] INFO - Confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
[2025-Sep-01 14:57:28 UTC] [inference.py:223] INFO - Size of test dataset: 1
0it [00:00, ?it/s][2025-Sep-01 14:57:40 UTC] [process_mols.py:309] DEBUG - rdkit coords could not be generated. trying again 1.
[2025-Sep-01 14:57:52 UTC] [process_mols.py:309] DEBUG - rdkit coords could not be generated. trying again 2.
[2025-Sep-01 14:58:04 UTC] [process_mols.py:313] INFO - rdkit coords could not be generated without using random coords. using random coords now.
/home/appuser/DiffDock/datasets/parse_chi.py:91: RuntimeWarning: invalid value encountered in cast
Y = indices.astype(int)
[2025-Sep-01 14:59:44 UTC] [process_mols.py:309] DEBUG - rdkit coords could not be generated. trying again 1.
[2025-Sep-01 14:59:56 UTC] [process_mols.py:309] DEBUG - rdkit coords could not be generated. trying again 2.
[2025-Sep-01 15:00:08 UTC] [process_mols.py:313] INFO - rdkit coords could not be generated without using random coords. using random coords now.
--- Logging error ---
Traceback (most recent call last):
File "/home/appuser/DiffDock/inference.py", line 260, in main
data_list, confidence = sampling(data_list=data_list, model=model,
File "/home/appuser/DiffDock/utils/sampling.py", line 116, in sampling
tr_score, rot_score, tor_score = model(mod_complex_graph_batch)[:3]
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/DiffDock/models/cg_model.py", line 345, in forward
node_attr = self.conv_layers[l](node_attr, edge_index, edge_attr_, edge_sh, edge_weight=edge_weight)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/DiffDock/models/tensor_layers.py", line 321, in forward
out = tp_scatter_multigroup(self.tp, self.fc, node_attr, edge_index, edge_attr, edge_sh,
File "/home/appuser/DiffDock/models/tensor_layers.py", line 211, in tp_scatter_multigroup
cur_out_irreps = cur_fc(edge_attr_groups[ii])
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.96 GiB (GPU 0; 14.74 GiB total capacity; 2.62 GiB already allocated; 11.61 GiB free; 2.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/logging/__init__.py", line 1083, in emit
msg = self.format(record)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/logging/__init__.py", line 927, in format
return fmt.format(record)
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/logging/__init__.py", line 663, in format
record.message = record.getMessage()
File "/home/appuser/micromamba/envs/diffdock/lib/python3.9/logging/__init__.py", line 367, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/home/appuser/DiffDock/inference.py", line 318, in <module>
main(_args)
File "/home/appuser/DiffDock/inference.py", line 302, in main
logger.warning("Failed on", orig_complex_graph["name"], e)
Message: 'Failed on'
Arguments: (['complex_0'], OutOfMemoryError('CUDA out of memory. Tried to allocate 14.96 GiB (GPU 0; 14.74 GiB total capacity; 2.62 GiB already allocated; 11.61 GiB free; 2.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF'))
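Not from the developers, but for context: the secondary "Logging error" traceback above is independent of the OOM. It happens because `logger.warning("Failed on", orig_complex_graph["name"], e)` passes print-style arguments, while the `logging` module treats the first argument as a %-format string. A minimal reproduction using only the standard library (names and messages are illustrative):

```python
import logging

# Print-style call as in inference.py line 302: the message has no %s
# placeholders, so formatting the record raises the TypeError seen above.
bad = logging.LogRecord("diffdock", logging.WARNING, "inference.py", 302,
                        "Failed on", (["complex_0"], "CUDA out of memory"), None)
try:
    bad.getMessage()
except TypeError as err:
    print(err)  # not all arguments converted during string formatting

# The %-style call that logging expects:
good = logging.LogRecord("diffdock", logging.WARNING, "inference.py", 302,
                         "Failed on %s: %s",
                         (["complex_0"], "CUDA out of memory"), None)
print(good.getMessage())  # Failed on ['complex_0']: CUDA out of memory
```

So the fix on the logging side would just be `logger.warning("Failed on %s: %s", orig_complex_graph["name"], e)`; the underlying OOM is a separate problem.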
1it [03:16, 196.77s/it]
1it [03:16, 196.77s/it]
[2025-Sep-01 15:00:45 UTC] [inference.py:310] WARNING -
Failed for 1 / 1 complexes.
Skipped 0 / 1 complexes.
[2025-Sep-01 15:00:45 UTC] [inference.py:313] INFO - Results saved in /tmp/tmp_ykwbabp
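One observation on the OOM itself: the allocator hint in the error message (`max_split_size_mb`) is unlikely to help here, because the failed allocation (14.96 GiB) by itself exceeds the card's total capacity (14.74 GiB), so this is not fragmentation but a genuinely oversized intermediate tensor for this complex. For completeness, the setting the message refers to would be applied like this (the inference command line is illustrative, not the Space's actual entry point):

```shell
# Allocator tuning from the PyTorch CUDA memory-management docs; caps the
# size of splittable cached blocks to reduce fragmentation. It cannot help
# when a single allocation is larger than the whole GPU, as in this report.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
python inference.py ...
```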