
[BUG] pt: setting batch_size to list throws errors #3475

@njzjz

Bug summary

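In the PyTorch backend, setting batch_size to a list raises a ValueError, as shown in the traceback below. The list is passed unchanged to torch's DataLoader, which accepts only a positive integer.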

DeePMD-kit Version

v3.0.0a0-28-ged831c88

TensorFlow Version

PT v2.2.0+cu121-g8ac9b20d4b0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

Traceback (most recent call last):
  File "/home/jz748/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/jz748/codes/deepmd-kit/deepmd/main.py", line 807, in main
    deepmd_main(args)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 306, in main
    train(FLAGS)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 270, in train
    trainer = get_trainer(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
    ) = prepare_trainer_input_single(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 149, in prepare_trainer_input_single
    train_data_single = DpLoaderSet(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/utils/dataloader.py", line 129, in __init__
    system_dataloader = DataLoader(
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 356, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 267, in __init__
    raise ValueError(f"batch_size should be a positive integer value, but got batch_size={batch_size}")
ValueError: batch_size should be a positive integer value, but got batch_size=[1, 1, 1]
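
The traceback shows the whole list reaching torch's BatchSampler, which rejects anything that is not a positive integer. Below is a minimal sketch of one possible way to handle this, assuming one DataLoader is built per system so that each can receive its own integer. The helper name resolve_batch_sizes and its n_systems parameter are hypothetical and not part of DeePMD-kit. String values of batch_size are not handled here.

```python
from typing import List, Sequence, Union


def resolve_batch_sizes(
    batch_size: Union[int, Sequence[int]], n_systems: int
) -> List[int]:
    """Return one positive integer batch size per system.

    Hypothetical helper for illustration; not DeePMD-kit API.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(batch_size, bool):
        raise TypeError("batch_size must be an int or a list of ints")
    if isinstance(batch_size, int):
        # A single integer applies to every system.
        sizes = [batch_size] * n_systems
    elif isinstance(batch_size, (list, tuple)):
        sizes = list(batch_size)
        if len(sizes) != n_systems:
            raise ValueError(
                f"got {len(sizes)} batch sizes for {n_systems} systems"
            )
    else:
        raise TypeError("batch_size must be an int or a list of ints")
    for size in sizes:
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(
                f"batch_size entries must be positive integers, got {size!r}"
            )
    return sizes


# Three entries for three systems, as in the failing input.
sizes = resolve_batch_sizes([1, 1, 1], n_systems=3)
print(sizes)
```

Under this assumption, the loop that builds each per-system loader would pass sizes[i] instead of the whole list, so BatchSampler only ever sees a plain integer.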

Steps to Reproduce

cd examples/water/se_atten

Apply the following modification:

diff --git a/examples/water/se_atten/input_torch.json b/examples/water/se_atten/input_torch.json
index 7e9cf06f..0188228e 100644
--- a/examples/water/se_atten/input_torch.json
+++ b/examples/water/se_atten/input_torch.json
@@ -68,7 +68,7 @@
         "../data/data_1",
         "../data/data_2"
       ],
-      "batch_size": 1,
+      "batch_size": [1, 1, 1],
       "_comment": "that's all"
     },
     "validation_data": {

Then run:

dp --pt train input_torch.json

Further Information, Files, and Links

If this cannot be resolved before the stable release, the documentation for batch_size needs to be updated:
https://docs.deepmodeling.com/projects/deepmd/en/latest/train/train-input.html#argument:training/training_data/batch_size

Status: Done
