Bug description
DDP training hangs at the first iteration. The process keeps waiting on a PID:
os.waitpid() always returns pid == 0.
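For context, a minimal sketch of what pid == 0 means. With os.WNOHANG, os.waitpid() returns (0, 0) for as long as the child is still alive, so a value that is always 0 suggests the parent is polling a worker that never exits. Whether Lightning polls its workers this way in this setup is an assumption; the sleeping child below is only a stand-in for a DDP worker (POSIX only).

```python
import os
import sys

# Start a short-lived child process (hypothetical stand-in for a DDP worker).
child_pid = os.spawnv(
    os.P_NOWAIT,
    sys.executable,
    [sys.executable, "-c", "import time; time.sleep(1)"],
)

# Non-blocking wait: while the child is still running, the result is (0, 0).
pid, status = os.waitpid(child_pid, os.WNOHANG)
still_running = pid == 0

# Blocking wait: returns (child_pid, status) only once the child has exited.
done_pid, done_status = os.waitpid(child_pid, 0)
exited_cleanly = os.WIFEXITED(done_status) and os.WEXITSTATUS(done_status) == 0
```

If the non-blocking call keeps returning 0, the child has not terminated, which is consistent with a worker blocked in a collective call rather than a crashed one.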
What version are you seeing the problem on?
v1.x
How to reproduce the bug
No response
Error messages and logs
Environment
torch 2.1.0 + CUDA 12.1 / 11.8
pytorch-lightning==1.9.0/1.9.2
H100 x8
More info
I tried setting limit_train_batches=0.1 and limit_val_batches=1 in Trainer(), but it did not help.
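For reference, a sketch of how these limits would be passed. Only limit_train_batches and limit_val_batches come from this report; accelerator, devices, and strategy are assumptions inferred from "DDP" and "H100 x8", and build_trainer is a hypothetical helper name.

```python
# Only the two limit_* values come from the report; the rest is assumed.
TRAINER_KWARGS = {
    "accelerator": "gpu",        # assumed
    "devices": 8,                # assumed from "H100 x8"
    "strategy": "ddp",           # assumed from "DDP training"
    "limit_train_batches": 0.1,  # float: fraction of training batches (10%)
    "limit_val_batches": 1,      # int: exactly one validation batch
}


def build_trainer():
    """Create the Trainer lazily so this module imports without Lightning or GPUs."""
    import pytorch_lightning as pl

    return pl.Trainer(**TRAINER_KWARGS)
```

Note the type difference: a float is interpreted as a fraction of the batches, while an int is an absolute number of batches.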