I encountered this warning when using DDP. How can I locate its source?
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
/home/ps/anaconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/__init__.py:197: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [32, 64, 1, 1, 1], strides() = [64, 1, 64, 64, 64]
bucket_view.sizes() = [32, 64, 1, 1, 1], strides() = [64, 1, 1, 1, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
Here are some steps to locate and address the source of this warning:
Check Tensor Creation:
Ensure that the tensors are created and manipulated in a way that respects their memory layout. Avoid operations that may inadvertently change the tensor strides.
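The strides printed in the warning already identify the layout involved: for a tensor of size [32, 64, 1, 1, 1], strides (64, 1, 64, 64, 64) are the channels_last_3d (NDHWC) layout, while the bucket view's (64, 1, 1, 1, 1) is the default contiguous (NCDHW) layout. A small sketch to reproduce and recognize the two:

```python
import torch

# Contiguous NCDHW layout -- what DDP's bucket view expects.
contig = torch.empty(32, 64, 1, 1, 1)

# channels_last_3d (NDHWC) layout -- what the warned grad actually has.
cl3d = contig.to(memory_format=torch.channels_last_3d)

print(contig.stride())  # (64, 1, 1, 1, 1)
print(cl3d.stride())    # (64, 1, 64, 64, 64)
```

So somewhere in the model, either an input or a parameter is in a channels-last 3-D memory format while DDP built its buckets assuming the default layout (or vice versa).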
Verify DDP Initialization:
Make sure that the DDP module is initialized after all model parameters have been correctly set up and that no operations change the parameter strides after DDP initialization.
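A minimal sketch of the ordering, using a hypothetical Conv3d layer whose weight happens to match the shape in the warning (substitute your own model; the DDP lines are commented out since they need an initialized process group):

```python
import torch
import torch.nn as nn
# from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical layer: weight shape [32, 64, 1, 1, 1], as in the warning.
model = nn.Conv3d(in_channels=64, out_channels=32, kernel_size=1)

# Apply any device moves or memory-format changes BEFORE wrapping in DDP,
# so the reducer builds its bucket views from the final parameter strides.
model = model.to(memory_format=torch.channels_last_3d)
print(model.weight.stride())

# ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
```

If instead the format change happens after the DDP constructor has run, the bucket views keep the old strides and every backward pass hits the mismatch.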
Consistent Tensor Manipulation:
Ensure that all operations on the tensors are consistent and do not change the underlying memory layout. In particular, watch for layout-changing operations such as permute() or conversions to a channels-last memory format, which alter a tensor's strides without changing its shape.
Use Contiguous Tensors:
If you suspect that the strides have changed, you can make tensors contiguous before handing them to the DDP module, or make the gradients contiguous after the backward pass, by calling .contiguous() on them.
Sample code snippet to make the gradients contiguous (run after backward(), before the optimizer step; the None check skips parameters with no gradient):

for param in model.parameters():
    if param.grad is not None:
        param.grad = param.grad.contiguous()
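To pinpoint which parameter actually triggers the warning, one option (a diagnostic sketch, not part of the original snippet) is to register gradient hooks that record the name and strides of any non-contiguous grad as it arrives:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; substitute your own (for a DDP-wrapped
# model, iterate over ddp_model.module.named_parameters()).
model = nn.Conv3d(64, 32, kernel_size=1)

flagged = []  # (parameter name, grad strides) for non-contiguous grads

def make_hook(name):
    def hook(grad):
        if not grad.is_contiguous():
            flagged.append((name, tuple(grad.stride())))
    return hook

for name, param in model.named_parameters():
    param.register_hook(make_hook(name))

out = model(torch.randn(2, 64, 1, 1, 1))
out.sum().backward()
print(flagged)  # any entries here name the layers to investigate
```

Run one training step with the hooks in place; the entries in flagged tell you which layer's gradient violates the layout contract, so you can trace back through that layer's inputs and parameters for the stride change.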