Vis problem when T2V Training with distributed training #139

Open
lky-ang opened this issue Jul 30, 2024 · 0 comments

During distributed training of the T2V model, the visualization step fails to generate samples and raises the following size-mismatch error:

Traceback (most recent call last):
  File "/VGen/tools/train/train_t2v_enterance.py", line 287, in worker
    visual_func.run(visual_kwards=visual_kwards, **input_kwards)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/hooks/visual_train_it2v_video.py", line 62, in run
    video_data = self.diffusion.ddim_sample_loop(
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 253, in ddim_sample_loop
    xt, _ = self.ddim_sample(xt, t, model, model_kwargs, clamp, percentile, condition_fn, guide_scale, ddim_timesteps, eta)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 217, in ddim_sample
    _, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp, percentile, guide_scale)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 158, in p_mean_variance
    u_out = model(xt, self._scale_timesteps(t), **model_kwargs[1])
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/VGen/tools/modules/unet/unet_t2v.py", line 251, in forward
    context = torch.cat([context, y_context], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 32 for tensor number 1 in the list.
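
To narrow down which axis disagrees, it may help to compare the shapes of `context` and `y_context` just before the `torch.cat` call. The error message does not name the axis, so the sketch below is only a pure-Python stand-in for the rule `torch.cat` enforces (every axis except `dim` must match). The helper `cat_shape` and the example shapes are hypothetical, chosen for illustration; only the sizes 8 and 32 and `dim=1` come from the traceback.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def cat_shape(shapes: Sequence[Shape], dim: int) -> Shape:
    """Return the shape a concatenation along `dim` would produce.

    Hypothetical helper: mirrors the rule that all axes except `dim`
    must have equal size, and reports the first axis that does not.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    first = shapes[0]
    out = list(first)
    for idx, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError(
                f"tensor number {idx} has {len(shape)} dims, expected {len(first)}"
            )
        for axis, (want, got) in enumerate(zip(first, shape)):
            if axis != dim and want != got:
                raise ValueError(
                    f"axis {axis}: expected size {want} but got size {got} "
                    f"for tensor number {idx}"
                )
        out[dim] += shape[dim]
    return tuple(out)


# Assumed shapes, for illustration only: the real shapes in unet_t2v.py
# are not shown in the traceback. Here the batch axis (axis 0) differs.
context = (8, 77, 1024)
y_context = (32, 1, 1024)

try:
    cat_shape([context, y_context], dim=1)
except ValueError as err:
    print(err)  # names the mismatching axis and the offending tensor
```

If the mismatch does turn out to be on the batch axis, one thing to check is whether the conditioning tensors passed to the visualization hook are built with the same batch size as the noise tensor being sampled; that is a guess from the error message, not something the traceback confirms.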
