Description
In `train_text_to_image_lora.py`, I notice that the LoRA parameters are extracted into an `AttnProcsLayers` class (line 518):

```python
lora_layers = AttnProcsLayers(unet.attn_processors)
```
Only `lora_layers` is wrapped by `DistributedDataParallel` in the following code (line 670):

```python
lora_layers, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    lora_layers, optimizer, train_dataloader, lr_scheduler
)
```
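If I understand correctly, when the script is launched with more than one process, `accelerator.prepare` wraps the model it is given in `DistributedDataParallel`, so after this call `lora_layers` is a DDP wrapper around the original `AttnProcsLayers` instance. A quick way to confirm this (my own snippet, not from the script):

```python
# Hypothetical check, not part of the script: inspect what accelerator.prepare returned.
from torch.nn.parallel import DistributedDataParallel

print(type(lora_layers))
# I expect this to print True when running on multiple GPUs:
print(isinstance(lora_layers, DistributedDataParallel))
```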
During training, `lora_layers` does not seem to be used explicitly; only `unet` is called in the forward pass (line 776):

```python
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
```
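As far as I can tell, this still trains because the two objects hold the same parameter tensors. Here is the quick check I used to convince myself (my own snippet, not from the script; I expect it to print `True`):

```python
# Sanity check (hypothetical): the parameters held by lora_layers are the very same
# tensor objects as the LoRA parameters inside the unet's attention processors.
lora_ids = {id(p) for p in accelerator.unwrap_model(lora_layers).parameters()}
proc_ids = {id(p) for proc in unet.attn_processors.values() for p in proc.parameters()}
print(lora_ids == proc_ids)
```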
My question is: when training on multiple GPUs or multiple machines, will the gradients be correctly averaged across all processes with this setup?
It is true that in each process the gradients are backpropagated into `unet.attn_processors`, and since those are the same parameter objects that `lora_layers` holds (see the check above), the `optimizer` can still update the weights. However, the forward pass actually goes through `unet.attn_processors`, not through the wrapped `lora_layers`, so can the gradients still be correctly averaged? From here, it seems that a DDP-wrapped module has a different `forward` from the original module's forward.
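To make the concern concrete, here is a toy sketch of the pattern I am asking about (my own construction, not from the repo), assuming a process group has already been initialized, e.g. by launching with `torchrun` and the `gloo` backend on CPU:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# `proc` plays the role of unet.attn_processors, `holder` the role of lora_layers:
# both refer to the very same parameter tensors, but only the container is DDP-wrapped.
proc = torch.nn.Linear(4, 4)
holder = DDP(torch.nn.ModuleList([proc]))

x = torch.randn(2, 4)
loss = proc(x).sum()   # the forward pass bypasses the DDP wrapper entirely
loss.backward()        # are proc.weight.grad and proc.bias.grad averaged across ranks here?
```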
I am not very familiar with the `torch.nn.parallel.DistributedDataParallel` wrapper, and I worry that the current code in `train_text_to_image_lora.py` may end up with different LoRA weights in different processes (if the gradients fail to be synchronized across processes).
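For reference, this is the check I was planning to run after a few optimizer steps to see empirically whether the weights drift apart between processes (`check_lora_sync` is a hypothetical helper of my own, not part of the script):

```python
def check_lora_sync(lora_layers, accelerator):
    # Take one LoRA parameter as a probe and gather it from every process.
    probe = next(accelerator.unwrap_model(lora_layers).parameters()).detach().float()
    gathered = accelerator.gather(probe.flatten()[None, :])  # shape: (num_processes, numel)
    if accelerator.is_main_process:
        max_diff = (gathered - gathered[0]).abs().max().item()
        print(f"max per-element difference across processes: {max_diff}")
```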
I hope to find some help here. Thank you.