Dreambooth doesn't train on 8GB #807

Closed
@devilismyfriend

Description

Describe the bug

Following the example featured in the repo, training goes OOM while DeepSpeed is initializing the optimizer. Tested on a 3080 10GB + 64GB RAM, both in WSL2 and on native Linux.
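For reference, the DeepSpeed settings the log below reports (ZeRO stage 2, CPU optimizer offload, 5e8 buckets, fp16) would correspond to a config along these lines. This is my reconstruction, not the actual file from the pastebin (the repo example sets this up through `accelerate config`), and whether `pin_memory` is honored for stage 2 in DeepSpeed 0.7.3 is an assumption worth checking, since the crash happens inside a `pin_memory()` call:

```json
{
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "reduce_bucket_size": 500000000,
    "allgather_bucket_size": 500000000
  }
}
```

Setting `pin_memory` to false (or shrinking the buckets) may be worth trying as a workaround, at the cost of slower host/device transfers.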

Reproduction

Follow the pastebin for setup (on WSL2), or just try it yourself: https://pastebin.com/0NHA5YTP

Logs

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `8` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2022-10-11 17:16:38,700] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2022-10-11 17:16:48,338] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.3, git-hash=unknown, git-branch=unknown
[2022-10-11 17:16:50,220] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2022-10-11 17:16:50,221] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2022-10-11 17:16:50,221] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2022-10-11 17:16:50,271] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = {basic_optimizer.__class__.__name__}
[2022-10-11 17:16:50,272] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2022-10-11 17:16:50,272] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer
[2022-10-11 17:16:50,272] [INFO] [stage_1_and_2.py:134:__init__] Reduce bucket size 500000000
[2022-10-11 17:16:50,272] [INFO] [stage_1_and_2.py:135:__init__] Allgather bucket size 500000000
[2022-10-11 17:16:50,272] [INFO] [stage_1_and_2.py:136:__init__] CPU Offload: True
[2022-10-11 17:16:50,272] [INFO] [stage_1_and_2.py:137:__init__] Round robin gradient partitioning: False
Using /root/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py39_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.15820693969726562 seconds
Rank: 0 partition count [1] and sizes[(859520964, False)]
[2022-10-11 17:16:52,613] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states
[2022-10-11 17:16:52,614] [INFO] [utils.py:828:see_memory_usage] MA 1.66 GB         Max_MA 1.66 GB         CA 3.27 GB         Max_CA 3 GB
[2022-10-11 17:16:52,614] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory:  used = 7.68 GB, percent = 16.3%
Traceback (most recent call last):
  File "/root/github/diffusers-ttl/examples/dreambooth/train_dreambooth.py", line 598, in <module>
    main()
  File "/root/github/diffusers-ttl/examples/dreambooth/train_dreambooth.py", line 478, in main
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/accelerate/accelerator.py", line 679, in prepare
    result = self._prepare_deepspeed(*args)
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/accelerate/accelerator.py", line 890, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/__init__.py", line 124, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 320, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1144, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1395, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 512, in __init__
    self.initialize_optimizer_states()
  File "/root/anaconda3/envs/diffusers-ttl/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 599, in initialize_optimizer_states
    i].grad = single_grad_partition.pin_memory(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 59) of binary: /root/anaconda3/envs/diffusers-ttl/bin/python
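For context (my estimate, not from the issue): `pin_memory()` allocates page-locked host RAM through the CUDA driver, which is why a host-side allocation can surface as a CUDA out-of-memory error. Plugging the partition size the log reports (859,520,964 parameters, all on rank 0) into a back-of-envelope calculation shows how large the buffers involved are:

```python
# Rough memory arithmetic from the numbers in the log above.
# Rank 0 holds the entire partition: 859,520,964 parameters.
params = 859_520_964
GiB = 1024 ** 3

fp16_weights_gpu = params * 2 / GiB      # fp16 model copy on the GPU
fp32_grad_pinned = params * 4 / GiB      # fp32 grad partition being pinned when it OOMs
adam_states_cpu = params * 4 * 3 / GiB   # fp32 master weights + two Adam moments, offloaded

print(f"fp16 weights on GPU: {fp16_weights_gpu:.2f} GiB")  # roughly matches the MA 1.66 GB line
print(f"pinned fp32 grads:   {fp32_grad_pinned:.2f} GiB")
print(f"Adam states on CPU:  {adam_states_cpu:.2f} GiB")
```

The ~1.6 GiB fp16 figure lines up with the `MA 1.66 GB` reading in the log, which suggests the estimate is in the right ballpark; the ~3.2 GiB pinned gradient buffer is what the failing `pin_memory()` call is trying to allocate.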

System Info

RTX 3080 (10GB) + 64GB RAM, WSL2 and native Linux

Labels

bug (Something isn't working), stale (Issues that haven't received updates)