Closed
I am training a LoHa on an RTX 3090. When I try to start training, it fails with the following error:
CUDA SETUP: Loading binary D:\stable-diffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 800
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 50
num epochs / epoch数: 30
batch size per device / バッチサイズ: 8
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1500
steps: 0%| | 0/1500 [00:00<?, ?it/s]epoch 1/30
Traceback (most recent call last):
File "D:\stable-diffusion\kohya_ss\train_network.py", line 699, in <module>
train(args)
File "D:\stable-diffusion\kohya_ss\train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 407, in forward
sample = upsample_block(
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1203, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 494, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 709, in forward
hidden_states = module(hidden_states)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 756, in forward
return hidden_states * self.gelu(gate)
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 24.00 GiB total capacity; 8.54 GiB already allocated; 13.28 GiB free; 8.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 0/1500 [00:26<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\James\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\James\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\stable-diffusion\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\stable-diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\stable-diffusion\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/stable-diffusion/lib/models/anythingV5Anything_anythingV5PrtRE.safetensors', '--train_data_dir=D:\\stable-diffusion\\training\\2023-03-26 kore zombie ts\\src3', '--resolution=512,512', '--output_dir=D:\\stable-diffusion\\training\\2023-03-26 kore zombie ts\\trn_v3', '--logging_dir=', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=lycoris.kohya', '--network_args', 'conv_dim=1', 'conv_alpha=1', 'algo=loha', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=korets_v3.0', '--lr_scheduler_num_cycles=30', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=150', '--train_batch_size=8', '--max_train_steps=1500', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--sample_sampler=euler_a', '--sample_prompts=D:\\stable-diffusion\\training\\2023-03-26 kore zombie ts\\trn_v3\\sample\\prompt.txt', '--sample_every_n_epochs=1']' returned non-zero exit status 1.
In particular, it says: Tried to allocate 30.00 MiB (GPU 0; 24.00 GiB total capacity; 8.54 GiB already allocated; 13.28 GiB free; 8.73 GiB reserved in total by PyTorch). I'm not sure whether this is a bug: it reports far more free memory than it is trying to allocate, yet still fails to allocate it.
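For what it's worth, the hint at the end of the OOM message ("If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation") points at allocator fragmentation rather than a true lack of VRAM. As a minimal sketch, assuming the standard PyTorch caching-allocator environment variable, the knob has to be set before the first CUDA allocation (the value 512 here is an arbitrary example, not a recommendation from this issue):

```python
import os

# Sketch of the workaround the error message suggests. max_split_size_mb caps
# how large a cached block the allocator will split, which can reduce
# fragmentation when reserved memory far exceeds allocated memory. It must be
# set before the first CUDA allocation, i.e. before importing torch (or at
# least before any tensor touches the GPU).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

Since kohya_ss is launched through `accelerate` on Windows, the equivalent would be running `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` in the console before starting the training command; lowering `--train_batch_size` from 8 is the other obvious lever if the error persists.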