`Diffusion_pytorch_model.bin` Not Found in Expected Directory During Training with Dreambooth and followup errors #1487

TheRealDrCarbon · 2024-09-08T20:20:31Z

Is there an existing issue for this?

I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

I encountered an error when training a model using the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint in Dreambooth. The training fails with this error:

Exception training model: 'Error no file named diffusion_pytorch_model.bin found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_NEW\working

After checking, I found that the file diffusion_pytorch_model.bin is in:

C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working\vae

It appears the file is being placed in the vae subdirectory instead of the working directory. Manually copying the file to working lets the process continue, but new errors arise later (see below).
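To check where the weight files actually ended up, a minimal sketch along these lines may help. It assumes the working directory follows the usual diffusers layout, with one subfolder per component (such as unet and vae), each holding its own diffusion_pytorch_model.bin or .safetensors file. The helper name `find_weight_files` and the example path are illustrative only and are not part of the extension.

```python
from pathlib import Path

# File names used for model weights, as seen in the error messages in this issue.
WEIGHT_NAMES = ("diffusion_pytorch_model.bin", "diffusion_pytorch_model.safetensors")


def find_weight_files(working_dir):
    """Map each folder (relative to working_dir) to the weight files it contains."""
    root = Path(working_dir)
    found = {}
    for path in sorted(root.rglob("diffusion_pytorch_model.*")):
        if path.name in WEIGHT_NAMES:
            # "." means the file sits directly in working_dir.
            folder = path.parent.relative_to(root).as_posix()
            found.setdefault(folder, []).append(path.name)
    return found


if __name__ == "__main__":
    # Hypothetical path; replace it with your own model's working directory.
    working = Path(r"C:\path\to\models\dreambooth\MODEL_NAME\working")
    if working.is_dir():
        for folder, names in find_weight_files(working).items():
            print(f"{folder}: {', '.join(names)}")
```

If a weight file shows up only under vae, copying it into working gives the loader a file with the expected name but probably not the weights it is looking for, which would be consistent with the missing-keys error in the console logs.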

Expected Behavior:
The model should place files in the correct directories, or the system should look in the proper subdirectories.

Actual Behavior:
Files are created in the wrong subdirectory, causing training to fail due to missing files.

Workaround:
Manually copying the file into working gets past the first error, but training then fails with further errors (see the console logs below).

Additional Notes:

This issue happens across multiple checkpoint versions.

Environment:

OS: Windows
Checkpoint: Juggernaut-XL_v9_RunDiffusionPhoto_v2
Dreambooth/Stable Diffusion Version: [Add relevant version details]

Error After Workaround:
See the ValueError in the console logs below.

Steps to reproduce the problem

Use the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint for model training.
Start training in Dreambooth with standard settings.
Observe the error: diffusion_pytorch_model.bin not found in the expected path.

Commit and libraries

Command Line Arguments

no

Console logs

An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
Traceback (most recent call last):
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 481, in inner_loop
    unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> from C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working because the following keys are missing:
 up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.proj_out.weight, up_blocks.1.resnets.1.conv_shortcut.bias, down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias, down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias, up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias, up_blocks.0.resnets.0.conv2.weight, down_blocks.2.attentions.0.norm.bias, mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias, up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight, up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight, up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight, up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight, down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight, mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight, down_blocks.2.resnets.0.time_emb_proj.bias, up_blocks.0.resnets.2.norm2.bias, up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias, up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias, up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight, up_blocks.1.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight, 
up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight, down_blocks.2.resnets.0.conv1.bias, up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight, up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias, up_blocks.0.resnets.2.time_emb_proj.weight, down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight, up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight, up_blocks.0.attentions.2.proj_in.bias, down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight,
 Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
Loading unet...:  86%|█████████████████████████████████████████████████████████▍         | 6/7 [00:05<00:00,  1.07it/s]
Duration: 00:01:22
Duration: 00:01:23
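
The ValueError above lists UNet parameter names (down_blocks, mid_block, up_blocks) as missing from the file that was loaded. One way to tell which component a weight file belongs to is to look at the top-level prefixes of its keys. The sketch below is only a heuristic: the prefix sets are assumed from the usual diffusers parameter naming, and `guess_component` is a hypothetical helper that is not part of diffusers or the extension. It works on a plain list of key names, so it needs no model libraries.

```python
# Top-level key prefixes; assumed from the usual diffusers parameter naming.
UNET_PREFIXES = {"down_blocks", "mid_block", "up_blocks", "conv_in", "conv_out", "time_embedding"}
VAE_PREFIXES = {"encoder", "decoder", "quant_conv", "post_quant_conv"}


def guess_component(keys):
    """Guess whether a list of state-dict keys looks like a UNet or a VAE."""
    prefixes = {key.split(".", 1)[0] for key in keys}
    unet_hits = len(prefixes & UNET_PREFIXES)
    vae_hits = len(prefixes & VAE_PREFIXES)
    if unet_hits > vae_hits:
        return "unet"
    if vae_hits > unet_hits:
        return "vae"
    return "unknown"


# Keys copied from the missing-keys list in the traceback above.
missing = [
    "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight",
    "mid_block.attentions.0.proj_out.weight",
    "down_blocks.2.resnets.0.conv1.bias",
]
print(guess_component(missing))  # unet
```

To apply this to a real .bin file, the key names could be read with something like `torch.load(path, map_location="cpu").keys()`. If the file copied into working reports "vae", that would suggest the UNet loader was handed VAE weights.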

Additional information

I had to cut a large part of the console logs because of the comment length restriction.

github-actions · 2024-09-23T00:43:22Z

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

mary-mark · 2024-10-10T01:01:07Z

Has anyone else encountered this issue and solved it? I am using SDXL as the base

github-actions · 2024-10-25T00:43:39Z

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

github-actions bot added the Stale label Sep 23, 2024

github-actions bot removed the Stale label Oct 11, 2024

github-actions bot added the Stale label Oct 25, 2024
