`diffusion_pytorch_model.bin` Not Found in Expected Directory During Training with Dreambooth, and Follow-up Errors #1487

Open
TheRealDrCarbon opened this issue Sep 8, 2024 · 3 comments

@TheRealDrCarbon

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

I encountered an error when training a model using the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint in Dreambooth. The training fails with this error:

Exception training model: 'Error no file named diffusion_pytorch_model.bin found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_NEW\working

After checking, I found that the file diffusion_pytorch_model.bin is in:

C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working\vae

It appears the file is being placed in the vae subdirectory instead of the working directory. Manually copying the file to working lets the process continue, but new errors arise later (see below).
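For context: when diffusers saves a pipeline, each model component gets its own subfolder (for SDXL typically `unet`, `vae`, `text_encoder` and `text_encoder_2`). The UNet and the VAE each keep a separate `diffusion_pytorch_model.*` weights file in their own folder. A `diffusion_pytorch_model.bin` under `working\vae` is therefore most likely the VAE's weights, and copying it into `working` would not supply the UNet. The script below is a hypothetical diagnostic: the name `list_weight_files` is made up and is not part of the extension. It lists which subfolders of a working directory actually contain weight files, so you can check whether a `unet` folder with weights exists at all.

```python
import sys
from pathlib import Path

# Weight file names used by diffusers models (UNet, VAE) and by
# transformers text encoders inside a diffusers pipeline folder.
# Variant (e.g. .fp16) and sharded files are not matched by this sketch.
WEIGHT_NAMES = {
    "diffusion_pytorch_model.bin",
    "diffusion_pytorch_model.safetensors",
    "pytorch_model.bin",
    "model.safetensors",
}


def list_weight_files(model_dir):
    """Map each folder under model_dir to the weight files it contains.

    Folder names are relative to model_dir; "." is model_dir itself.
    """
    root = Path(model_dir)
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name in WEIGHT_NAMES:
            folder = path.parent.relative_to(root).as_posix()
            found.setdefault(folder, []).append(path.name)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_weights.py path\to\working
    weights = list_weight_files(sys.argv[1])
    for folder, names in weights.items():
        print(f"{folder}: {', '.join(names)}")
    if "unet" not in weights:
        print("No weight files found under unet/.")
```

If the output shows weights only under `vae` (and the text encoder folders), the UNet was never saved to the working directory, which would explain the later missing-keys error.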

Expected Behavior:
The extension should write each model file to the directory it later loads it from, or the loader should look in the proper subdirectories.
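The expected behavior can be sketched as a lookup that only searches the component's own subfolder and reports the full path it searched. The console log suggests a `.safetensors` file is tried before falling back to `.bin`, so the sketch uses the same order. This is a hypothetical helper (`find_component_weights` is an illustrative name), not the extension's or diffusers' actual code.

```python
from pathlib import Path

# Assumed preference order: safetensors first, then the pickle-based .bin.
WEIGHT_FILES = (
    "diffusion_pytorch_model.safetensors",
    "diffusion_pytorch_model.bin",
)


def find_component_weights(model_dir, component):
    """Return the weights file for one pipeline component (e.g. "unet").

    Only model_dir/component is searched, so a file sitting at the top
    level of model_dir is never picked up for that component.
    """
    folder = Path(model_dir) / component
    for name in WEIGHT_FILES:
        path = folder / name
        if path.is_file():
            return path
    # Name the exact folder that was searched, not just model_dir.
    raise FileNotFoundError(f"no {' or '.join(WEIGHT_FILES)} in {folder}")
```

Restricting the search to the component folder means a file copied to the top level of `working` is never loaded as the UNet, and the error message names the folder that was actually searched.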

Actual Behavior:
Files are created in the wrong subdirectory, causing training to fail due to missing files.

Workaround:
Manually copying the file allows partial progress but leads to further errors.

Additional Notes:

  • This issue happens across multiple checkpoint versions.
  • Manually copying files is only a partial solution as further errors appear.

Environment:

  • OS: Windows
  • Checkpoint: Juggernaut-XL_v9_RunDiffusionPhoto_v2
  • Dreambooth/Stable Diffusion Version: [Add relevant version details]

Error After Workaround:
[Include next error message if necessary.]

Steps to reproduce the problem

  1. Use the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint for model training.
  2. Start training in Dreambooth with standard settings.
  3. Observe the error: diffusion_pytorch_model.bin not found in the expected path.

Commit and libraries

Command Line Arguments

no

Console logs

An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
Traceback (most recent call last):
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 481, in inner_loop
    unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> from C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working because the following keys are missing:
 up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.proj_out.weight, up_blocks.1.resnets.1.conv_shortcut.bias, down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias, down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias, up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias, up_blocks.0.resnets.0.conv2.weight, down_blocks.2.attentions.0.norm.bias, mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias, up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight, up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight, up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight, up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight, down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight, mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight, down_blocks.2.resnets.0.time_emb_proj.bias, up_blocks.0.resnets.2.norm2.bias, up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias, up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias, up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight, up_blocks.1.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight, 
up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight, down_blocks.2.resnets.0.conv1.bias, up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight, up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias, up_blocks.0.resnets.2.time_emb_proj.weight, down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight, up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight, up_blocks.0.attentions.2.proj_in.bias, down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight,
 Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
Loading unet...:  86%|█████████████████████████████████████████████████████████▍         | 6/7 [00:05<00:00,  1.07it/s]
Duration: 00:01:22
Duration: 00:01:23
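One possible reading of the missing-keys error, offered as an assumption rather than a confirmed diagnosis: diffusers' `AutoencoderKL` stores its weights under `encoder.` and `decoder.` prefixes (plus `quant_conv` and `post_quant_conv`), whereas `UNet2DConditionModel` keys start with names such as `down_blocks.`, `mid_block.` and `up_blocks.`. If the file copied into `working` is the VAE's, none of the top-level UNet keys will be present, which matches the list above. The message's suggestion to pass `low_cpu_mem_usage=False` would only initialize those weights randomly, so it would not fix training. The hypothetical helper below (`guess_component` is a made-up name) guesses which component a list of state-dict keys belongs to from those prefixes. With `torch` installed you could pass it `torch.load(path, map_location="cpu").keys()` for a `.bin` file.

```python
from collections import Counter

# Top-level key prefixes, assumed from diffusers' UNet2DConditionModel and
# AutoencoderKL state dicts.
UNET_PREFIXES = ("conv_in.", "time_embedding.", "add_embedding.",
                 "down_blocks.", "mid_block.", "up_blocks.",
                 "conv_norm_out.", "conv_out.")
VAE_PREFIXES = ("encoder.", "decoder.", "quant_conv.", "post_quant_conv.")


def guess_component(keys):
    """Guess whether a state dict's keys look like a UNet or a VAE.

    Counts keys by prefix family and returns "unet", "vae", or "unknown"
    when no key matches either family.
    """
    counts = Counter()
    for key in keys:
        # Check VAE prefixes first: VAE keys such as "encoder.conv_in.weight"
        # contain UNet-like names, but only after the "encoder."/"decoder." part.
        if key.startswith(VAE_PREFIXES):
            counts["vae"] += 1
        elif key.startswith(UNET_PREFIXES):
            counts["unet"] += 1
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]
```

If the file in `working` reports `vae`, it is not a UNet checkpoint, and loading it as one will keep failing regardless of where it is copied.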

Additional information

I had to cut a large part of the console log because of the comment length limit.


This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

@github-actions github-actions bot added the Stale label Sep 23, 2024
@mary-mark

Has anyone else encountered this issue and solved it? I am using SDXL as the base model.

@github-actions github-actions bot removed the Stale label Oct 11, 2024

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

@github-actions github-actions bot added the Stale label Oct 25, 2024