load_lora_weights() doesn't work for SDXL LoRA-trained model from https://github.com/TheLastBen/fast-stable-diffusion #4302

Closed
@MaxTran96


Describe the bug

Hi, I tried using TheLastBen's RunPod to LoRA-train a model from SDXL base 0.9. I then test-ran that model in ComfyUI and it generated images just fine,

but when I tried to do the same via code:

import torch
from diffusers import DiffusionPipeline

STABLE_DIFFUSION_SDXL = 'stabilityai/stable-diffusion-xl-base-0.9'
pipe = DiffusionPipeline.from_pretrained(
    STABLE_DIFFUSION_SDXL,
    torch_dtype=torch.float16,
    use_safetensors=True,
    safety_checker=None,
    variant='fp16'
).to('cuda')
pipe.load_lora_weights(".", weight_name=lora_path)

it returns

>>> pipe.load_lora_weights(".", weight_name=lora_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 857, in load_lora_weights
    self.load_lora_into_unet(state_dict, network_alpha=network_alpha, unet=self.unet)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/loaders.py", line 1055, in load_lora_into_unet
    unet.load_attn_procs(unet_lora_state_dict, network_alpha=network_alpha)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/loaders.py", line 364, in load_attn_procs
    raise ValueError(f"Module {key} is not a LoRACompatibleConv or LoRACompatibleLinear module.")
ValueError: Module down_blocks.1.attentions.0.proj_in is not a LoRACompatibleConv or LoRACompatibleLinear module.
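
As a debugging aid (my addition, not from the original report), a quick way to see what the checkpoint actually contains is to print its keys; the sketch below only assumes `lora_path` points at the trained .safetensors file. Kohya-style trainers emit keys prefixed with lora_unet_ / lora_te_ (the convention the custom loader further down expects), so the printout shows which naming convention the file follows.

from safetensors.torch import load_file

state_dict = load_file(lora_path)  # assumes lora_path is defined as above
for key in sorted(state_dict.keys())[:20]:
    print(key, tuple(state_dict[key].shape))
# Kohya-style keys look like "lora_unet_down_blocks_1_attentions_0_proj_in.lora_down.weight",
# while LoRA weights saved by diffusers use different, module-path style keys.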

I also tried custom code that I found in a previous GitHub issue in the diffusers repo:

import torch
from collections import defaultdict
from safetensors.torch import load_file

def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"
    # load LoRA weight from .safetensors
    state_dict = load_file(checkpoint_path, device=device)
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        # it is suggested to print out the key, it usually will be something like below
        # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
        layer, elem = key.split('.', 1)
        updates[layer][elem] = value
    # directly update weight in diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet
        # find the target layer (attribute names can contain underscores, so
        # merge tokens back together when a lookup fails)
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                elif len(layer_infos) == 0:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)
        # get elements for this layer
        weight_up = elems['lora_up.weight'].to(dtype)
        weight_down = elems['lora_down.weight'].to(dtype)
        alpha = elems['alpha']
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0
        # update weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)
    return pipeline
pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float16)

It's able to load, but when I ran

positive_prompt = 'photo of nasdxl'
negative_prompt = '(worst quality, low quality:1.4),deformed, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, disgusting, poorly drawn hands, missing limb, floating limbs, disconnected limbs, malformed hands, blurry, ((((mutated hands and fingers)))), watermark, watermarked, oversaturated, censored, distorted hands, amputation, missing hands, obese, doubled face, double hands,(((missing arms))),(((missing legs))), (((extra arms))),(((extra legs))), badhandsv5, badhandv4, deepnegative'
images = pipe(
    prompt=positive_prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator(device='cuda').manual_seed(111111)
).images

it generated images that don't contain my trained subject. Is this a bug? Inference for this model works just fine in ComfyUI. I noticed that ComfyUI has a KSampler node; do I need to process the model with a KSampler so that it correctly generates images that contain my trained subject?
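
As a sanity check (my addition, not from the original report), one cheap way to verify that the manual merge actually changed the model is to compare a checksum of the UNet parameters before and after calling the custom loader; this is a minimal sketch assuming `pipe` and `lora_path` are defined as above.

def unet_checksum(pipeline):
    # cheap "did anything change?" heuristic: sum of absolute parameter values
    return sum(p.detach().float().abs().sum().item() for p in pipeline.unet.parameters())

before = unet_checksum(pipe)
pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float16)
after = unet_checksum(pipe)
print("UNet weights changed:", before != after)

If this prints False, the LoRA keys were probably never matched to any UNet module, which would explain why the subject doesn't show up in the output.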

Reproduction

Trained a model using https://github.com/TheLastBen/fast-stable-diffusion
and ran the inference code I provided above.

Logs

No response

System Info

AWS EC2 g4.4xlarge

Who can help?

@sayakpaul @patri
