Describe the bug
Hi, I tried using TheLastBen's RunPod notebook to LoRA-train a model from SDXL base 0.9. I then test-ran that model in ComfyUI and it generated images with my subject just fine, but when I tried to do the same via code:
import torch
from diffusers import DiffusionPipeline

STABLE_DIFFUSION_SDXL = 'stabilityai/stable-diffusion-xl-base-0.9'

pipe = DiffusionPipeline.from_pretrained(
    STABLE_DIFFUSION_SDXL,
    torch_dtype=torch.float16,
    use_safetensors=True,
    safety_checker=None,
    variant='fp16',
).to('cuda')

# lora_path points at the trained LoRA .safetensors file
pipe.load_lora_weights(".", weight_name=lora_path)
it raises:
>>> pipe.load_lora_weights(".", weight_name=lora_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 857, in load_lora_weights
    self.load_lora_into_unet(state_dict, network_alpha=network_alpha, unet=self.unet)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/loaders.py", line 1055, in load_lora_into_unet
    unet.load_attn_procs(unet_lora_state_dict, network_alpha=network_alpha)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/diffusers/loaders.py", line 364, in load_attn_procs
    raise ValueError(f"Module {key} is not a LoRACompatibleConv or LoRACompatibleLinear module.")
ValueError: Module down_blocks.1.attentions.0.proj_in is not a LoRACompatibleConv or LoRACompatibleLinear module.
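For reference, one way to see which LoRA key format the trainer produced is to dump a few checkpoint keys: kohya-ss-style keys start with "lora_unet_" / "lora_te_", while diffusers-native keys start with "unet." / "text_encoder.". A minimal sketch, assuming lora_path points at the trained .safetensors file:

# Inspect the first few keys to identify the LoRA key format.
from safetensors.torch import load_file

state_dict = load_file(lora_path)
for key in list(state_dict)[:5]:
    print(key, tuple(state_dict[key].shape))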
I also tried a custom loader I found in a previous GitHub issue in the diffusers repo:
from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"

    # load the LoRA weights from the .safetensors file
    state_dict = load_file(checkpoint_path, device=device)

    # group the tensors by layer; the keys look like
    # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        layer, elem = key.split('.', 1)
        updates[layer][elem] = value

    # directly update the weights in the diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # walk the module tree to find the target layer; underscores in the
        # kohya key may be part of an attribute name, so rejoin on failure
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get the up/down matrices and the scaling for this layer
        weight_up = elems['lora_up.weight'].to(dtype)
        weight_down = elems['lora_down.weight'].to(dtype)
        alpha = elems['alpha']
        if alpha:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # merge the LoRA delta into the base weight
        if len(weight_up.shape) == 4:
            curr_layer.weight.data += multiplier * alpha * torch.mm(
                weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
            ).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float16)
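One way to confirm the merge actually modified the model is to snapshot a layer's weight before the call and compare afterwards. A minimal sketch, assuming pipe is freshly loaded; the layer picked here is the one named in the traceback above, but any LoRA-targeted layer would do:

# Sanity check: a delta of 0.0 means the LoRA never touched this layer.
layer = pipe.unet.down_blocks[1].attentions[0].proj_in
before = layer.weight.detach().clone()
pipe = load_lora_weights(pipe, lora_path, 1.0, 'cuda', torch.float16)
delta = (layer.weight.detach() - before).abs().max().item()
print('max abs weight delta:', delta)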
The weights load without error, but when I run:
positive_prompt = 'photo of nasdxl'
negative_prompt = '(worst quality, low quality:1.4),deformed, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, disgusting, poorly drawn hands, missing limb, floating limbs, disconnected limbs, malformed hands, blurry, ((((mutated hands and fingers)))), watermark, watermarked, oversaturated, censored, distorted hands, amputation, missing hands, obese, doubled face, double hands,(((missing arms))),(((missing legs))), (((extra arms))),(((extra legs))), badhandsv5, badhandv4, deepnegative'

images = pipe(
    prompt=positive_prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator(device='cuda').manual_seed(111111),
).images
it generates images that don't contain my trained subject. Is this a bug? Inference with this model works just fine in ComfyUI. I noticed ComfyUI has a KSampler node; do I need to process the model with a k-sampler so that it correctly generates images containing my trained subject?
Reproduction
Trained a LoRA on SDXL base 0.9 using https://github.com/TheLastBen/fast-stable-diffusion, then ran the inference code provided above.
Logs
No response
System Info
AWS EC2 g4.4xlarge