Black image with Vulkan + SD3 medium #560

@olivbrau

Hi everybody!
I'm still trying to use recent models with only 4 GB of VRAM.
This time, I tried SD3 medium.
I quantized it to Q4_0, and I get only black images. (I have checked that this Q4 version works well on the CPU backend and produces correct images.)
I should also point out that small models like SD1.4 work well on the Vulkan backend.
With Q4_0, the amount of VRAM needed is below 2 GB, so it should fit on my RTX A1000 (4 GB).
I changed the sampling method (euler / lcm) and tried various numbers of steps and cfg_scale values (7, 4.5).
I also tried Q5 quantization, and I get the same black result.
Does anybody have a clue?

Here is an example log (from the Q5_0 run):

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_Vulkan_2024_11_30>sd -m "..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors" --vae-on-cpu --sampling-method lcm --steps 10 --cfg-scale 4.5 -H 512 -W 512 -s 42 -t 20 -p "a cute cat" -v
Option:
n_threads: 20
mode: txt2img
model_path: ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a cute cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 4.50
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: lcm
schedule: default
sample_steps: 10
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:168 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A1000 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:191 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] model.cpp:885 - load ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors using gguf format
[DEBUG] model.cpp:902 - init from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] stable-diffusion.cpp:238 - Version: SD3.x
[INFO ] stable-diffusion.cpp:271 - Weight type: q5_0
[INFO ] stable-diffusion.cpp:272 - Conditioner weight type: q5_0
[INFO ] stable-diffusion.cpp:273 - Diffusion model weight type: q5_0
[INFO ] stable-diffusion.cpp:274 - VAE weight type: q5_0
[DEBUG] stable-diffusion.cpp:276 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:315 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:318 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] mmdit.hpp:706 - MMDiT layers: 24 (including 0 MMDiT-x layers)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 81.25 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 462.63 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size = 3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)
[INFO ] stable-diffusion.cpp:350 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:413 - loading weights
[DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
[INFO ] model.cpp:1809 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q5_0 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:512 - total params memory size = 5363.17MB (VRAM 1601.66MB, RAM 3761.51MB): clip 3666.93MB(RAM), unet 1601.66MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:516 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors' completed, taking 8.52s
[INFO ] stable-diffusion.cpp:530 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:590 - finished loaded file
[DEBUG] stable-diffusion.cpp:1464 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1194 - prompt after extract and remove lora: "a cute cat"
[INFO ] stable-diffusion.cpp:673 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:690 - parse 'a cute cat' to [['a cute cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3755 ms
[DEBUG] conditioner.hpp:690 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3728 ms
[INFO ] stable-diffusion.cpp:1332 - get_learned_condition completed, taking 7490 ms
[INFO ] stable-diffusion.cpp:1355 - sampling using LCM method
[INFO ] stable-diffusion.cpp:1359 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
|==================================================| 10/10 - 2.35s/it
[INFO ] stable-diffusion.cpp:1395 - sampling completed, taking 24.55s
[INFO ] stable-diffusion.cpp:1403 - generating 1 latent images completed, taking 24.93s
[INFO ] stable-diffusion.cpp:1406 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:1045 - computing vae [mode: DECODE] graph completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1416 - latent 1 decoded, taking 15.97s
[INFO ] stable-diffusion.cpp:1420 - decode_first_stage completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1539 - txt2img completed in 48.40s
save result image to 'output.png'
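
To cross-check the "below 2 GB" estimate against the log above, the buffers reported as VRAM-resident can be added up with a short script. This is only a minimal sketch in Python and is not part of stable-diffusion.cpp: the helper name `sum_vram_mb` and the regular expression are assumptions based solely on the `buffer size` line format shown in this log, and the sample lines are copied from it.

```python
import re

# Matches lines such as
#   "... - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)"
#   "... - mmdit compute buffer size: 169.64 MB(VRAM)"
# Lines tagged MB(RAM) are deliberately not matched.
BUFFER_RE = re.compile(
    r"- (\w+) (params backend|compute) buffer size\s*[=:]\s*([\d.]+) MB\(VRAM\)"
)


def sum_vram_mb(log_text):
    """Return (total_mb, entries) for every buffer the log reports in VRAM."""
    entries = []
    for line in log_text.splitlines():
        m = BUFFER_RE.search(line)
        if m:
            entries.append((m.group(1), m.group(2), float(m.group(3))))
    total = sum(size for _, _, size in entries)
    return total, entries


# Four lines copied from the log above (two in RAM, two in VRAM).
sample_log = """\
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size = 3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
"""

total_mb, entries = sum_vram_mb(sample_log)
for name, kind, size in entries:
    print(f"{name:6s} {kind:15s} {size:9.2f} MB")
print(f"total VRAM buffers: {total_mb:.2f} MB")
```

For this Q5_0 run, the two VRAM buffers (mmdit weights and mmdit compute) add up to about 1771 MB, which is in line with the estimate above. The figure covers only the buffers the log reports; it does not include whatever the driver or other applications are using.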
