Black image with Vulkan + SD3 medium #560

Hi everybody!
I am still trying to run recent models with only 4 GB of VRAM.
This time I tried SD3 medium.
I quantized it to Q4_0.
I get only black images. (I checked that this Q4 version works on the CPU backend and produces correct images.)
I should also mention that small models like SD1.4 work well on the Vulkan backend.
With Q4_0, the amount of VRAM needed is below 2 GB, so it should fit on my RTX A1000 4 GB.
I changed the sampling method (euler / lcm) and tried various step counts and cfg_scale values (7, 4.5).
I also tried Q5 quantization, with the same black result.
Does anybody have a clue?

Here is an example log:
> D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_Vulkan_2024_11_30>sd -m "..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors" --vae-on-cpu --sampling-method lcm --steps 10 --cfg-scale 4.5 -H 512 -W 512 -s 42 -t 20  -p "a cute cat" -v
> Option:
>     n_threads:         20
>     mode:              txt2img
>     model_path:        ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
>     wtype:             unspecified
>     clip_l_path:
>     clip_g_path:
>     t5xxl_path:
>     diffusion_model_path:
>     vae_path:
>     taesd_path:
>     esrgan_path:
>     controlnet_path:
>     embeddings_path:
>     stacked_id_embeddings_path:
>     input_id_images_path:
>     style ratio:       20.00
>     normalize input image :  false
>     output_path:       output.png
>     init_img:
>     control_image:
>     clip on cpu:       false
>     controlnet cpu:    false
>     vae decoder on cpu:true
>     diffusion flash attention:false
>     strength(control): 0.90
>     prompt:            a cute cat
>     negative_prompt:
>     min_cfg:           1.00
>     cfg_scale:         4.50
>     slg_scale:         0.00
>     guidance:          3.50
>     clip_skip:         -1
>     width:             512
>     height:            512
>     sample_method:     lcm
>     schedule:          default
>     sample_steps:      10
>     strength(img2img): 0.75
>     rng:               cuda
>     seed:              42
>     batch_count:       1
>     vae_tiling:        false
>     upscale_repeats:   1
> System Info:
>     SSE3 = 1
>     AVX = 1
>     AVX2 = 1
>     AVX512 = 0
>     AVX512_VBMI = 0
>     AVX512_VNNI = 0
>     FMA = 1
>     NEON = 0
>     ARM_FMA = 0
>     F16C = 1
>     FP16_VA = 0
>     WASM_SIMD = 0
>     VSX = 0
> [DEBUG] stable-diffusion.cpp:168  - Using Vulkan backend
> ggml_vulkan: Found 1 Vulkan devices:
> ggml_vulkan: 0 = NVIDIA RTX A1000 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
> ggml_vulkan: Compiling shaders..............................Done!
> [INFO ] stable-diffusion.cpp:191  - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
> [INFO ] model.cpp:885  - load ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors using gguf format
> [DEBUG] model.cpp:902  - init from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
> [INFO ] stable-diffusion.cpp:238  - Version: SD3.x
> [INFO ] stable-diffusion.cpp:271  - Weight type:                 q5_0
> [INFO ] stable-diffusion.cpp:272  - Conditioner weight type:     q5_0
> [INFO ] stable-diffusion.cpp:273  - Diffusion model weight type: q5_0
> [INFO ] stable-diffusion.cpp:274  - VAE weight type:             q5_0
> [DEBUG] stable-diffusion.cpp:276  - ggml tensor size = 400 bytes
> [INFO ] stable-diffusion.cpp:315  - set clip_on_cpu to true
> [INFO ] stable-diffusion.cpp:318  - CLIP: Using CPU backend
> [DEBUG] clip.hpp:171  - vocab size: 49408
> [DEBUG] clip.hpp:182  -  trigger word img already in vocab
> [DEBUG] clip.hpp:171  - vocab size: 49408
> [DEBUG] clip.hpp:182  -  trigger word img already in vocab
> [INFO ] mmdit.hpp:706  - MMDiT layers: 24 (including 0 MMDiT-x layers)
> [DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size =  81.25 MB(RAM) (196 tensors)
> [DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size =  462.63 MB(RAM) (517 tensors)
> [DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size =  3123.05 MB(RAM) (219 tensors)
> [DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size =  1601.66 MB(VRAM) (491 tensors)
> [INFO ] stable-diffusion.cpp:350  - VAE Autoencoder: Using CPU backend
> [DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size =  94.57 MB(RAM) (138 tensors)
> [DEBUG] stable-diffusion.cpp:413  - loading weights
> [DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
> [INFO ] model.cpp:1809 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q5_0 | 2 [4096, 32128, 1, 1, 1]' in model file
> [INFO ] stable-diffusion.cpp:512  - total params memory size = 5363.17MB (VRAM 1601.66MB, RAM 3761.51MB): clip 3666.93MB(RAM), unet 1601.66MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
> [INFO ] stable-diffusion.cpp:516  - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors' completed, taking 8.52s
> [INFO ] stable-diffusion.cpp:530  - running in FLOW mode
> [DEBUG] stable-diffusion.cpp:590  - finished loaded file
> [DEBUG] stable-diffusion.cpp:1464 - txt2img 512x512
> [DEBUG] stable-diffusion.cpp:1194 - prompt after extract and remove lora: "a cute cat"
> [INFO ] stable-diffusion.cpp:673  - Attempting to apply 0 LoRAs
> [INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
> [DEBUG] conditioner.hpp:690  - parse 'a cute cat' to [['a cute cat', 1], ]
> [DEBUG] clip.hpp:311  - token length: 77
> [DEBUG] clip.hpp:311  - token length: 77
> [DEBUG] t5.hpp:397  - token length: 77
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
> [DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
> [DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
> [DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
> [DEBUG] conditioner.hpp:923  - computing condition graph completed, taking 3755 ms
> [DEBUG] conditioner.hpp:690  - parse '' to [['', 1], ]
> [DEBUG] clip.hpp:311  - token length: 77
> [DEBUG] clip.hpp:311  - token length: 77
> [DEBUG] t5.hpp:397  - token length: 77
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
> [DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
> [DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
> [DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
> [DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
> [DEBUG] conditioner.hpp:923  - computing condition graph completed, taking 3728 ms
> [INFO ] stable-diffusion.cpp:1332 - get_learned_condition completed, taking 7490 ms
> [INFO ] stable-diffusion.cpp:1355 - sampling using LCM method
> [INFO ] stable-diffusion.cpp:1359 - generating image: 1/1 - seed 42
> [DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
>   |==================================================| 10/10 - 2.35s/it
> [INFO ] stable-diffusion.cpp:1395 - sampling completed, taking 24.55s
> [INFO ] stable-diffusion.cpp:1403 - generating 1 latent images completed, taking 24.93s
> [INFO ] stable-diffusion.cpp:1406 - decoding 1 latents
> [DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
> [DEBUG] stable-diffusion.cpp:1045 - computing vae [mode: DECODE] graph completed, taking 15.97s
> [INFO ] stable-diffusion.cpp:1416 - latent 1 decoded, taking 15.97s
> [INFO ] stable-diffusion.cpp:1420 - decode_first_stage completed, taking 15.97s
> [INFO ] stable-diffusion.cpp:1539 - txt2img completed in 48.40s
> save result image to 'output.png'
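
To double-check the memory claim, the buffer sizes that the log tags as VRAM can be summed. This is only a rough sketch in Python: the regular expression is my assumption about the log format, based on the two VRAM lines shown above, the helper name `total_vram_mb` is made up for illustration, and 4096 MB is simply the nominal size of a 4 GB card.

```python
import re

# Matches lines such as
#   "mmdit params backend buffer size =  1601.66 MB(VRAM) (491 tensors)"
#   "mmdit compute buffer size: 169.64 MB(VRAM)"
# Assumption: every device allocation is logged in one of these two shapes.
VRAM_RE = re.compile(r"buffer size\s*[=:]\s*([0-9.]+)\s*MB\(VRAM\)")


def total_vram_mb(log_text: str) -> float:
    """Sum every buffer size that the log tags as VRAM (in MB)."""
    return sum(float(m.group(1)) for m in VRAM_RE.finditer(log_text))


# Lines copied from the log above (RAM lines are included to show they are ignored).
log_excerpt = """
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size =  3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size =  1601.66 MB(VRAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
"""

used = total_vram_mb(log_excerpt)
print(f"VRAM reported in log: {used:.2f} MB of 4096 MB")
```

For the Q5_0 run above this adds up to about 1771 MB, well under the 4 GB of the card, so running out of VRAM does not look like the cause of the black images.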
