Description
Hi everybody!
Still trying to use recent models with only 4 GB of VRAM...
This time, I tried SD3 Medium.
I quantized it to Q4_0.
And I get only black images (I have checked that this Q4_0 version works well on the CPU backend and produces correct images).
I should also mention that small models like SD1.4 work well on the Vulkan backend.
With Q4_0, the VRAM needed is below 2 GB, so it should be fine on my 4 GB RTX A1000.
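(Quick sanity check on that figure, assuming SD3 Medium's MMDiT is roughly 2B parameters: Q4_0 packs 32 weights into 18 bytes, i.e. about 0.56 bytes per weight, so the diffusion weights alone come to roughly 2e9 × 0.5625 ≈ 1.1 GB, plus the ~170 MB MMDiT compute buffer reported in the log below, which stays well under 2 GB in total.)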
I changed the sampling method (euler / lcm) and tried various numbers of steps and various cfg_scale values (7, 4.5), for example along the lines of the variant below.
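(Illustrative only; the Q4_0 filename here is a placeholder, the other flags match the command shown in the log below:)
sd -m "..\StableDiffusion 3 medium Q4\sd3_medium_incl_clips_t5xxl_q4_0.safetensors" --vae-on-cpu --sampling-method euler --steps 20 --cfg-scale 7 -H 512 -W 512 -s 42 -t 20 -p "a cute cat" -v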
I also tried Q5_0 quantization, and I get the same black result.
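(In case someone wants to reproduce the quantized file: it can be produced with sd.cpp's own convert mode, roughly along these lines; filenames are placeholders, and the exact flags should be checked against sd --help for your build:)
sd -M convert -m sd3_medium_incl_clips_t5xxl.safetensors -o sd3_medium_incl_clips_t5xxl_q4_0.gguf -v --type q4_0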
Does anybody have a clue?
Here is an example log:
D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_Vulkan_2024_11_30>sd -m "..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors" --vae-on-cpu --sampling-method lcm --steps 10 --cfg-scale 4.5 -H 512 -W 512 -s 42 -t 20 -p "a cute cat" -v
Option:
n_threads: 20
mode: txt2img
model_path: ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a cute cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 4.50
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: lcm
schedule: default
sample_steps: 10
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:168 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A1000 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:191 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] model.cpp:885 - load ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors using gguf format
[DEBUG] model.cpp:902 - init from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] stable-diffusion.cpp:238 - Version: SD3.x
[INFO ] stable-diffusion.cpp:271 - Weight type: q5_0
[INFO ] stable-diffusion.cpp:272 - Conditioner weight type: q5_0
[INFO ] stable-diffusion.cpp:273 - Diffusion model weight type: q5_0
[INFO ] stable-diffusion.cpp:274 - VAE weight type: q5_0
[DEBUG] stable-diffusion.cpp:276 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:315 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:318 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] mmdit.hpp:706 - MMDiT layers: 24 (including 0 MMDiT-x layers)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 81.25 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 462.63 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size = 3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)
[INFO ] stable-diffusion.cpp:350 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:413 - loading weights
[DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
[INFO ] model.cpp:1809 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q5_0 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:512 - total params memory size = 5363.17MB (VRAM 1601.66MB, RAM 3761.51MB): clip 3666.93MB(RAM), unet 1601.66MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:516 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors' completed, taking 8.52s
[INFO ] stable-diffusion.cpp:530 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:590 - finished loaded file
[DEBUG] stable-diffusion.cpp:1464 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1194 - prompt after extract and remove lora: "a cute cat"
[INFO ] stable-diffusion.cpp:673 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:690 - parse 'a cute cat' to [['a cute cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3755 ms
[DEBUG] conditioner.hpp:690 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3728 ms
[INFO ] stable-diffusion.cpp:1332 - get_learned_condition completed, taking 7490 ms
[INFO ] stable-diffusion.cpp:1355 - sampling using LCM method
[INFO ] stable-diffusion.cpp:1359 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
|==================================================| 10/10 - 2.35s/it
[INFO ] stable-diffusion.cpp:1395 - sampling completed, taking 24.55s
[INFO ] stable-diffusion.cpp:1403 - generating 1 latent images completed, taking 24.93s
[INFO ] stable-diffusion.cpp:1406 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:1045 - computing vae [mode: DECODE] graph completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1416 - latent 1 decoded, taking 15.97s
[INFO ] stable-diffusion.cpp:1420 - decode_first_stage completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1539 - txt2img completed in 48.40s
save result image to 'output.png'