Description
Git commit
e411520
Operating System & Version
Arch Linux
GGML backends
Vulkan
Command-line arguments used
sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v
Steps to reproduce
Running the command above fails with a runtime error.
The whole system freezes if I don't kill the process within a few seconds of the error.
What you expected to happen
A successful run or an out-of-memory error.
What actually happened
It fails with what looks like an index-out-of-range error:
/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
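For context, this message comes from the bounds check that libstdc++ adds to `std::vector::operator[]` when it is built with assertions (typically `_GLIBCXX_ASSERTIONS`): the process aborts as soon as an index is not smaller than `size()`. So this is an out-of-range read on a `std::vector<char32_t>`, not an allocation failure. The sketch below shows one way such an access could be guarded. It assumes C++17, and the helper names `codepoint_at` and `next_is_newline` are hypothetical; they are not taken from stable-diffusion.cpp, and I have not confirmed which access in the tokenizer actually triggers the assertion.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of stable-diffusion.cpp): return the code
// point at index i, or std::nullopt when i is past the end of the vector.
std::optional<char32_t> codepoint_at(const std::vector<char32_t>& cps, std::size_t i) {
    if (i >= cps.size()) {
        return std::nullopt;  // cps[i] would trip the '__n < this->size()' assertion
    }
    return cps[i];
}

// Hypothetical lookahead: is the code point right after position i a newline?
// A lookahead like this is a common place for an off-by-one read at the end
// of the input, because i + 1 equals cps.size() for the last element.
bool next_is_newline(const std::vector<char32_t>& cps, std::size_t i) {
    if (i >= cps.size()) {
        return false;  // i itself is already out of range
    }
    const std::optional<char32_t> next = codepoint_at(cps, i + 1);
    return next.has_value() && *next == U'\n';
}
```

With this shape, a lookahead on the last code point of a prompt returns `false` instead of indexing past the end.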
Logs / error messages / stack trace
❯ sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v
[DEBUG] main.cpp:500 - version: stable-diffusion.cpp version master-487-43e829f-1-ge411520+, commit e411520
[DEBUG] main.cpp:501 - System Info:
SSE3 = 0 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
[DEBUG] main.cpp:502 - SDCliParams {
mode: img_gen,
output_path: "output.png",
verbose: true,
color: false,
canny_preprocess: false,
convert_name: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false
}
[DEBUG] main.cpp:503 - SDContextParams {
n_threads: 4,
model_path: "",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf",
llm_vision_path: "",
diffusion_model_path: "./models/flux-2-klein-4b-Q2_K.gguf",
high_noise_diffusion_model_path: "",
vae_path: "./vae/flux2-klein.Q4_K.gguf",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
embeddings: {
}
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: ".",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
flow_shift: INF
offload_params_to_cpu: true,
enable_mmap: true,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
diffusion_flash_attn: false,
diffusion_conv_direct: false,
vae_conv_direct: true,
circular: false,
circular_x: false,
circular_y: false,
chroma_use_dit_mask: true,
qwen_image_zero_cond_t: false,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:504 - SDGenerationParams {
loras: "{
}",
high_noise_loras: "{
}",
prompt: "a lovely cat",
negative_prompt: "",
clip_skip: -1,
width: -1,
height: -1,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 4, eta: 0.00, shifted_timestep: 0),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
custom_sigmas: [],
cache_mode: "",
cache_option: "",
cache: disabled (threshold=1, start=0.15, end=0.95),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:172 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:75 - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:75 - ggml_vulkan: 0 = NVIDIA GeForce MX150 (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[INFO ] stable-diffusion.cpp:193 - Vulkan: Using device 0
[INFO ] stable-diffusion.cpp:258 - loading diffusion model from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] model.cpp:370 - load ./models/flux-2-klein-4b-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:305 - loading llm from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] model.cpp:370 - load /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:319 - loading vae from './vae/flux2-klein.Q4_K.gguf'
[INFO ] model.cpp:370 - load ./vae/flux2-klein.Q4_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from './vae/flux2-klein.Q4_K.gguf'
[INFO ] stable-diffusion.cpp:335 - Version: Flux.2 klein
[INFO ] stable-diffusion.cpp:363 - Weight type stat: f32: 205 | q2_K: 200 | q3_K: 72 | q4_K: 109 | q6_K: 1 | bf16: 208
[INFO ] stable-diffusion.cpp:364 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:365 - Diffusion model weight type stat: f32: 60 | q2_K: 56 | q4_K: 24 | bf16: 9
[INFO ] stable-diffusion.cpp:366 - VAE weight type stat: q4_K: 49 | bf16: 199
[DEBUG] stable-diffusion.cpp:368 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151669
[DEBUG] llm.hpp:1135 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
[INFO ] flux.hpp:1346 - flux: depth = 5, depth_single_blocks = 20, guidance_embed = false, context_in_dim = 7680, hidden_size = 3072, num_heads = 24
[DEBUG] ggml_extend.hpp:1946 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1946 - flux params backend buffer size = 1743.12 MB(RAM) (149 tensors)
[INFO ] stable-diffusion.cpp:624 - Using Conv2d direct in the vae model
[DEBUG] ggml_extend.hpp:1946 - vae params backend buffer size = 93.28 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:752 - loading weights
[DEBUG] model.cpp:1381 - using 4 threads for model loading
[DEBUG] model.cpp:1403 - loading tensors from ./models/flux-2-klein-4b-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|=========> | 149/795 - 6.98it/s
[DEBUG] model.cpp:1403 - loading tensors from /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|==================================> | 547/795 - 10.01it/s
[DEBUG] model.cpp:1403 - loading tensors from ./vae/flux2-klein.Q4_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|==================================================| 795/795 - 13.78it/s
[INFO ] model.cpp:1623 - loading tensors completed, taking 57.72s (process: 0.00s, read: 53.93s, memcpy: 0.00s, convert: 2.24s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:787 - finished loaded file
[INFO ] stable-diffusion.cpp:845 - total params memory size = 4602.34MB (VRAM 4602.34MB, RAM 0.00MB): text_encoders 2765.94MB(VRAM), diffusion_model 1743.12MB(VRAM), vae 93.28MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:939 - running in Flux2 FLOW mode
[DEBUG] stable-diffusion.cpp:3472 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3506 - sampling using Euler method
[DEBUG] denoiser.hpp:703 - Flux2FlowDenoiser: set shift to 2.031
[INFO ] denoiser.hpp:403 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3633 - TXT2IMG
[DEBUG] conditioner.hpp:1679 - parse '<|im_start|>user
a lovely cat<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['a lovely cat', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "a lovely cat" to tokens ["a", "Ġlovely", "Ġcat", ]
/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
zsh: killed sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae --llm -p
Additional context / environment details
Flux model: flux-2-klein-4b-Q2_K.gguf from unsloth/FLUX.2-klein-4B-GGUF
Qwen model: Qwen3-4B-Q2_K.gguf from unsloth/Qwen3-4B-GGUF
VAE: original safetensors converted to gguf
I also tried OlegSkutte/FLUX.2-klein-4B-GGUF and leejet/FLUX.2-klein-4B-GGUF, with the same result.
I'm trying to run this with very little VRAM (2 GB), so I'd expect memory allocation errors, but this looks different.