[Bug] Index out of range(?) when running flux klein #1241

@K-Vinayak

Description

Git commit

e411520

Operating System & Version

Arch Linux

GGML backends

Vulkan

Command-line arguments used

sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v

Steps to reproduce

Running the above command fails with a runtime error.
The whole system freezes if I don't kill the process within a few seconds of the error.

What you expected to happen

A successful run, or an out-of-memory error.

What actually happened

The run fails with what looks like an index-out-of-range error:

/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
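
For context, this message is emitted by libstdc++'s checked `std::vector::operator[]`, which aborts when the index is not smaller than `size()` in builds that have its assertions enabled (for example via `_GLIBCXX_ASSERTIONS`). The element type `char32_t`, and the fact that the failure appears right after the `split prompt` lines in the log, suggest the read happens while Unicode code points are being walked during tokenization, but the exact call site is not identified in this report. Below is a minimal sketch of the failure pattern and a guarded alternative; `code_point_at` and `count_space_prefixed_words` are hypothetical names and are not taken from stable-diffusion.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not from stable-diffusion.cpp): returns the code point
// at index i, or std::nullopt when i is past the end of the buffer.
std::optional<char32_t> code_point_at(const std::vector<char32_t>& cps, std::size_t i) {
    if (i >= cps.size()) {
        return std::nullopt;  // cps[i] here would trip '__n < this->size()'
    }
    return cps[i];
}

// Hypothetical scanner: counts positions where a space is followed by a
// non-space code point. The lookahead at i + 1 goes through the guarded
// helper, so the last element never causes a read at cps[cps.size()].
std::size_t count_space_prefixed_words(const std::vector<char32_t>& cps) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        const std::optional<char32_t> next = code_point_at(cps, i + 1);
        if (cps[i] == U' ' && next.has_value() && *next != U' ') {
            ++count;
        }
    }
    return count;
}
```

With the guard in place, a buffer that ends in a space is handled without reading one element past the end; an unguarded `cps[i + 1]` at the last index would trigger the same assertion as the one quoted above.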

Logs / error messages / stack trace

❯ sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v
[DEBUG] main.cpp:500 - version: stable-diffusion.cpp version master-487-43e829f-1-ge411520+, commit e411520
[DEBUG] main.cpp:501 - System Info:
SSE3 = 0 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
[DEBUG] main.cpp:502 - SDCliParams {
mode: img_gen,
output_path: "output.png",
verbose: true,
color: false,
canny_preprocess: false,
convert_name: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false
}
[DEBUG] main.cpp:503 - SDContextParams {
n_threads: 4,
model_path: "",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf",
llm_vision_path: "",
diffusion_model_path: "./models/flux-2-klein-4b-Q2_K.gguf",
high_noise_diffusion_model_path: "",
vae_path: "./vae/flux2-klein.Q4_K.gguf",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
embeddings: {
}
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: ".",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
flow_shift: INF
offload_params_to_cpu: true,
enable_mmap: true,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
diffusion_flash_attn: false,
diffusion_conv_direct: false,
vae_conv_direct: true,
circular: false,
circular_x: false,
circular_y: false,
chroma_use_dit_mask: true,
qwen_image_zero_cond_t: false,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:504 - SDGenerationParams {
loras: "{
}",
high_noise_loras: "{
}",
prompt: "a lovely cat",
negative_prompt: "",
clip_skip: -1,
width: -1,
height: -1,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 4, eta: 0.00, shifted_timestep: 0),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
custom_sigmas: [],
cache_mode: "",
cache_option: "",
cache: disabled (threshold=1, start=0.15, end=0.95),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:172 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:75 - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:75 - ggml_vulkan: 0 = NVIDIA GeForce MX150 (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[INFO ] stable-diffusion.cpp:193 - Vulkan: Using device 0
[INFO ] stable-diffusion.cpp:258 - loading diffusion model from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] model.cpp:370 - load ./models/flux-2-klein-4b-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:305 - loading llm from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] model.cpp:370 - load /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:319 - loading vae from './vae/flux2-klein.Q4_K.gguf'
[INFO ] model.cpp:370 - load ./vae/flux2-klein.Q4_K.gguf using gguf format
[DEBUG] model.cpp:416 - init from './vae/flux2-klein.Q4_K.gguf'
[INFO ] stable-diffusion.cpp:335 - Version: Flux.2 klein
[INFO ] stable-diffusion.cpp:363 - Weight type stat: f32: 205 | q2_K: 200 | q3_K: 72 | q4_K: 109 | q6_K: 1 | bf16: 208
[INFO ] stable-diffusion.cpp:364 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:365 - Diffusion model weight type stat: f32: 60 | q2_K: 56 | q4_K: 24 | bf16: 9
[INFO ] stable-diffusion.cpp:366 - VAE weight type stat: q4_K: 49 | bf16: 199
[DEBUG] stable-diffusion.cpp:368 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151669
[DEBUG] llm.hpp:1135 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
[INFO ] flux.hpp:1346 - flux: depth = 5, depth_single_blocks = 20, guidance_embed = false, context_in_dim = 7680, hidden_size = 3072, num_heads = 24
[DEBUG] ggml_extend.hpp:1946 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1946 - flux params backend buffer size = 1743.12 MB(RAM) (149 tensors)
[INFO ] stable-diffusion.cpp:624 - Using Conv2d direct in the vae model
[DEBUG] ggml_extend.hpp:1946 - vae params backend buffer size = 93.28 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:752 - loading weights
[DEBUG] model.cpp:1381 - using 4 threads for model loading
[DEBUG] model.cpp:1403 - loading tensors from ./models/flux-2-klein-4b-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|=========> | 149/795 - 6.98it/s
[DEBUG] model.cpp:1403 - loading tensors from /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|==================================> | 547/795 - 10.01it/s
[DEBUG] model.cpp:1403 - loading tensors from ./vae/flux2-klein.Q4_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
|==================================================| 795/795 - 13.78it/s
[INFO ] model.cpp:1623 - loading tensors completed, taking 57.72s (process: 0.00s, read: 53.93s, memcpy: 0.00s, convert: 2.24s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:787 - finished loaded file
[INFO ] stable-diffusion.cpp:845 - total params memory size = 4602.34MB (VRAM 4602.34MB, RAM 0.00MB): text_encoders 2765.94MB(VRAM), diffusion_model 1743.12MB(VRAM), vae 93.28MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:939 - running in Flux2 FLOW mode
[DEBUG] stable-diffusion.cpp:3472 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3506 - sampling using Euler method
[DEBUG] denoiser.hpp:703 - Flux2FlowDenoiser: set shift to 2.031
[INFO ] denoiser.hpp:403 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3633 - TXT2IMG
[DEBUG] conditioner.hpp:1679 - parse '<|im_start|>user
a lovely cat<|im_end|>
<|im_start|>assistant

' to [['<|im_start|>user
', 1], ['a lovely cat', 1], ['<|im_end|>
<|im_start|>assistant

', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "a lovely cat" to tokens ["a", "Ġlovely", "Ġcat", ]
/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
zsh: killed sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae --llm -p

Additional context / environment details

Flux model: flux-2-klein-4b-Q2_K.gguf from unsloth/FLUX.2-klein-4B-GGUF
Qwen model: Qwen3-4B-Q2_K.gguf from unsloth/Qwen3-4B-GGUF
VAE: original safetensors converted to gguf

Also tried with OlegSkutte/FLUX.2-klein-4B-GGUF and leejet/FLUX.2-klein-4B-GGUF. Same result.

I'm trying to run this with very low VRAM (2 GB), so I'd expect memory allocation errors, but this failure looks different.
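
To make the memory expectation concrete, the sketch below sums the three parameter buffer sizes reported in the log and compares the total with a 2 GB budget. The sizes are copied from the log; treating 2 GB as 2048 MB, and the names `ParamBuffer`, `reported_buffers`, `total_mb` and `fits`, are assumptions made for illustration only.

```cpp
#include <numeric>
#include <vector>

// One parameter buffer as reported in the log (size in MB).
struct ParamBuffer {
    const char* name;
    double size_mb;
};

// Sizes copied from the "total params memory size" log line in this report.
std::vector<ParamBuffer> reported_buffers() {
    return {{"text_encoders", 2765.94}, {"diffusion_model", 1743.12}, {"vae", 93.28}};
}

// Sum of all buffer sizes in MB.
double total_mb(const std::vector<ParamBuffer>& buffers) {
    return std::accumulate(buffers.begin(), buffers.end(), 0.0,
                           [](double acc, const ParamBuffer& b) { return acc + b.size_mb; });
}

// True when all buffers together fit in the given budget (MB).
bool fits(const std::vector<ParamBuffer>& buffers, double budget_mb) {
    return total_mb(buffers) <= budget_mb;
}
```

The total matches the 4602.34 MB figure in the log and is more than twice the assumed 2048 MB budget, which is why an allocation failure would have been the expected outcome rather than a vector bounds assertion.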
