[Bug] Index out of range(?) when running flux klein #1241

### Git commit

e411520407663e1ddf8ff2e5ed4ff3a116fbbc97

### Operating System & Version

Arch Linux

### GGML backends

Vulkan

### Command-line arguments used

sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf  -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v

### Steps to reproduce

Running the above command fails with a runtime error.
The whole system freezes if I don't kill the process within a few seconds of the error.

### What you expected to happen

A successful run, or an out-of-memory error.

### What actually happened

It fails with what looks like an index-out-of-range error:

`/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.`
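For readers unfamiliar with this message: it is the bounds check that libstdc++ performs in `std::vector::operator[]` when the code is built with `_GLIBCXX_ASSERTIONS`, here on a `std::vector<char32_t>`. The log stops right after the second `split prompt` line, so the bad access presumably happens while a later prompt segment is being split, but the exact location is not shown. The following is only a minimal sketch of the failure class and one way to guard a one-element lookahead. `peek` and `count_double_newlines` are hypothetical names for illustration and are not taken from stable-diffusion.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of stable-diffusion.cpp): returns the code
// point at position i, or std::nullopt when i is past the end of the vector.
std::optional<char32_t> peek(const std::vector<char32_t>& cps, std::size_t i) {
    if (i >= cps.size()) {
        // Evaluating cps[i] here would trip the '__n < this->size()' assertion.
        return std::nullopt;
    }
    return cps[i];
}

// Counts newline code points that are immediately followed by another newline.
// The lookahead goes through peek() instead of an unchecked cps[i + 1].
std::size_t count_double_newlines(const std::vector<char32_t>& cps) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        if (cps[i] == U'\n' && peek(cps, i + 1) == U'\n') {
            ++count;
        }
    }
    return count;
}
```

With an unchecked `cps[i + 1]`, the last loop iteration would read one element past the end. That is undefined behavior in a normal build and an abort like the one above in a hardened build.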

### Logs / error messages / stack trace

❯ sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae ./vae/flux2-klein.Q4_K.gguf --llm /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf  -p "a lovely cat" --steps 4 --cfg-scale 1 --offload-to-cpu --vae-conv-direct --mmap -v                                             
[DEBUG] main.cpp:500  - version: stable-diffusion.cpp version master-487-43e829f-1-ge411520+, commit e411520
[DEBUG] main.cpp:501  - System Info: 
    SSE3 = 0 |     AVX = 0 |     AVX2 = 0 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 0 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 0 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 | 
[DEBUG] main.cpp:502  - SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  verbose: true,
  color: false,
  canny_preprocess: false,
  convert_name: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false
}
[DEBUG] main.cpp:503  - SDContextParams {
  n_threads: 4,
  model_path: "",
  clip_l_path: "",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "",
  llm_path: "/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf",
  llm_vision_path: "",
  diffusion_model_path: "./models/flux-2-klein-4b-Q2_K.gguf",
  high_noise_diffusion_model_path: "",
  vae_path: "./vae/flux2-klein.Q4_K.gguf",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  embeddings: {
  }
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: ".",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  flow_shift: INF
  offload_params_to_cpu: true,
  enable_mmap: true,
  control_net_cpu: false,
  clip_on_cpu: false,
  vae_on_cpu: false,
  diffusion_flash_attn: false,
  diffusion_conv_direct: false,
  vae_conv_direct: true,
  circular: false,
  circular_x: false,
  circular_y: false,
  chroma_use_dit_mask: true,
  qwen_image_zero_cond_t: false,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
  force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:504  - SDGenerationParams {
  loras: "{
  }",
  high_noise_loras: "{
  }",
  prompt: "a lovely cat",
  negative_prompt: "",
  clip_skip: -1,
  width: -1,
  height: -1,
  batch_count: 1,
  init_image_path: "",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 4, eta: 0.00, shifted_timestep: 0),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
  custom_sigmas: [],
  cache_mode: "",
  cache_option: "",
  cache: disabled (threshold=1, start=0.15, end=0.95),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0.75,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
  upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:172  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:75   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:75   - ggml_vulkan: 0 = NVIDIA GeForce MX150 (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[INFO ] stable-diffusion.cpp:193  - Vulkan: Using device 0
[INFO ] stable-diffusion.cpp:258  - loading diffusion model from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] model.cpp:370  - load ./models/flux-2-klein-4b-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416  - init from './models/flux-2-klein-4b-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:305  - loading llm from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] model.cpp:370  - load /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf using gguf format
[DEBUG] model.cpp:416  - init from '/home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:319  - loading vae from './vae/flux2-klein.Q4_K.gguf'
[INFO ] model.cpp:370  - load ./vae/flux2-klein.Q4_K.gguf using gguf format
[DEBUG] model.cpp:416  - init from './vae/flux2-klein.Q4_K.gguf'
[INFO ] stable-diffusion.cpp:335  - Version: Flux.2 klein 
[INFO ] stable-diffusion.cpp:363  - Weight type stat:                      f32: 205  |    q2_K: 200  |    q3_K: 72   |    q4_K: 109  |    q6_K: 1    |    bf16: 208  
[INFO ] stable-diffusion.cpp:364  - Conditioner weight type stat:          f32: 145  |    q2_K: 144  |    q3_K: 72   |    q4_K: 36   |    q6_K: 1    
[INFO ] stable-diffusion.cpp:365  - Diffusion model weight type stat:      f32: 60   |    q2_K: 56   |    q4_K: 24   |    bf16: 9    
[INFO ] stable-diffusion.cpp:366  - VAE weight type stat:                 q4_K: 49   |    bf16: 199  
[DEBUG] stable-diffusion.cpp:368  - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285  - merges size 151387
[DEBUG] llm.hpp:317  - vocab size: 151669
[DEBUG] llm.hpp:1135 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
[INFO ] flux.hpp:1346 - flux: depth = 5, depth_single_blocks = 20, guidance_embed = false, context_in_dim = 7680, hidden_size = 3072, num_heads = 24
[DEBUG] ggml_extend.hpp:1946 - qwen3 params backend buffer size =  2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1946 - flux params backend buffer size =  1743.12 MB(RAM) (149 tensors)
[INFO ] stable-diffusion.cpp:624  - Using Conv2d direct in the vae model
[DEBUG] ggml_extend.hpp:1946 - vae params backend buffer size =  93.28 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:752  - loading weights
[DEBUG] model.cpp:1381 - using 4 threads for model loading
[DEBUG] model.cpp:1403 - loading tensors from ./models/flux-2-klein-4b-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
  |=========>                                        | 149/795 - 6.98it/s
[DEBUG] model.cpp:1403 - loading tensors from /home/kvinayak/coding/llm/models/Qwen3-4B-Q2_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
  |==================================>               | 547/795 - 10.01it/s
[DEBUG] model.cpp:1403 - loading tensors from ./vae/flux2-klein.Q4_K.gguf
[DEBUG] model.cpp:1425 - using mmap for I/O
  |==================================================| 795/795 - 13.78it/s
[INFO ] model.cpp:1623 - loading tensors completed, taking 57.72s (process: 0.00s, read: 53.93s, memcpy: 0.00s, convert: 2.24s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:787  - finished loaded file
[INFO ] stable-diffusion.cpp:845  - total params memory size = 4602.34MB (VRAM 4602.34MB, RAM 0.00MB): text_encoders 2765.94MB(VRAM), diffusion_model 1743.12MB(VRAM), vae 93.28MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:939  - running in Flux2 FLOW mode
[DEBUG] stable-diffusion.cpp:3472 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3506 - sampling using Euler method
[DEBUG] denoiser.hpp:703  - Flux2FlowDenoiser: set shift to 2.031
[INFO ] denoiser.hpp:403  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3633 - TXT2IMG
[DEBUG] conditioner.hpp:1679 - parse '<|im_start|>user
a lovely cat<|im_end|>
<|im_start|>assistant
<think>

</think>

' to [['<|im_start|>user
', 1], ['a lovely cat', 1], ['<|im_end|>
<|im_start|>assistant
<think>

</think>

', 1], ]
[DEBUG] llm.hpp:259  - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259  - split prompt "a lovely cat" to tokens ["a", "Ġlovely", "Ġcat", ]
/usr/include/c++/15.2.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = char32_t; _Alloc = std::allocator<char32_t>; reference = char32_t&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
zsh: killed     sd-cli --diffusion-model ./models/flux-2-klein-4b-Q2_K.gguf --vae  --llm  -p 

### Additional context / environment details

- **Flux model:** flux-2-klein-4b-Q2_K.gguf from unsloth/FLUX.2-klein-4B-GGUF
- **Qwen model:** Qwen3-4B-Q2_K.gguf from unsloth/Qwen3-4B-GGUF
- **VAE:** original safetensors converted to GGUF

I also tried OlegSkutte/FLUX.2-klein-4B-GGUF and leejet/FLUX.2-klein-4B-GGUF, with the same result.

I'm trying to run this with very low VRAM (2 GB), so I'd expect memory allocation errors. This failure looks different.
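To put numbers on that expectation, here is a rough sketch using the parameter sizes printed in the log above. It assumes 2 GB means 2048 MB and, purely for illustration, that a component has to fit in VRAM on its own while it runs. How `--offload-to-cpu` actually schedules buffers is not shown in this report, and compute buffers are ignored. All names in the snippet are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Parameter buffer sizes in MB, copied from the "total params memory size"
// log line in this report.
struct Component {
    const char* name;
    double size_mb;
};

const std::vector<Component> kComponents = {
    {"text_encoders", 2765.94},
    {"diffusion_model", 1743.12},
    {"vae", 93.28},
};

// Sum of all parameter buffers.
double total_mb(const std::vector<Component>& cs) {
    double sum = 0.0;
    for (const Component& c : cs) {
        sum += c.size_mb;
    }
    return sum;
}

// Size of the largest single component.
double largest_mb(const std::vector<Component>& cs) {
    double best = 0.0;
    for (const Component& c : cs) {
        best = std::max(best, c.size_mb);
    }
    return best;
}

// True when a buffer of size_mb fits in a budget of budget_mb (assumption:
// nothing else occupies the budget).
bool fits(double size_mb, double budget_mb) {
    return size_mb <= budget_mb;
}
```

Under these assumptions the total (about 4602 MB) and even the text encoder alone (2765.94 MB) exceed a 2048 MB budget, which is why an allocation failure would have been the unsurprising outcome.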
