Describe the Issue
I'm testing various settings using the Run Benchmark option and ticked the Use QuantMatMul option. It started out using roughly the same amount of memory as the same settings with QuantMatMul disabled, but over time it kept dumping more and more data into shared VRAM. Dedicated GPU VRAM maxed out at about 6.51GB and dropped to 6.3GB when the processing stage finished, while shared VRAM started at around 50MB and ballooned to 9.16GB by the end of the processing stage. (It also ended up using 18.66GB of RAM.) For reference, the same settings without QuantMatMul resulted in 7.34GB of RAM, 4.05GB of GPU VRAM, and 0.04GB of shared VRAM by the end of the benchmark.
From my understanding, QuantMatMul is supposed to save memory, not steadily inflate memory usage over time.
Additional Information:
64-bit Windows 10, Intel 10600K CPU (running at stock speeds), 8GB AMD RX 6650XT GPU, 128GB DDR4 RAM, using the hipBLAS driver, no pagefile.
Model: Bartowski's IQ3_XXS build of Qwen3 235B A22B
KoboldCPP settings:
- 5 GPU Layers
- 16384 context
- MMAP enabled
- 8 CPU threads and 8 BLAS threads
- 512 BLAS batch size
- FastForwarding enabled
- 6 Experts
- 2 CPU expert layers
- Tensor override:
(blk\.\d+\.(ffn_down|ffn_gate_exps|ffn_up_exps)\.weight)|(output\.weight)=CPU
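For clarity, here's a quick sanity check (a minimal sketch, not KoboldCPP code) of which tensor names the override pattern above would route to CPU. The tensor names below are illustrative llama.cpp-style examples, not names dumped from this model:

```python
import re

# The part before "=CPU" is the regex; "CPU" is the target backend for matching tensors.
override = r"(blk\.\d+\.(ffn_down|ffn_gate_exps|ffn_up_exps)\.weight)|(output\.weight)=CPU"
pattern, _, device = override.rpartition("=")

# Illustrative tensor names (not dumped from the actual GGUF).
examples = [
    "blk.0.ffn_down.weight",
    "blk.10.ffn_gate_exps.weight",
    "blk.10.ffn_up_exps.weight",
    "blk.10.attn_q.weight",
    "output.weight",
]

for name in examples:
    target = device if re.search(pattern, name) else "GPU (default)"
    print(f"{name:32s} -> {target}")
```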
Edit: I'm noticing that the memory explosion doesn't happen if I enable both QuantMatMul and Flash Attention, only when I use QuantMatMul alone.