Replies: 1 comment
-
I wouldn't expect it to make a difference, but maybe there is some issue with the way batched matrix multiplications are scheduled to threads. It would be useful if you can reproduce this by adding a perf test case in
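(Not the referenced perf test case, just an illustration.) Below is a minimal standalone timing sketch of the kind of reproduction being asked for: it builds the same batched `ggml_mul_mat()` once with a `Q8_0` weight stored as 2D and viewed as 3D, and once with the same shape allocated natively as 3D, and times each on the CPU. The shapes (`K`, `M`, `H`, `T`), thread count and context size are hypothetical placeholders, and it assumes the single-context CPU path (`ggml_graph_compute_with_ctx()`, which newer ggml trees declare in `ggml-cpu.h` rather than `ggml.h`).

```c
#include "ggml.h"
#include <stdio.h>

// NOTE: in newer ggml trees ggml_graph_compute_with_ctx() lives in ggml-cpu.h.

static int64_t time_mul_mat(struct ggml_context * ctx, struct ggml_tensor * a,
                            struct ggml_tensor * b, int n_threads) {
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, ggml_mul_mat(ctx, a, b));

    const int64_t t0 = ggml_time_us();
    ggml_graph_compute_with_ctx(ctx, gf, n_threads);
    return ggml_time_us() - t0;
}

int main(void) {
    ggml_time_init();

    struct ggml_init_params params = {
        /*.mem_size   =*/ 256u*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // hypothetical MLA-like shapes: K x M weights per head, H heads, T tokens
    const int64_t K = 512, M = 128, H = 128, T = 64;

    // weights stored as 2D and then re-interpreted as 3D (the problematic pattern) ...
    struct ggml_tensor * w2d  = ggml_new_tensor_2d(ctx, GGML_TYPE_Q8_0, K, M*H);
    struct ggml_tensor * wv3d = ggml_view_3d(ctx, w2d, K, M, H,
                                             w2d->nb[1], w2d->nb[1]*M, 0);
    // ... versus the same shape allocated natively as 3D
    struct ggml_tensor * w3d  = ggml_new_tensor_3d(ctx, GGML_TYPE_Q8_0, K, M, H);

    // activations: one [K, T] slice per head; the contents are irrelevant for
    // timing, so the (uninitialised) data is left as-is
    struct ggml_tensor * x    = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, K, T, H);

    fprintf(stderr, "3D view of 2D tensor: %lld us\n", (long long) time_mul_mat(ctx, wv3d, x, 8));
    fprintf(stderr, "native 3D tensor:     %lld us\n", (long long) time_mul_mat(ctx, w3d,  x, 8));

    ggml_free(ctx);
    return 0;
}
```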
-
I'm trying to get to the bottom of the problems with the `deepseek2-mla` code having horrible performance on quantised tensors, and have got it down to a couple of tensors that are stored as 2D but then viewed as 3D.

These same tensors have no problem if they are stored as `F16`, `BF16` or `F32`, but as `Q8_0` or any other quant they completely tank when they get used in a non-broadcasted batch matrix multiplication.

Am I running into some memory alignment problem here, and would storing the same tensors as 3D in the GGUF align the 2D dimension to a better boundary compared to the `ggml_view_3d()` call? (I'm trying this now, but it will take several hours to requant the model to use 3D for the problematic tensors.)

If not, then is there any way these can be aligned/padded to help with this?
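For reference, here is a small sketch (hypothetical row length, not taken from the model) of why the strides differ between types: the byte stride of a row, which a `ggml_view_3d()` over a 2D tensor inherits as its `nb1`/`nb2`, is `ggml_row_size(type, ne0)`. For `F16`/`F32` that is a power-of-two multiple of the element size, while for `Q8_0` it is 34 bytes per 32-element block, so the per-head slices of the view start at much less "round" byte offsets.

```c
// Sketch: row strides for a hypothetical row length of 512 elements.
#include "ggml.h"
#include <stdio.h>

int main(void) {
    const int64_t ne0 = 512; // hypothetical row length

    // nb1 of a 2D tensor (and of a ggml_view_3d() over it) is the row size:
    printf("F32  row: %zu bytes\n", ggml_row_size(GGML_TYPE_F32,  ne0)); // 2048
    printf("F16  row: %zu bytes\n", ggml_row_size(GGML_TYPE_F16,  ne0)); // 1024
    printf("Q8_0 row: %zu bytes\n", ggml_row_size(GGML_TYPE_Q8_0, ne0)); // 544 = 16 blocks * 34 B
    return 0;
}
```

Whether that alignment difference is actually what hurts the batched path is exactly the open question above; the view itself is valid either way, since it only re-describes the same contiguous data.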