@Acly (Collaborator) commented Oct 29, 2025

This moves the buffer/uma/offset/size computations into a function, which greatly reduces the amount of repetitive code in ggml_vk_op_f32. There are more places outside of vk_op_f32 where it could be reused; I didn't adapt those yet.

I replicated the existing logic, but I don't really understand why the branching is needed for the sub-buffer size computation. From what I can see and test, it doesn't change the result. Maybe this can be simplified? Or is there some corner case I can't think of? (See questions in comments.)

    size_t size;
    if (support_incontiguous) {
        size = ggml_nbytes(tensor) + misalign_bytes;
        if (offset + size >= buffer->size) {
@Acly (Collaborator, Author):
This replicates the branch here https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-vulkan/ggml-vulkan.cpp#L8620-L8631

Why is this needed? The case offset+size > buffer->size sounds like it should never happen. In the case where they're equal, the branch computes the same value that's already there.

It might clamp the size to maxStorageBufferRange, but that seems like it would likely produce wrong results; shouldn't it assert/abort instead?

Collaborator:

I think some of this logic is only here for historical reasons at this point. Early on, some shaders didn't have thorough bounds checking logic and relied on robustBufferAccess and these tight bounds to accomplish bounds checking. But robustBufferAccess isn't as reliable as it sounds, and we've improved the bounds checking logic over time.

To my knowledge, the only remaining shaders that intentionally rely on robustBufferAccess bounds checking are the scalar/coopmat1 mul_mm shaders (not sure about mmq, but probably that too). These need to do the A/B loads in a way that allows the compiler to batch them, so manual bounds checking would probably interfere and cause a slowdown.

So, I think all of the current users of ggml_vk_op_f32 (and everything else except these mul_mm shaders) could just use ggml_vk_get_max_buffer_range.

Collaborator:

Yes, I created a lot of that code for a very different version of the backend, and also when I knew a lot less about Vulkan than I do now. It's very possible there are checks and branches that are no longer needed.

@Acly (Collaborator, Author):

Thanks for the feedback. I simplified the size to just ggml_nbytes(tensor) + misalign_bytes: this should be the actual bound, and it's what the other branches ended up computing anyway.

Using ggml_vk_get_max_buffer_range also works, but feels less straightforward, and it's still unclear to me if there's a reason to prefer it.

I've now adapted all ops except for the mul_mat variants. They use a more complex size computation, and I also want to avoid conflicts with #16868.

Collaborator:

ggml_vk_get_max_buffer_range is necessary for some operations where the size of the tensor is greater than 4GB (or greater than maxBufferRange), which would lead to an invalid descriptor range being set. Please try enabling these ifdef'd-out tests:

#if 0
    // >4GB im2col destination. Too slow to run by default.
    // Test cases taken from Wan2.1 T2V 1.3B.
...
#if 0
    // > 4GB A matrix. Too slow to be enabled by default.
...

@Acly (Collaborator, Author):

I thought that's what this line was for:

    } else if (op == GGML_OP_IM2COL || op == GGML_OP_IM2COL_3D) {
        if (ctx->device->shader_int64 && ctx->device->buffer_device_address) {
            // buffer device address path doesn't use dst buffer
            d_sz = 1;
        }

(and mul_mat I didn't change so far)

IMO it would be even better to not have the binding in the shader at all (put it in a #else). Why is it there if it cannot be bound properly and isn't used? Instead of setting d_sz = 1 it could just do a dispatch right there without the dst buffer.

Good hint about having to enable those tests, though; I hadn't done that. Unfortunately they just OOM on my 12GB GPU, so I will find some way to check them.

@Acly (Collaborator, Author):

The tests are passing with sufficient VRAM... sort of. I had to tone them down a bit, since some cases ran out of system RAM (48GB) instead and I didn't want to switch to yet another machine. They still used >>4GB buffers.

Collaborator:

I looked back at my commit message for 2aaf0a2 when I added ggml_vk_get_max_buffer_range. The point was to stop using VK_WHOLE_SIZE since it could compute a size greater than maxStorageBufferRange. I think ggml_nbytes should be OK in the same circumstances, except for cases where ggml_nbytes can be greater than 4GB. Currently, that's just the im2col (which uses BDA) and the mul_mat (which you haven't changed) so I think this should be OK. ggml_vk_get_max_buffer_range is probably cheaper than ggml_nbytes, though.

@Acly (Collaborator, Author):

Yes, that's also what I concluded. The reason I prefer ggml_nbytes is that if it is >4GB for an op that doesn't support it, ggml_vk_get_max_buffer_range would silently clamp and probably produce wrong results that are difficult to debug. I think raising an error there is a good thing.

@github-actions bot added the "Vulkan" (issues specific to the Vulkan backend) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on Oct 29, 2025.
@Acly force-pushed the vulkan-op-subbuffer branch from 09475f8 to edacb52 on October 31, 2025.
@Acly marked this pull request as ready for review on October 31, 2025.
        return vk_subbuffer{buffer, offset, size};
    }

    static vk_subbuffer ggml_vk_tensor_subbuffer_uma(
Collaborator:
IMO putting "uma" in the name is confusing. I'd probably combine these functions and not have the "uma" name.

@Acly (Collaborator, Author) commented Nov 2, 2025:
I made two functions initially because ggml_vk_op_f32 does not do UMA handling for the dst buffer, and I wanted to preserve that. Apart from some CPU cycles, it wouldn't hurt to have a single function, though. I'd also be in favor of combining them if nobody has objections.

    }

    vk_buffer d_D = dst_buf_ctx->dev_buffer;
    vk_subbuffer src0_buf = ggml_vk_tensor_subbuffer(ctx, src0, op_supports_incontiguous);
Collaborator:

op_supports_incontiguous isn't quite the same as allow_misalign, which depends on the shader. But maybe this is similar to what the original code did, at least based on the size handling.

@Acly (Collaborator, Author):

Yes, op_supports_incontiguous marks shaders which support (some, but not always all) permutations and/or views with offsets. I don't think there is anything more fine-grained.

I think it can be removed in favour of checking those things in ggml_backend_vk_device_supports_op, and rejecting ops with misaligned tensor views there if the shader doesn't support them. (But that's perhaps for another PR.)

@0cc4m (Collaborator) left a review:

Thank you, this is a nice change.

@0cc4m merged commit ac76d36 into ggml-org:master on Nov 7, 2025 (69 of 71 checks passed).