CUDA: fix negative KV_max values in FA #15321

JohannesGaessler · 2025-08-14T15:47:45Z

Looking at the code I noticed that, depending on the inputs, the values in KV_max can become negative. For this to happen there would need to be 16 consecutive tokens that are completely masked out, and the physical batch size would need to be >= 1024. I'm not sure whether this fixes #15294 or #15112, but it should be fixed either way since it could cause problems in the future.
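To illustrate how a bound like KV_max can go negative, here is a minimal host-side sketch in plain C++. It is not the CUDA kernel from this PR: the function names, the tile width `kTile`, and the mask layout are all assumptions made for the example. It scans a mask from the last KV tile backwards; if every entry for a group of query rows is masked out, the loop variable runs past zero, and adding the tile width back gives a non-negative bound.

```cpp
#include <cmath>
#include <vector>

// Hypothetical tile width along the KV dimension; not the value used in ggml.
constexpr int kTile = 4;

// mask is row-major with n_rows query rows and n_kv KV positions;
// masked-out entries are -inf. n_kv is assumed to be a multiple of kTile.
// Returns the lower edge of the last tile that contains an unmasked entry,
// or -kTile if every entry is masked (the negative case).
int last_unmasked_tile_start(const std::vector<float>& mask, int n_rows, int n_kv) {
    int kv = n_kv - kTile;
    for (; kv >= 0; kv -= kTile) {
        bool all_masked = true;
        for (int r = 0; r < n_rows; ++r) {
            for (int k = kv; k < kv + kTile; ++k) {
                all_masked = all_masked && std::isinf(mask[r * n_kv + k]);
            }
        }
        if (!all_masked) {
            break; // this tile still has to be processed
        }
    }
    return kv;
}

// Exclusive upper bound on the KV positions that need processing.
// Adding kTile back maps the "everything masked" result of -kTile to 0.
int kv_max(const std::vector<float>& mask, int n_rows, int n_kv) {
    return last_unmasked_tile_start(mask, n_rows, n_kv) + kTile;
}
```

With a fully masked group of rows, `last_unmasked_tile_start` returns `-kTile`, while `kv_max` returns 0, so a consumer that uses the bound as a loop limit or index never sees a negative value.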

JohannesGaessler · 2025-08-14T16:21:17Z

~~I was able to reproduce the issue with the model getting stuck in #15294 and it seems to be fixed by this PR.~~

The conditions for triggering the issue seem to be inconsistent and I misinterpreted the results I got.

CUDA: fix negative KV_max values in FA

9846087

slaren approved these changes Aug 14, 2025


github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Aug 14, 2025

JohannesGaessler merged commit 4227c9b into ggml-org:master Aug 14, 2025
42 of 43 checks passed
