Fix FlashAttention debug test, FP32 assert #7684

JohannesGaessler · 2024-06-01T20:31:48Z

Removes an incorrect assert for FP32 FlashAttention. Pads the number of elements per KV cache row in the tests to a multiple of the block size. The backend will still report that the head size is not supported, but I think that if something like this were ever implemented, padding would be the only sensible way to do it.
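For illustration only, here is a minimal sketch of what padding a row length to a multiple of the block size means. It is not the code from this PR: `pad_to_multiple` is a hypothetical helper, and the head size of 80 and block size of 32 in the usage note are assumed example values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the PR): round n up to the next
// multiple of block_size. Works for any positive block_size, not only
// powers of two.
static int64_t pad_to_multiple(int64_t n, int64_t block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size * block_size;
}
```

With these assumed values, `pad_to_multiple(80, 32)` gives 96, so a row of 80 elements would occupy three full blocks, while `pad_to_multiple(128, 32)` stays at 128 because it is already block-aligned.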

Fix FlashAttention debug test, FP32 assert

4510236

github-actions bot added the testing, Nvidia GPU, and ggml labels Jun 1, 2024

slaren approved these changes Jun 1, 2024


JohannesGaessler merged commit e141ce6 into ggerganov:master Jun 1, 2024
67 checks passed
