
Small performance improvement to pallas MHA #22923

Merged
merged 1 commit into jax-ml:main on Aug 8, 2024

Conversation

Rifur13
Collaborator

@Rifur13 Rifur13 commented Aug 7, 2024

A small performance improvement (and simpler code) for FlashAttention, achieved by reducing non-matmul FLOPs and data movement.

There are 2 changes:

  • Combining the softmax residuals using a logsumexp.
  • Keeping an unscaled version of the output during the forward pass.

See section 3.1.1 of the FlashAttention-2 paper for more details: https://arxiv.org/pdf/2307.08691.
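The two changes can be illustrated with a plain NumPy sketch of the online-softmax loop (this is an illustrative reconstruction following the FlashAttention-2 paper, not the actual Pallas kernel; the function name and shapes are made up for the example). Instead of dividing the output by the running denominator at every step, the output is accumulated unscaled and normalized once at the end, and the two softmax residuals (running max `m` and running sum `l`) are saved for the backward pass as a single logsumexp `m + log(l)`:

```python
import numpy as np

def flash_attention_rows(q, k, v, block_size=2):
    """Online-softmax attention over k/v blocks (illustrative, not the Pallas kernel)."""
    m = np.full(q.shape[0], -np.inf)          # running row-wise max of scores
    l = np.zeros(q.shape[0])                  # running softmax denominator
    o = np.zeros((q.shape[0], v.shape[1]))    # UNSCALED output accumulator
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        s = q @ kb.T                          # scores for this block
        m_new = np.maximum(m, s.max(axis=1))
        alpha = np.exp(m - m_new)             # rescale factor for previous partials
        p = np.exp(s - m_new[:, None])
        l = l * alpha + p.sum(axis=1)
        o = o * alpha[:, None] + p @ vb       # note: no per-step division by l
        m = m_new
    lse = m + np.log(l)                       # single residual saved for backward
    return o / l[:, None], lse                # normalize the output once at the end
```

Keeping `o` unscaled removes one division (and the corresponding rescaling of `o`) per block, and storing only `lse` halves the residual traffic between the forward and backward passes.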

@Rifur13 Rifur13 requested a review from sharadmv August 7, 2024 23:28
Collaborator

@sharadmv sharadmv left a comment


Awesome!

@google-ml-butler google-ml-butler bot added kokoro:force-run pull ready Ready for copybara import and testing labels Aug 8, 2024
@copybara-service copybara-service bot merged commit 9fbc51b into jax-ml:main Aug 8, 2024
16 checks passed
@Rifur13 Rifur13 deleted the faster branch August 28, 2024 04:14
3 participants