feat: flash attention v3 #479

Open · wants to merge 16 commits into main

Conversation

@cathalobrien (Contributor) commented on Aug 13, 2025

Description

Adds a flash attention v3 option to models/layers/attention.py

Flash attention v3 is optimised for Hopper and newer GPUs, with up to 2x faster attention kernels; see the FlashAttention-3 announcement for more details.

You can enable flash-attn v3 with the following config entry:

model.processor.attention_implementation=flash_attention_v3
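
For reference, a quick way to check whether the v3 kernels are importable in your environment. This is a minimal sketch, not part of the PR; the `flash_attn_interface` module name is assumed to come from the separate Hopper/FA3 build of flash-attn.

```python
# Illustrative availability check (assumes the FA3 "hopper" build of flash-attn
# exposes the `flash_attn_interface` module).
try:
    import flash_attn_interface  # noqa: F401
    print("flash-attn v3 kernels importable")
except ImportError as err:
    print(f"flash-attn v3 not available: {err}")
```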

I compared the loss (seed=42) for SDPA vs flash-attn v3 and it matched to the 7th decimal place.
[screenshot: loss curves for SDPA vs flash-attn v3]

I also changed the error message reporting in the flash attention v2 wrapper so that it actually prints the underlying error. This improves the user experience when flash-attn is installed but a newer GCC is required: Anemoi will now print the GLIBC error rather than telling you to install flash-attn when it is already installed.
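
A sketch of the reporting pattern described above, assuming the v2 wrapper imports `flash_attn.flash_attn_func`; the actual code in `models/layers/attention.py` may differ.

```python
import logging

LOGGER = logging.getLogger(__name__)

try:
    from flash_attn import flash_attn_func  # flash-attn v2 entry point
except ImportError as err:
    # Log the underlying failure (e.g. a GLIBC/GCC mismatch) instead of only
    # suggesting that flash-attn be installed.
    LOGGER.error("Could not import flash-attn: %s", err)
    raise
```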

What problem does this change solve?

Faster attention.

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@cathalobrien changed the title from "flash attention v3" to "feat: flash attention v3" on Aug 13, 2025
@mchantry added the ATS Approval Not Needed (no approval needed by ATS) label on Aug 13, 2025
@mchantry (Member)

@cathalobrien great contribution. Please could you add some tiny docs or references in the configs to show how a user can control which attention is used?

@cathalobrien (Contributor, Author)

@mchantry I added an explanation of the different attention implementations, and how to select one, to the docs.
[screenshot: docs section describing the attention implementations]

@cathalobrien (Contributor, Author)

TODO: will merge both flash_attention wrappers into one wrapper to minimise config complexity.

@mchantry (Member)

> TODO: will merge both flash_attention wrappers into one wrapper to minimise config complexity.

Would be good to get output saying which version was chosen, for posterity.

@cathalobrien (Contributor, Author)

> TODO: will merge both flash_attention wrappers into one wrapper to minimise config complexity.
>
> Would be good to get output saying which version was chosen, for posterity.

done, good catch
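
A rough sketch of what a unified wrapper with version logging could look like. The helper name `_select_flash_attention` and the `flash_attn_interface` module for the FA3 build are assumptions for illustration, not taken from the PR.

```python
import logging

LOGGER = logging.getLogger(__name__)


def _select_flash_attention():
    """Return a flash-attn kernel and its major version, preferring v3.

    Hypothetical helper: tries the FA3 ("hopper") build first, falls back to
    flash-attn v2, and logs which version was chosen.
    """
    try:
        from flash_attn_interface import flash_attn_func  # assumed FA3 module

        LOGGER.info("Using flash attention v3")
        return flash_attn_func, 3
    except ImportError as err:
        LOGGER.debug("flash-attn v3 unavailable (%s), falling back to v2", err)

    from flash_attn import flash_attn_func  # flash-attn v2

    LOGGER.info("Using flash attention v2")
    return flash_attn_func, 2
```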
