feat: flash attention v3 #479
base: main
Conversation
@cathalobrien great contribution. Please could you add some tiny docs or references in the configs to show how a user can control which attention is used?
for more information, see https://pre-commit.ci
@mchantry
TODO: will merge both flash_attention wrappers into one to minimise config complexity.
Would be good to get output saying which version was chosen, for posterity.
done, good catch
Description
Adds a flash attention v3 option to models/layers/attention.py
Flash Attention v3 is optimised for Hopper and newer GPUs, with up to 2x faster attention kernels. For more info see here
You can enable flash-attn v3 with the following config entry:
model.processor.attention_implementation=flash_attention_v3
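Under the hood, a wrapper like this presumably maps the config value to a kernel. Below is a minimal illustrative sketch, not Anemoi's actual code: the function and import names (`select_attention`, `flash_attn_interface`) are assumptions, and the `print` calls address the reviewer's request to log which version was chosen.

```python
def select_attention(implementation: str = "scaled_dot_product_attention"):
    """Resolve a config value like model.processor.attention_implementation
    to an attention callable.

    Illustrative sketch only: the real wrapper lives in
    models/layers/attention.py and may differ.
    """
    if implementation == "flash_attention_v3":
        # flash-attn v3 targets Hopper (sm90) and newer GPUs; the import
        # name below is an assumption based on the flash-attn project.
        from flash_attn_interface import flash_attn_func
        print("attention implementation: flash-attn v3")
        return flash_attn_func
    if implementation == "flash_attention":
        from flash_attn import flash_attn_func
        print("attention implementation: flash-attn v2")
        return flash_attn_func
    # Default: PyTorch's fused scaled dot-product attention.
    from torch.nn.functional import scaled_dot_product_attention
    print("attention implementation: torch SDPA")
    return scaled_dot_product_attention
```

Keeping the imports inside the branches means flash-attn only needs to be installed when it is actually requested.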
I compared the loss (seed=42) for SDPA vs flash-attn v3 and they matched to the 7th decimal place.

I also changed the error reporting in the flash attention v2 wrapper to actually print the underlying error. This improves the user experience when flash-attn is installed but a newer gcc is required: Anemoi will now print the GLIBC error rather than telling you to install flash-attn when it's already installed.
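The fix described above amounts to surfacing the original ImportError instead of swallowing it. A minimal sketch of the pattern, with an assumed helper name (`load_flash_attn_v2`), not the exact Anemoi code:

```python
def load_flash_attn_v2():
    """Try to import flash-attn v2, surfacing the real import error.

    A bare "please install flash-attn" message hides GLIBC/gcc
    incompatibilities when the package is installed but fails to load.
    Chaining the exception keeps the original message visible.
    """
    try:
        from flash_attn import flash_attn_func
        return flash_attn_func
    except ImportError as err:
        raise RuntimeError(
            "flash-attn is installed but failed to import; "
            f"underlying error: {err}"
        ) from err
```

With `raise ... from err`, the GLIBC error appears in the traceback alongside the wrapper's message.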
What problem does this change solve?
Faster attention.
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.