🚀 The feature, motivation and pitch
FlashAttention-3 seems very promising for running efficient LLM inference on NVIDIA Hopper cards. Are there any plans to support it in the future?
https://github.com/Dao-AILab/flash-attention
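For context, a minimal sketch of what calling the fused kernel from the linked library looks like today, assuming the FlashAttention-2-style `flash_attn_func` entry point (FlashAttention-3 exposes a similar interface for Hopper); tensor shapes and dtypes below are illustrative, not a proposed integration:

```python
# Sketch only: assumes the flash_attn package is installed and a CUDA GPU is available.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 128
# Inputs are fp16/bf16 CUDA tensors shaped (batch, seqlen, nheads, headdim).
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused attention: softmax(Q K^T / sqrt(headdim)) V, computed without
# materializing the full attention matrix in HBM.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```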
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.