fix: error due to FA2 when building #3266
Conversation
@AlpinDale thank you for this! This almost worked out of the box for me, but I got an error.
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Thanks for the quick fix, @mgoin!
I spoke too soon: it seems like the build succeeds, but flash-attn actually fails to install.
That's odd. I wonder if there's a way to specify in requirements.txt that a module should be installed without its external dependencies. That should be the only reason we need to do this for flash attention.
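For reference, requirements.txt has no per-package switch for this, so the usual workaround is a separate pip invocation. The sketch below is only illustrative of that workaround, not something this PR does; the package name and flags are the standard pip/flash-attn ones, not additions from this change.

```python
# Hedged sketch: install flash-attn in a separate step so its dependency
# resolution and build isolation can be disabled, since requirements.txt
# offers no per-package equivalent.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--no-deps",             # skip flash-attn's declared dependencies
    "--no-build-isolation",  # build against the torch already in the env
    "flash-attn",
])
```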
Looks like there's no way to do this reliably. @WoosukKwon can we instead import the flash attention forward kernels directly in vLLM? I'm unsure why they're needed in the first place; I noticed no performance improvement with Flash Attention 2 in place of xFormers.
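As a rough illustration of what calling the forward kernel directly might look like (a sketch assuming the upstream flash-attn package is installed, not the vLLM integration itself):

```python
# Hedged sketch: invoking the FlashAttention-2 forward kernel via the
# upstream flash-attn API. Shapes/dtypes are illustrative only.
import torch
from flash_attn import flash_attn_func

# flash-attn expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU.
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

# Causal self-attention forward pass; softmax scale defaults to 1/sqrt(headdim).
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 128, 8, 64])
```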
Closing in favor of #3269, which is a better solution.
The #3005 PR introduced an issue where the Python env can't find `pip` under certain conditions. This PR uses `ensurepip` to bootstrap `pip` into the existing environment. Resolves #3265
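As a rough illustration of the ensurepip approach described above (a sketch of the standard-library mechanism, not the PR's actual change or flags):

```python
# Illustrative sketch: bootstrap pip into the current environment using the
# standard-library ensurepip module.
import ensurepip
import subprocess
import sys

# Install (or upgrade to) the bundled pip for the running interpreter.
ensurepip.bootstrap(upgrade=True)

# Command-line equivalent, e.g. from a Dockerfile or build script:
#   python -m ensurepip --upgrade
subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```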