Adding flash attention to GPT2 #27479
Conversation
Looking good, thanks for your hard work!
I left a few comments, please have a look. For the failing CI, can you try to merge with upstream main?
Can you also add a few lines in the docs for GPT2 (e.g., similarly to #27400)?
Thanks!
@younesbelkada, thanks for the tips! Now most of the tests are passing. However, I'm facing a challenge addressing the issues in the following test:
### Using Flash Attention 2

Flash Attention 2 is an advanced optimization method that dramatically reduces memory usage and increases inference speed. It's particularly effective for large-scale generation tasks. To utilize Flash Attention 2, ensure your hardware is compatible and install the necessary package with:

```python
```
Should be ```sh here, not a Python script.
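For reference, a minimal sketch of what the corrected install block might look like, assuming the standard flash-attn installation command (the package name and flag are the usual ones, not taken from this diff):

```sh
# Install FlashAttention 2 (requires a compatible CUDA GPU and build toolchain)
pip install flash-attn --no-build-isolation
```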
Use the model with Flash Attention 2 as follows:

```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
```
The leading space should be removed, otherwise it causes a syntax error.
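For reference, a minimal sketch of what the corrected usage snippet could look like, assuming GPT2 exposes Flash Attention 2 through the same `attn_implementation="flash_attention_2"` argument as other supported models (the checkpoint name, prompt, and generation settings below are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load GPT2 in half precision with Flash Attention 2 enabled
# (assumes the attn_implementation flag used by other FA2-enabled models).
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```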
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Adding Flash Attention 2 to GPT2; here are my tests:
Contributing to: #26350
Who can review?
Hey @younesbelkada @ArthurZucker, could you please review this when you get a chance?
I was trying to debug why I was getting these test failures; some of them point to the Falcon model (even though I haven't touched that file).
I also ran the flash attention tests on another model that has already been merged, and these are the test results I am getting:
Am I on the right path here? I couldn't work out why some of these test failures are happening.
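For context, a hedged sketch of how the GPT2 flash-attention tests might be invoked locally; the test file path and the `-k` filter are assumptions about the relevant test names, not commands taken from this thread:

```sh
# Hypothetical invocation: run slow GPT2 tests, keeping only flash-attention-related ones.
RUN_SLOW=1 pytest tests/models/gpt2/test_modeling_gpt2.py -k "flash_attn" -v
```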