Adding flash attention to GPT2 #27479
Closed

Commits (19):
- `6b893d6` Added flash attention to GPT2
- `ecb0b2a` formatted with black
- `8b5e7d0` fix copies
- `4bb31e3` decision transformer gpt2 flash attention added
- `080abd5` removed extra mlp class
- `1ca6ddb` styling
- `5e8d931` fix
- `87c5ccb` fixed tensor size
- `8668c40` Merge remote-tracking branch 'upstream/main' into gpt2-flash-attn
- `bb52c48` removed padding_mask and added reference
- `f2a7ce5` setup and quality test
- `357b0de` formatting and black[jupyter]
- `c8b9386` mask shape
- `8ae7990` cross attention shape fix
- `b0e3d05` dimension changed
- `71b7d23` dimension changed
- `e8a2068` attention mask
- `5d13594` one more
- `6d65329` decision transformer fix

(all commits by canberk17)
The hunk `@@ -81,6 +81,22 @@` adds the following section to the GPT-2 docs, after the existing task-guide links:

- [Token classification task guide](../tasks/token_classification)
- [Causal language modeling task guide](../tasks/language_modeling)

### Using Flash Attention 2

Flash Attention 2 is an optimization that significantly reduces memory usage and speeds up inference; it is particularly effective for large-scale generation tasks. To use it, make sure your hardware is compatible and install the package:

```sh
pip install -U flash-attn --no-build-isolation
```

Use the model with Flash Attention 2 as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16, use_flash_attention_2=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

## GPT2Config

[[autodoc]] GPT2Config

The review fixes are applied above: the install block uses a `sh` code fence instead of `python`, the stray leading space before `from transformers` (which would cause a syntax error, per an inline review comment) is removed, and the missing `import torch` is added since the snippet references `torch`.
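The doc section above says Flash Attention 2 "dramatically reduces memory usage"; the core mechanism is an online softmax that streams over key/value blocks instead of materializing the full `seq_len × seq_len` score matrix. A minimal NumPy sketch of the idea (illustrative shapes and block size only, not the actual CUDA kernel) comparing the reference computation with a block-wise equivalent:

```python
import numpy as np

def reference_attention(q, k, v):
    # Standard attention: materializes the full (seq_len, seq_len) score matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def blockwise_attention(q, k, v, block=2):
    # Flash-Attention-style pass: stream over key/value blocks with an
    # online softmax, so only `block` scores per query row exist at once.
    n, d = q.shape
    out = np.zeros_like(q)
    for i in range(n):
        m = -np.inf           # running max of the scores seen so far
        norm = 0.0            # running softmax normalizer
        acc = np.zeros(d)     # running exp-weighted sum of values
        for j in range(0, k.shape[0], block):
            s = q[i] @ k[j:j + block].T / np.sqrt(d)
            m_new = max(m, s.max())
            rescale = np.exp(m - m_new)  # fold earlier terms into the new max
            p = np.exp(s - m_new)
            acc = acc * rescale + p @ v[j:j + block]
            norm = norm * rescale + p.sum()
            m = m_new
        out[i] = acc / norm
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
assert np.allclose(reference_attention(q, k, v), blockwise_attention(q, k, v))
```

The assertion checks that the streaming version is numerically equivalent to the naive one; the memory saving, not the result, is what changes.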
Review comment: the `pip install` block should be fenced as `sh`, not as a Python script.