Labels: ✨ enhancement, 📚 documentation
Description
The new recommended way to use flash attention is via the kernels library. We should update our tests and documentation to use kernels instead of `"flash_attention_2"`. For example, in `trl/docs/source/reducing_memory_usage.md`, line 149 (at 1eb561c):

```diff
- training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "flash_attention_2"})
+ training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "kernels-community/flash-attn2"})
```
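For reference, a minimal sketch of what the kernels-based setting looks like when loading a model directly with transformers; this assumes the `kernels` package is installed (`pip install kernels`) and uses a placeholder model id:

```python
# Sketch: load a model using the Hub-hosted flash attention kernel
# ("kernels-community/flash-attn2") instead of "flash_attention_2".
# Assumes `pip install kernels` and a CUDA-capable environment;
# the model id below is only a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    torch_dtype=torch.bfloat16,
    attn_implementation="kernels-community/flash-attn2",
)
```

The same string is what would be passed through `model_init_kwargs` in the TRL configs, as in the diff above.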