Fully transition from flash-attn to kernels #4380

@qgallouedec

Description

The new recommended way to use flash attention is to use kernels. We should update our tests and documentation to use kernels instead of "flash_attention_2". For example:

- training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "flash_attention_2"}) 
+ training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "kernels-community/flash-attn2"}) 
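
For context, here is a rough sketch of what a TRL training script would look like after the switch. The model name, dataset, and output directory below are placeholders for illustration, not something prescribed by this issue:

```python
# Sketch of a DPO run using the kernels-based flash attention implementation.
# Assumes `trl`, `transformers`, `datasets`, and `kernels` are installed.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Placeholder dataset, used only for illustration.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="dpo-flash-attn-kernels",
    padding_free=True,
    # Fetch the flash-attention 2 kernel from the Hub instead of
    # requiring a locally built flash-attn package.
    model_init_kwargs={"attn_implementation": "kernels-community/flash-attn2"},
)

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```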
