Consideration of Flash Attention in Generative Components #7944

Closed
@KumoLiu

Description

Regarding the comments here: #7715 (comment)

We removed flash attention from the generative components when they were merged into the core. However, based on experiments conducted by @dongyang0122, there appears to be a significant difference between using and not using flash attention. We should consider adding this option back. @dongyang0122 will share more detailed comparison results from the experiments.
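For context, a minimal sketch of what such an option could look like: PyTorch's `scaled_dot_product_attention` (torch >= 2.0) dispatches to a fused flash kernel when the inputs allow it (CUDA, fp16/bf16), versus an explicit softmax-attention implementation that materializes the full attention matrix. The `use_flash` flag name here is hypothetical, not MONAI's actual API.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, use_flash=False):
    # `use_flash` is a hypothetical flag; PyTorch picks a flash kernel
    # inside scaled_dot_product_attention automatically when supported
    # (on CPU/float32 it falls back to the math backend).
    if use_flash:
        return F.scaled_dot_product_attention(q, k, v)
    # Explicit reference implementation: materializes the full
    # (seq x seq) attention matrix, hence the memory difference.
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    return attn.softmax(dim=-1) @ v

q = k = v = torch.randn(2, 4, 16, 8)  # (batch, heads, seq, head_dim)
out_ref = attention(q, k, v, use_flash=False)
out_sdpa = attention(q, k, v, use_flash=True)
print(torch.allclose(out_ref, out_sdpa, atol=1e-5))
```

Both paths compute the same result up to numerical tolerance; the difference is in memory footprint and speed, which is presumably what the experiments measured.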
