Closed
Description
Regarding the comments here: #7715 (comment)
We removed flash attention from the generative components that were merged into the core. However, experiments conducted by @dongyang0122 suggest a significant difference between running with and without flash attention, so we should consider adding the option back. @dongyang0122 will share more detailed comparison results from those experiments.