[SYCL] Implement Flash attention. #7141
Comments
@qnixsynapse
@NeoZhangJianyu Nice. Thank you!
This issue has been tagged with the "stale" label. I am currently studying SYCL and C++ and waiting for the major SYCL refactoring so that the code is readable and it will be easier for me to (eventually) implement the flash attention kernel if needed. Commenting here to make this issue active again.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Any progress?
Currently, Flash Attention is available in the CUDA and Metal backends (#5021).
From the paper: Flash attention is an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. [...] it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. [...]
The question is whether dedicated Intel GPUs can actually benefit from it, and it will be interesting to see how much performance improves.
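
For reference, here is a minimal CPU-side sketch of the tiled "online softmax" accumulation that flash attention performs per query row, just to illustrate why the full score matrix (and the HBM traffic to store it) never needs to materialize. This is not the SYCL kernel and not llama.cpp code; all names (`attend_one_row`, `q`, `K`, `V`, `tile`) are illustrative, and a real kernel would keep the K/V tiles in on-chip memory (SLM on Intel GPUs) and parallelize over work-groups.

```cpp
// Illustrative sketch only: tiled attention for a single query row using
// the running-max / running-denominator ("online softmax") trick.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Computes softmax(q·K^T / sqrt(d)) · V for one query vector, processing
// the keys/values in tiles of `tile` rows without storing the full score row.
std::vector<float> attend_one_row(const std::vector<float> &q,
                                  const std::vector<std::vector<float>> &K,
                                  const std::vector<std::vector<float>> &V,
                                  int tile) {
    const int n = (int) K.size();   // number of keys/values
    const int d = (int) q.size();   // head dimension
    const float scale = 1.0f / std::sqrt((float) d);

    float m = -INFINITY;              // running max of scores seen so far
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // running weighted sum of V rows

    for (int t0 = 0; t0 < n; t0 += tile) {
        const int t1 = std::min(t0 + tile, n);
        for (int j = t0; j < t1; ++j) {
            float s = 0.0f;           // scaled dot product q·K[j]
            for (int k = 0; k < d; ++k) s += q[k] * K[j][k];
            s *= scale;

            const float m_new = std::max(m, s);
            const float corr  = std::exp(m - m_new);  // rescale previous partials
            const float p     = std::exp(s - m_new);

            l = l * corr + p;
            for (int k = 0; k < d; ++k) acc[k] = acc[k] * corr + p * V[j][k];
            m = m_new;
        }
    }
    for (int k = 0; k < d; ++k) acc[k] /= l;  // final softmax normalization
    return acc;
}

int main() {
    std::vector<float> q = {1.0f, 0.0f};
    std::vector<std::vector<float>> K = {{1, 0}, {0, 1}, {1, 1}};
    std::vector<std::vector<float>> V = {{1, 2}, {3, 4}, {5, 6}};
    const auto o = attend_one_row(q, K, V, /*tile=*/2);
    std::printf("out = %.4f %.4f\n", o[0], o[1]);
    return 0;
}
```

The payoff on GPU is that only the small K/V tiles and the per-row running statistics live in fast on-chip memory, so HBM traffic scales with reading Q, K, V once rather than with the full n×n attention matrix.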