Are there any plans for supporting an explicit attention mask? #840

Open
Avelina9X opened this issue Feb 20, 2024 · 5 comments

Comments

@Avelina9X

I've noticed that the Triton implementation supports an explicit attention bias, which can be used to express arbitrary mask shapes via large negative values. However, is there any planned support for explicit (boolean) masks in the CUDA implementation?

I've noticed some requests for features like off-diagonal attention, but an explicit attention mask would be able to facilitate this and any other arbitrary masking scheme, such as XLNet, attention sinks, or landmark attention, without needing to hardcode each attention scheme and enable it with an argument or a separate Python interface.
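
As a concrete illustration of the bias trick mentioned above, here is a minimal sketch of turning an arbitrary boolean mask into an additive bias of large negative values (the bias shape and broadcasting rules any particular kernel expects are assumptions here):

```python
# Minimal sketch: convert a boolean mask (True = attend, False = masked out)
# into an additive attention bias of 0 / large-negative values.
# How such a bias must be shaped or broadcast is kernel-specific and assumed here.
import torch

def bool_mask_to_bias(mask: torch.Tensor, dtype: torch.dtype = torch.float16) -> torch.Tensor:
    """mask: [..., q_len, k_len] boolean -> additive bias of the same shape."""
    bias = torch.zeros(mask.shape, dtype=dtype, device=mask.device)
    return bias.masked_fill(~mask, torch.finfo(dtype).min)

# Example: causal attention plus two always-visible "sink" tokens.
seq_len, n_sinks = 16, 2
mask = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()
mask[:, :n_sinks] = True
bias = bool_mask_to_bias(mask)  # add to the pre-softmax attention scores
```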

@normster commented Mar 5, 2024

It seems like the PyTorch attention implementation supports custom attention masks and also uses FlashAttention-2: https://twitter.com/StasBekman/status/1736083447658225665. Though I'm not sure whether passing in an attention mask causes the op to dispatch to a non-FA2 kernel.
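
One way to check this empirically is to restrict SDPA to the FlashAttention backend only and see whether a masked call still succeeds. A minimal sketch, assuming a CUDA device and PyTorch 2.x (torch.backends.cuda.sdp_kernel is the pre-2.3 context manager; shapes and dtypes are illustrative):

```python
# Minimal sketch: force SDPA to the FlashAttention backend and test whether
# it accepts an explicit attn_mask, instead of silently falling back to the
# math or memory-efficient kernels.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.ones(1, 1, 1024, 1024, device="cuda", dtype=torch.bool).tril()

with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    try:
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        print("FlashAttention backend handled the explicit mask")
    except RuntimeError as err:
        print("FlashAttention backend rejected the explicit mask:", err)
```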

@tridao commented Mar 5, 2024

If there's an attention mask, PyTorch does not dispatch to the FA2 kernel, but rather to the kernel from xFormers.
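
For anyone who needs an explicit mask today, that xFormers kernel can also be called directly with an additive bias. A rough sketch, assuming xformers is installed (the shapes and the -inf bias convention are illustrative, not a flash-attn API):

```python
# Minimal sketch: call xFormers memory-efficient attention directly with an
# additive bias tensor (0 = attend, -inf = masked). Inputs are [B, S, H, D].
import torch
import xformers.ops as xops

B, S, H, D = 2, 1024, 8, 64
q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

keep = torch.ones(S, S, device="cuda", dtype=torch.bool).tril()   # causal example
bias = torch.zeros(B, H, S, S, device="cuda", dtype=q.dtype)
bias = bias.masked_fill(~keep, float("-inf"))

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)    # [B, S, H, D]
```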

@abdulfatir

Thanks for the info @tridao! Is support for arbitrary attention masks on your roadmap? This would be incredibly useful for some encoder-decoder and prefixLM models. Mandatory thank you for your amazing work!
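
To illustrate the prefix-LM case: a minimal sketch of the mask such models need, i.e. bidirectional attention over the prefix and causal attention over the rest, which a boolean attn_mask argument could express directly (the helper below is hypothetical, not an existing flash-attn API):

```python
# Hypothetical helper: build the boolean mask a prefix-LM needs.
# True = attention allowed, False = masked out.
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int, device: str = "cpu") -> torch.Tensor:
    """[seq_len, seq_len] mask: bidirectional over the prefix, causal afterwards."""
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool, device=device).tril()
    mask[:, :prefix_len] = True   # every token can see every prefix token
    return mask

mask = prefix_lm_mask(seq_len=8, prefix_len=3)
```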

@xiabingquan

> If there's an attention mask, PyTorch does not dispatch to the FA2 kernel, but rather to the kernel from xFormers.

Thanks for this valuable tip. No wonder torch.nn.functional.scaled_dot_product_attention doesn't bring any speedup in my case.

@lin-ht commented Sep 10, 2024

I'm looking for bias/mask support too, in FA2 and ideally FA3. Is there a roadmap for this? Thank you~
