From reading this thread: pytorch/pytorch#96099 (comment)

It seems to me that the relative positional embedding can be integrated through `scaled_dot_product_attention`'s `attn_mask` argument. However, it can be slow because a dense `attn_mask` keeps SDPA off the "fast path" (the fused FlashAttention kernel).
Do you think we can keep this option open for users who want to use flash_attention together with rel_pos_embedding?

Originally posted by @mingxin-zheng in #7977 (comment)
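For reference, a minimal sketch of that option: a relative-position bias passed to SDPA as a float `attn_mask`. The bias-table construction below is an illustrative assumption, not MONAI's actual `rel_pos_embedding` code.

```python
import torch
import torch.nn.functional as F

B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Illustrative bias table indexed by relative distance in [-(L-1), L-1];
# in a real model this would be a learned parameter (one row per head).
rel_bias_table = torch.zeros(H, 2 * L - 1)
rel_index = torch.arange(L)[:, None] - torch.arange(L)[None, :] + (L - 1)  # (L, L)
attn_bias = rel_bias_table[:, rel_index].unsqueeze(0)  # (1, H, L, L), broadcast over batch

# A float attn_mask is added to q @ k^T / sqrt(D) before softmax, so the relative
# bias is applied correctly, but a dense mask rules out the FlashAttention backend
# and SDPA falls back to a slower kernel.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)
```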
I would think that Dao-AILab/flash-attention#617 needs to be completed first to get FlashAttention-2 support for an arbitrary attention bias. Then, depending on the exact relative-encoding formula that is needed, Dao-AILab/flash-attention#956 could perhaps be pushed forward.
Another way forward is to try PyTorch's flex_attention, which can fuse the modification of the attention matrix directly into the attention kernel.
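As a rough sketch of that route (assuming PyTorch 2.5+, where `flex_attention` lives in `torch.nn.attention.flex_attention`; the bias table is the same illustrative stand-in as above):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, L, D = 2, 4, 16, 32
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Illustrative stand-in for a learned relative-position bias table.
rel_bias_table = torch.zeros(H, 2 * L - 1)

def rel_pos_bias(score, b, h, q_idx, kv_idx):
    # score_mod is applied to each attention score inside the kernel,
    # so no (B, H, L, L) bias tensor is ever materialized.
    return score + rel_bias_table[h, q_idx - kv_idx + L - 1]

out = flex_attention(q, k, v, score_mod=rel_pos_bias)
# torch.compile(flex_attention) generates a fused kernel for this score_mod.
```

Compared with the `attn_mask` route, only the (H, 2L-1) table is stored, and compilation fuses the bias addition into the attention kernel rather than adding a dense mask beforehand.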