Hi Dao,
The flash_attn_with_kvcache function currently does not return intermediate outputs such as block_lse or attention_score, which are useful for further analysis or debugging. Could you add an option to return them? This would enhance flexibility and give users more insight into the attention computation.
I have a Triton version, but it is about 2x slower than your CUDA code.
I don't know how to implement this in CUDA myself. It should not be very hard, though, especially for block_lse, so could you slightly modify the code to support it?
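To make the request concrete, here is a minimal pure-Python sketch of what I mean by block_lse and why it is useful. It handles a single query vector, and the names `attend_block` and `merge_blocks` are hypothetical helpers written for illustration, not part of the flash-attn API. The assumption is that each KV block returns its own output together with the log-sum-exp of its scaled scores, so results from separate blocks can be combined exactly afterwards.

```python
import math

def attend_block(q, keys, values, scale):
    """Attention of one query over one KV block.

    Returns (out, lse), where lse is the log-sum-exp of the scaled scores.
    Hypothetical helper for illustration only.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    lse = m + math.log(denom)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / denom
           for d in range(dim)]
    return out, lse

def merge_blocks(out_a, lse_a, out_b, lse_b):
    """Combine two per-block results using only their outputs and LSEs."""
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)  # relative softmax mass of block a
    w_b = math.exp(lse_b - m)  # relative softmax mass of block b
    lse = m + math.log(w_a + w_b)
    out = [(w_a * a + w_b * b) / (w_a + w_b) for a, b in zip(out_a, out_b)]
    return out, lse

if __name__ == "__main__":
    q = [0.3, -1.2]
    keys = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    scale = 0.5

    # Attention over the whole cache versus two blocks merged via their LSEs.
    full_out, full_lse = attend_block(q, keys, values, scale)
    a = attend_block(q, keys[:2], values[:2], scale)
    b = attend_block(q, keys[2:], values[2:], scale)
    merged_out, merged_lse = merge_blocks(*a, *b)
    print(full_out, merged_out)
    print(full_lse, merged_lse)
```

The merged result should match attention over the full cache up to floating-point error, which is why having the per-block LSE returned alongside the output would be enough for this use case.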