Hi @lkc1997, we have a few additional arguments (e.g. rope_position, which specifies the RoPE position of each query) in our C++ APIs, but we haven't ported them to the PyTorch APIs yet; I'll do that in the next release.
For now, you can apply RoPE outside of attention and call the attention kernel with RoPE disabled (see the sketch below).
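A minimal sketch of that workaround, assuming a PyTorch setup: `apply_rope` is a hypothetical helper (not part of flashinfer's API), and the per-token `positions` are derived from `qo_indptr` so each request's queries get the correct offset past the shared prefix; the rotated `q`/`k` would then be passed to the attention kernel with its built-in positional encoding turned off.

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding (interleaved variant) to x of shape
    [num_tokens, num_heads, head_dim], using per-token absolute positions so that
    each query in the ragged batch can carry its own offset."""
    head_dim = x.shape[-1]
    # Per-channel-pair frequencies: theta^(-2i/d)
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2, device=x.device).float() / head_dim))
    # Angles: [num_tokens, head_dim // 2], broadcast over heads below
    angles = positions.float()[:, None] * inv_freq[None, :]
    cos = angles.cos()[:, None, :]
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    # Rotate each (x1, x2) channel pair by its angle
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Hypothetical usage: q is the ragged query tensor [total_q, num_heads, head_dim],
# qo_indptr is the CSR-style indptr over requests, and shared_prefix_len is the
# length of the shared prefix. Here each request's queries are assumed to sit
# right after the shared prefix; adjust the offsets if your KV layout differs.
def rope_positions_from_indptr(qo_indptr: torch.Tensor, shared_prefix_len: int) -> torch.Tensor:
    positions = []
    for i in range(len(qo_indptr) - 1):
        num_q = int(qo_indptr[i + 1] - qo_indptr[i])
        positions.append(torch.arange(shared_prefix_len, shared_prefix_len + num_q))
    return torch.cat(positions)

# q_rotated = apply_rope(q, rope_positions_from_indptr(qo_indptr, shared_prefix_len).to(q.device))
# ... then pass q_rotated (and similarly rotated keys) to the attention kernel
#     with its internal RoPE disabled.
```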
I found that during the shared-prefix calculation, this kernel doesn't use qo_indptr to split the batched queries, which may cause RoPE errors.