Fix GQA Rotary Embedding sequence length (#19801)
### Description
Previously, GQA incorrectly required the rotary cos and sin caches to have a sequence length equal to the present sequence length. It now requires their sequence length to be greater than or equal to the present sequence length. This matches the Rotary Embedding op, whose caches are sized to `max_sequence_length`.

### Motivation and Context
Fixes an issue with fusing Rotary Embedding into GQA for certain models where this optimization is preferred.
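The relaxed check can be illustrated with a small sketch. This is not the ONNX Runtime implementation. `check_rotary_cache` is a hypothetical helper, and it assumes each cache has shape `(cache_sequence_length, rotary_dim_half)` with the sequence dimension first.

```python
def check_rotary_cache(cos_cache_shape, sin_cache_shape, present_sequence_length):
    """Hypothetical helper illustrating the relaxed GQA cache check.

    Assumption: each cache shape is (cache_sequence_length, rotary_dim_half).
    Raises ValueError when the caches are unusable; returns None otherwise.
    """
    if tuple(cos_cache_shape) != tuple(sin_cache_shape):
        raise ValueError("cos and sin caches must have the same shape")

    cache_sequence_length = cos_cache_shape[0]

    # Old (too strict) rule: cache_sequence_length == present_sequence_length.
    # New rule: the cache may be longer, e.g. sized to max_sequence_length,
    # as long as it covers every position up to present_sequence_length.
    if cache_sequence_length < present_sequence_length:
        raise ValueError(
            f"cos/sin cache sequence length ({cache_sequence_length}) must be "
            f">= present sequence length ({present_sequence_length})"
        )
```

For example, `check_rotary_cache((4096, 64), (4096, 64), 128)` now passes, whereas an equality check would have rejected a cache sized to a `max_sequence_length` of 4096.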