Closed
Description
https://arxiv.org/abs/1911.02150
For example, StarCoder uses MQA to speed up inference. How does PagedAttention compare to Multi-Query Attention? Are they compatible?