How does this compare to MQA (multi-query attention)? #169

Closed
@xpl

Description

https://arxiv.org/abs/1911.02150

For example, StarCoder uses MQA to speed up inference. How does PagedAttention compare to Multi-Query Attention? Are they compatible?
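For context, here is a rough sketch of why the two might be complementary rather than competing. As I understand it, MQA shares a single key/value head across all query heads, so it shrinks the per-token KV cache, while PagedAttention is about how that cache is laid out in fixed-size blocks. The model shape, the fp16 assumption, and the block size of 16 below are made-up illustration values, not taken from StarCoder or from this project's code.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies across all layers.

    The leading factor of 2 accounts for storing both keys and values.
    dtype_bytes=2 assumes fp16/bf16 (an assumption for this sketch).
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def blocks_needed(num_tokens, block_size=16):
    """Number of fixed-size cache blocks needed for num_tokens (ceiling division).

    block_size=16 is a hypothetical value chosen for illustration.
    """
    return -(-num_tokens // block_size)


# Hypothetical model shape, for illustration only.
NUM_LAYERS, NUM_HEADS, HEAD_DIM = 40, 48, 128

# Multi-head attention: every query head has its own K/V head.
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_HEADS, HEAD_DIM)
# Multi-query attention: all query heads share one K/V head.
mqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, 1, HEAD_DIM)

if __name__ == "__main__":
    print("MHA bytes/token:", mha_bytes)
    print("MQA bytes/token:", mqa_bytes)
    print("reduction factor:", mha_bytes // mqa_bytes)
    # The block count depends only on sequence length; MQA makes each block smaller.
    print("blocks for a 100-token sequence:", blocks_needed(100))
```

In this sketch the number of blocks per sequence is the same under both schemes, and only the size of each block changes, which is why I would expect the two techniques to be combinable. Whether they are actually compatible here is the question.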

Labels: new-model (Requests to new models)
