Support q, k, v lengths not derived from n_embd #4648

Closed
@postmasters

Feature Description

In the "Attention Is All You Need" paper, queries and keys share a dimension $d_k$, while values have their own dimension $d_v$. Although the paper chose $d_k = d_v = d_{model} / h$, that equality is not a requirement of the architecture.
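
For reference, the per-head projection shapes from the paper can be written out as below. This is only a restatement in the paper's notation, to make explicit where $d_k$ and $d_v$ enter; nothing in it ties either one to $d_{model} / h$.

$$
W_i^Q \in \mathbb{R}^{d_{model} \times d_k}, \quad
W_i^K \in \mathbb{R}^{d_{model} \times d_k}, \quad
W_i^V \in \mathbb{R}^{d_{model} \times d_v}, \quad
W^O \in \mathbb{R}^{h d_v \times d_{model}}
$$

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V
$$

The product $Q K^T$ only requires queries and keys to agree on $d_k$, and the output projection $W^O$ maps the concatenated $h d_v$ values back to $d_{model}$, so $d_k$ and $d_v$ can be chosen independently of each other and of $d_{model}$.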

It would be great to support key and value lengths that are set independently rather than derived from n_embd.

Motivation

Some upcoming models use key lengths other than $d_{model} / h$. Supporting this would allow those models to be ported to this project.

Possible Implementation

Beyond the plumbing needed to read the new values for $d_k$ and $d_v$, we would also have to revisit every place where n_embd, n_embd_gqa, n_embd_head, n_rot, and n_head_kv are used, to make sure the assumptions made there still hold.
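
As a rough illustration of what that plumbing might look like, here is a minimal Python sketch (not this project's code). It assumes two hypothetical optional hyperparameters, `n_embd_head_k` and `n_embd_head_v`, which fall back to `n_embd // n_head` when absent; the class, the method, and the tensor names `wq`, `wk`, `wv`, `wo` are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class HParams:
    n_embd: int      # model width (d_model)
    n_head: int      # number of query heads (h)
    n_head_kv: int   # number of key/value heads (grouped-query attention)
    # Hypothetical overrides for d_k and d_v; None means "derive from n_embd".
    n_embd_head_k: Optional[int] = None
    n_embd_head_v: Optional[int] = None

    def __post_init__(self) -> None:
        # Legacy behaviour: per-head length derived from n_embd.
        derived = self.n_embd // self.n_head
        if self.n_embd_head_k is None:
            self.n_embd_head_k = derived
        if self.n_embd_head_v is None:
            self.n_embd_head_v = derived

    def weight_shapes(self) -> Dict[str, Tuple[int, int]]:
        """Return (input, output) sizes of the attention projections."""
        return {
            "wq": (self.n_embd, self.n_head * self.n_embd_head_k),
            "wk": (self.n_embd, self.n_head_kv * self.n_embd_head_k),
            "wv": (self.n_embd, self.n_head_kv * self.n_embd_head_v),
            "wo": (self.n_head * self.n_embd_head_v, self.n_embd),
        }


# Derived case: every projection width follows from n_embd.
legacy = HParams(n_embd=4096, n_head=32, n_head_kv=8)

# Decoupled case (illustrative numbers): n_head * d_k no longer equals n_embd.
custom = HParams(n_embd=3072, n_head=16, n_head_kv=16,
                 n_embd_head_k=256, n_embd_head_v=256)
```

In the decoupled case the query projection is wider than `n_embd`, which is exactly the kind of situation where code that assumes `n_embd_head == n_embd / n_head` would compute the wrong tensor sizes.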

Metadata

Labels: enhancement (New feature or request)
