Closed
Description
Feature Description
In the "Attention is all you need" paper, the queries and keys share the same dimension of d_k, while the values have dimension d_v.
It would be great to support different key and value lengths.
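To make the request concrete, here is a minimal sketch of scaled dot-product attention in plain Python where the key length and value length differ. It is only an illustration of the math from the paper, not code from this project; the function name `attention` and the toy sizes (d_k = 4, d_v = 2) are made up for the example.

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention on nested lists.

    Q: n_q  x d_k   (queries)
    K: n_kv x d_k   (keys, must match the query length)
    V: n_kv x d_v   (values, d_v may differ from d_k)
    Returns a list of shape n_q x d_v.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    if any(len(q) != d_k for q in Q):
        raise ValueError("queries and keys must share the same length d_k")
    if len(K) != len(V):
        raise ValueError("need exactly one value row per key row")

    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Numerically stable softmax over the key positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows: the output length is d_v, not d_k.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out

# Toy sizes chosen for illustration only: d_k = 4, d_v = 2.
Q = [[1.0, 0.0, 0.0, 0.0]]
K = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(len(out), len(out[0]))  # one output row per query, each of length d_v
```

Nothing in the computation requires d_k and d_v to be equal: d_k only appears in the Q·K product and its scaling, and d_v only sets the width of the output.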
Motivation
Some upcoming models employ different key lengths than value lengths.
Possible Implementation
Other than plumbing to get these new values, it would be a matter of auditing where n_embd, n_embd_gqa, n_embd_head, n_rot, and n_head_kv are used to make sure the assumptions are still sane.
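As a rough sketch of the kind of assumption such an audit would look for, the Python below compares a per-token KV size computed from a single head size against one computed from separate key and value head sizes. The field names n_embd_head_k and n_embd_head_v, the helper functions, and the numbers are all hypothetical and chosen for illustration. The sketch also assumes that n_embd_head is n_embd / n_head and that n_embd_gqa is n_embd_head * n_head_kv.

```python
from dataclasses import dataclass

@dataclass
class HParams:
    n_embd: int
    n_head: int
    n_head_kv: int
    n_embd_head_k: int  # hypothetical: per-head key length
    n_embd_head_v: int  # hypothetical: per-head value length

def kv_elems_single_head_size(hp: HParams) -> int:
    """Per-token K+V element count when one head size is assumed for both."""
    n_embd_head = hp.n_embd // hp.n_head      # assumed definition
    n_embd_gqa = n_embd_head * hp.n_head_kv   # assumed definition
    return 2 * n_embd_gqa                     # K and V assumed to be equally wide

def kv_elems_split_head_size(hp: HParams) -> int:
    """Per-token K+V element count with separate key and value lengths."""
    n_embd_k_gqa = hp.n_embd_head_k * hp.n_head_kv
    n_embd_v_gqa = hp.n_embd_head_v * hp.n_head_kv
    return n_embd_k_gqa + n_embd_v_gqa

# Made-up hyperparameters: the key heads are wider than the value heads.
hp = HParams(n_embd=4096, n_head=32, n_head_kv=8,
             n_embd_head_k=192, n_embd_head_v=128)

single = kv_elems_single_head_size(hp)
split = kv_elems_split_head_size(hp)
print(single, split)  # the two counts disagree whenever the lengths differ
```

Any place that sizes a buffer or a tensor view from the single-head-size formula would under-allocate or mis-stride the keys in a model like this, which is why each use of those values needs a look.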