Open
Description
The KV cache size calculation seems to be way off.
A brief scan of the code shows it never uses the KV head count or the K/V length fields. It simply uses embedding_length for the estimate.
But embedding_length has nothing to do with the K/V dimensions, which should be calculated from the head_count_kv, key_length and value_length fields. Depending on the model, the wrong calculation can leave the estimate off by a factor of several.
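To make the difference concrete, here is a rough sketch of the two estimates. It is not the project's code: the function names, the `block_count` and `context_length` parameters, the 2-byte (f16) element size, and the sample numbers are all assumptions for illustration, and the "naive" version is only my guess at what an embedding_length-based estimate looks like.

```python
def kv_cache_bytes(block_count, context_length, head_count_kv,
                   key_length, value_length, bytes_per_element=2):
    """Estimate KV cache size from the K/V head fields.

    Per token and per layer, the cache stores one key vector and one
    value vector for each KV head.
    """
    elements_per_token_per_layer = head_count_kv * (key_length + value_length)
    return (block_count * context_length
            * elements_per_token_per_layer * bytes_per_element)


def naive_kv_cache_bytes(block_count, context_length, embedding_length,
                         bytes_per_element=2):
    """Hypothetical embedding_length-based estimate (assumed, not confirmed).

    Treats both K and V as embedding_length wide, which is only right
    when head_count_kv * key_length == embedding_length (and likewise for V).
    """
    return (block_count * context_length
            * 2 * embedding_length * bytes_per_element)


# Made-up values for a model with grouped-query attention (assumption).
block_count = 32
context_length = 8192
embedding_length = 4096
head_count_kv = 8
key_length = 128
value_length = 128

correct = kv_cache_bytes(block_count, context_length,
                         head_count_kv, key_length, value_length)
naive = naive_kv_cache_bytes(block_count, context_length, embedding_length)

print(f"from K/V fields:       {correct / 2**30:.2f} GiB")
print(f"from embedding_length: {naive / 2**30:.2f} GiB")
print(f"ratio:                 {naive / correct:.1f}x")
```

With these sample values the embedding_length-based figure comes out 4x larger than the one derived from head_count_kv, key_length and value_length. The gap depends on how many KV heads the model has relative to its attention heads.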