Is the KV size calculation correct? #2

@wishstudio

The KV cache size calculation seems to be way off.
A brief scan of the code shows that it never uses the KV head count or the K/V length fields. It simply uses embedding_length for the estimate.
But embedding_length has nothing to do with the K/V dimensions, which should be calculated from the head_count_kv, key_length and value_length fields. Depending on the model, the wrong calculation can be off by a factor of several.
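
To make the difference concrete, here is a minimal sketch of both estimates. The function names, the 2-byte (f16) cache element size, and the example numbers are my own assumptions for illustration, not taken from this repository; the embedding_length-based formula is only a guess at the general shape of the current estimate.

```python
def kv_cache_bytes(block_count, context_length, head_count_kv,
                   key_length, value_length, bytes_per_element=2):
    """Estimate KV cache size from the attention metadata fields.

    Per token and per layer, the cache holds one K vector of size
    head_count_kv * key_length and one V vector of size
    head_count_kv * value_length.
    bytes_per_element=2 assumes an f16 cache (assumption).
    """
    elements_per_token_per_layer = head_count_kv * (key_length + value_length)
    return (block_count * context_length
            * elements_per_token_per_layer * bytes_per_element)


def embedding_based_kv_cache_bytes(block_count, context_length,
                                   embedding_length, bytes_per_element=2):
    """Hypothetical estimate that treats K and V as embedding_length wide."""
    return (block_count * context_length
            * 2 * embedding_length * bytes_per_element)


if __name__ == "__main__":
    # Illustrative values for a grouped-query-attention model (assumed).
    correct = kv_cache_bytes(block_count=32, context_length=8192,
                             head_count_kv=8, key_length=128, value_length=128)
    naive = embedding_based_kv_cache_bytes(block_count=32, context_length=8192,
                                           embedding_length=4096)
    print(f"from KV fields:        {correct / 2**30:.2f} GiB")
    print(f"from embedding_length: {naive / 2**30:.2f} GiB")
    print(f"ratio:                 {naive / correct:.1f}x")
```

With these example values the embedding_length-based estimate is 4x the one derived from the KV fields, because the model has 32 query heads but only 8 KV heads. For models without grouped-query attention and with key_length = value_length = embedding_length / head_count, the two estimates coincide, which may be why the discrepancy only shows up on some models.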
