
kv-cache : improve defrag logic #13497

Open
@ggerganov


Following the optimization in #13493, I realized that the defragmentation can be improved considerably, which would further benefit the Flash Attention (FA) masking.

Currently we defrag the following cache like this:

# before defrag
00000000...11111.......2222222....2010212012012....

# after defrag
000000001111122222222010212012012..................

That is, we only "fill" the holes, but the cells of each sequence remain scattered. We can do better by also grouping the cells by sequence:

# new defrag
000000000000111111111222222222222..................
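Both strategies can be modelled with a minimal Python sketch. It assumes the cache is a string with one character per cell ('.' for an empty cell, a digit for the id of the single sequence owning the cell). The function names are hypothetical and this is not the llama.cpp implementation. The current behaviour corresponds to an order-preserving compaction, and the proposed layout is the same compaction followed by a stable sort on sequence id:

```python
# Hypothetical model: one character per KV cell,
# '.' = empty cell, digit = id of the sequence owning the cell.

def defrag_fill_holes(cells: str) -> str:
    """Order-preserving compaction: drop empty cells, keep relative order."""
    used = [c for c in cells if c != "."]
    return "".join(used) + "." * (len(cells) - len(used))


def defrag_group_by_seq(cells: str) -> str:
    """Compaction that also groups the used cells by sequence id.

    sorted() is stable, so cells of the same sequence keep their relative order.
    """
    used = sorted((c for c in cells if c != "."), key=int)
    return "".join(used) + "." * (len(cells) - len(used))


before = "00000000...11111.......2222222....2010212012012...."
print(defrag_fill_holes(before))
print(defrag_group_by_seq(before))
```

Using a stable sort means grouping only changes where each sequence lives in the cache, not the order of its cells within it.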

By keeping the cells of each sequence contiguous, the FA-vec masking logic remains effective even after many generations.
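To illustrate why contiguity matters for masking, the sketch below assumes (hypothetically; the kernel details are not described in this issue) that the attention kernel walks the KV cache in fixed-size blocks and can skip any block containing no cell of the sequence being processed. The block size of 8 and the function name are illustrative only:

```python
# Hypothetical illustration: a block of KV cells can be skipped when it holds
# no cell of the current sequence (i.e. it is fully masked for that sequence).

def blocks_to_visit(cells: str, seq: str, block_size: int) -> int:
    """Count the KV blocks that contain at least one cell of `seq`."""
    count = 0
    for start in range(0, len(cells), block_size):
        if seq in cells[start:start + block_size]:
            count += 1
    return count


# Layouts from the examples above: current defrag vs. the proposed grouping.
scattered = "000000001111122222222010212012012" + "." * 18
grouped = "0" * 12 + "1" * 9 + "2" * 12 + "." * 18

for seq in "012":
    print(seq, blocks_to_visit(scattered, seq, 8), blocks_to_visit(grouped, seq, 8))
```

With the grouped layout, a sequence of n cells spans at most ceil(n / block_size) + 1 blocks, whereas scattered cells can force the kernel to visit almost every block of the cache.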

Labels: enhancement (New feature or request), performance (Speed related topics), roadmap (Part of a roadmap project)