kv-cache : improve defrag logic

Following the optimization in #13493, I realized that the defragmentation logic can be improved considerably, which would in turn further improve the Flash Attention masking.

Currently, we defrag the cache like this:

```bash
# before defrag
00000000...11111.......2222222....2010212012012....

# after defrag
000000001111122222222010212012012..................
```

I.e. we only "fill" the holes, but the cells of each sequence remain scattered. We can do better by also grouping the cells by sequence:

```
# new defrag
000000000000111111111222222222222..................
```

By doing so, the [FA-vec masking logic](#13493) will remain effective even after many generations.
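
To make the difference concrete, here is a toy sketch that reproduces the diagrams above on plain strings. It is not the actual KV cache code: the function names are hypothetical, each character stands for one cell, `.` marks an empty cell, and each cell is assumed to belong to exactly one sequence (positions within a sequence are ignored).

```python
EMPTY = "."  # assumption: "." marks an unused cell, as in the diagrams


def defrag_fill_holes(cells: str) -> str:
    """Toy model of the current behaviour shown in the diagram:
    used cells are packed to the front, keeping their relative order."""
    used = [c for c in cells if c != EMPTY]
    return "".join(used) + EMPTY * (len(cells) - len(used))


def defrag_group_by_seq(cells: str) -> str:
    """Toy model of the proposed behaviour: used cells are packed to the
    front AND grouped by sequence id, so each sequence is contiguous."""
    used = sorted(c for c in cells if c != EMPTY)
    return "".join(used) + EMPTY * (len(cells) - len(used))


if __name__ == "__main__":
    before = "00000000...11111.......2222222....2010212012012...."
    print(defrag_fill_holes(before))
    print(defrag_group_by_seq(before))
```

In this model the proposed defrag is just the current one plus a sort by sequence id. A real implementation would presumably also have to handle cells shared between sequences and keep the number of cell moves low, which this sketch does not attempt.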

kv-cache : improve defrag logic #13497

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

kv-cache : improve defrag logic #13497

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions