The rolling buffer aspect is already supported via the changes in #3228. One has to just manually call [...]

The missing part is the support for per-layer positions in the KV cache. I'm still not sure if this is what is being proposed in the paper, but it looks like the KV cache for layer [...]

A specific example: let's say we have [...]

What happens when we have processed [...]

For me it is not 100% clear from the paper, but it makes sense to be like this.
Mistral just released their paper on Mistral 7B.
They use a rolling buffer cache to reduce the memory footprint of the model.
Do you think this is something we could implement, for instance with a ring buffer?
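
To make the question concrete, here is a minimal sketch of the ring-buffer idea, assuming a fixed attention window `window` and that the entry for token position `i` is stored in slot `i % window`, so older entries are overwritten once the window is full. The class `RollingBufferCache` and its methods are hypothetical names for illustration only; this is not the project's KV cache API, and real keys and values would be tensors rather than strings.

```python
class RollingBufferCache:
    """Fixed-size cache: the entry for token position i lives in slot i % window."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.slots = [None] * window  # each slot holds (position, key, value)
        self.next_pos = 0             # absolute position of the next token

    def append(self, key, value) -> int:
        """Store one token's key/value, overwriting the oldest entry if full."""
        pos = self.next_pos
        self.slots[pos % self.window] = (pos, key, value)
        self.next_pos += 1
        return pos

    def visible(self):
        """Return the entries still cached, ordered from oldest to newest."""
        start = max(0, self.next_pos - self.window)
        return [self.slots[p % self.window] for p in range(start, self.next_pos)]


# Hypothetical usage: a window of 4 after processing 6 tokens.
cache = RollingBufferCache(window=4)
for t in range(6):
    cache.append(f"k{t}", f"v{t}")

positions = [pos for pos, _, _ in cache.visible()]
print(positions)  # [2, 3, 4, 5]
```

Memory stays constant at `window` entries no matter how many tokens are processed, which is where the reduced footprint would come from. The sketch keeps the absolute position next to each entry because the slot index alone no longer tells you which token it holds.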