This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Description
I just found a recent blog post (https://vllm.ai/) and repo (https://github.com/vllm-project/vllm) that implement PagedAttention. I tested it out, and it provides substantial throughput and memory-efficiency improvements.
Could we implement something like this? The paper isn't out yet, but in theory shouldn't Rust be well suited to this, given its memory-safety guarantees?
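For context, the central idea behind PagedAttention (as described in the vLLM blog) is to manage the KV cache in fixed-size blocks, with each sequence holding a table of block indices rather than one contiguous allocation, so memory is claimed on demand and returned when a sequence finishes. A minimal Rust sketch of that block-allocation idea follows; all names (`BlockAllocator`, etc.) are hypothetical illustrations, not vLLM's actual API:

```rust
/// Hypothetical block allocator for a paged KV cache: physical memory is
/// divided into fixed-size blocks, and each sequence gets a "block table"
/// mapping its logical positions to physical block indices.
struct BlockAllocator {
    block_size: usize,       // tokens stored per block
    free_blocks: Vec<usize>, // indices of currently unused physical blocks
}

impl BlockAllocator {
    fn new(num_blocks: usize, block_size: usize) -> Self {
        Self {
            block_size,
            free_blocks: (0..num_blocks).rev().collect(),
        }
    }

    /// Number of blocks needed to hold `num_tokens` of KV cache.
    fn blocks_needed(&self, num_tokens: usize) -> usize {
        (num_tokens + self.block_size - 1) / self.block_size
    }

    /// Build a block table for a sequence, or return None if memory is
    /// exhausted (a real engine would preempt or swap at this point).
    fn allocate(&mut self, num_tokens: usize) -> Option<Vec<usize>> {
        let n = self.blocks_needed(num_tokens);
        if self.free_blocks.len() < n {
            return None;
        }
        Some((0..n).map(|_| self.free_blocks.pop().unwrap()).collect())
    }

    /// Return a finished sequence's blocks to the free pool.
    fn free(&mut self, table: Vec<usize>) {
        self.free_blocks.extend(table);
    }
}

fn main() {
    // 8 physical blocks of 16 tokens each.
    let mut alloc = BlockAllocator::new(8, 16);

    // A 40-token sequence needs 3 blocks (ceil(40 / 16)).
    let seq = alloc.allocate(40).unwrap();
    assert_eq!(seq.len(), 3);
    assert_eq!(alloc.free_blocks.len(), 5);

    // Freeing the sequence makes its blocks reusable by other requests.
    alloc.free(seq);
    assert_eq!(alloc.free_blocks.len(), 8);
}
```

The appeal for Rust here is that ownership makes the block-table lifecycle explicit: `free` consumes the table by value, so a sequence's blocks cannot be used after they are returned to the pool.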