This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Description
I just found a recent blog post (https://vllm.ai/) and repo (https://github.com/vllm-project/vllm) that implement PagedAttention. I tested it out, and it provides substantial throughput and memory-efficiency improvements.
Could we implement something like this? The paper isn't out yet, but in theory shouldn't Rust be well suited to this, given its memory-safety guarantees?
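For context, the central idea behind PagedAttention (as described in the vLLM blog) is to manage the KV cache in fixed-size blocks, with each sequence holding a table of block indices rather than one contiguous allocation, so memory is claimed on demand and returned when a sequence finishes. A minimal Rust sketch of that block-allocation idea follows; all names (`BlockAllocator`, etc.) are hypothetical illustrations, not vLLM's actual API:

```rust
/// Hypothetical block allocator for a paged KV cache: physical memory is
/// divided into fixed-size blocks, and each sequence gets a "block table"
/// mapping its logical positions to physical block indices.
struct BlockAllocator {
    block_size: usize,       // tokens stored per block
    free_blocks: Vec<usize>, // indices of currently unused physical blocks
}

impl BlockAllocator {
    fn new(num_blocks: usize, block_size: usize) -> Self {
        Self {
            block_size,
            free_blocks: (0..num_blocks).rev().collect(),
        }
    }

    /// Number of blocks needed to hold `num_tokens` of KV cache.
    fn blocks_needed(&self, num_tokens: usize) -> usize {
        (num_tokens + self.block_size - 1) / self.block_size
    }

    /// Build a block table for a sequence, or return None if memory is
    /// exhausted (a real engine would preempt or swap at this point).
    fn allocate(&mut self, num_tokens: usize) -> Option<Vec<usize>> {
        let n = self.blocks_needed(num_tokens);
        if self.free_blocks.len() < n {
            return None;
        }
        Some((0..n).map(|_| self.free_blocks.pop().unwrap()).collect())
    }

    /// Return a finished sequence's blocks to the free pool.
    fn free(&mut self, table: Vec<usize>) {
        self.free_blocks.extend(table);
    }
}

fn main() {
    // 8 physical blocks of 16 tokens each.
    let mut alloc = BlockAllocator::new(8, 16);

    // A 40-token sequence needs 3 blocks (ceil(40 / 16)).
    let seq = alloc.allocate(40).unwrap();
    assert_eq!(seq.len(), 3);
    assert_eq!(alloc.free_blocks.len(), 5);

    // Freeing the sequence makes its blocks reusable by other requests.
    alloc.free(seq);
    assert_eq!(alloc.free_blocks.len(), 8);
}
```

The appeal for Rust here is that ownership makes the block-table lifecycle explicit: `free` consumes the table by value, so a sequence's blocks cannot be used after they are returned to the pool.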