
Conversation

@pian13131 (Contributor)

Hello, I am currently studying the vLLM paged attention kernel, and I've found that the implementation can be quite complex for newcomers. After thoroughly reviewing the primary implementation of the kernel in csrc/attention/attention_kernels.cu, I have written this document to provide a high-level understanding of the paged attention kernel. The document explains the memory layout, the read patterns, and the step-by-step calculations, accompanied by diagrams and pseudo-code. It is intended to serve as a reference for anyone interested in the implementation of the paged attention kernel.
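
As a quick illustration of the core idea the document builds on, here is a minimal Python sketch of the block-table lookup that paged attention relies on: the KV cache is split into fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks. The names (`kv_cache`, `block_table`, `BLOCK_SIZE`) and the toy flat layout below are assumptions for illustration only, not the actual layout used by the CUDA kernel.

```python
# Illustrative sketch of paged KV-cache indexing (not the actual vLLM layout).
# Each sequence's KV cache is split into fixed-size blocks; a per-sequence
# block table maps a logical block index to a physical block number.

BLOCK_SIZE = 16  # tokens per block (assumed value for this example)

def lookup_kv(kv_cache, block_table, token_pos):
    """Return the (key, value) entry stored for a given token position."""
    logical_block = token_pos // BLOCK_SIZE      # which block the token falls in
    block_offset = token_pos % BLOCK_SIZE        # position within that block
    physical_block = block_table[logical_block]  # indirection through the block table
    return kv_cache[physical_block][block_offset]

# Usage: a toy cache with 2 physical blocks of 16 slots each.
kv_cache = [[(f"k{b}_{i}", f"v{b}_{i}") for i in range(BLOCK_SIZE)] for b in range(2)]
block_table = [1, 0]  # logical block 0 lives in physical block 1, and vice versa
print(lookup_kv(kv_cache, block_table, 3))   # token 3  -> physical block 1, offset 3
print(lookup_kv(kv_cache, block_table, 17))  # token 17 -> physical block 0, offset 1
```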

Given that I am still a novice in this subject, there may be some misunderstandings in the document. I welcome any comments and advice to improve its accuracy and clarity. Your feedback is highly appreciated! I hope this document can be merged to help others!

@simon-mo (Collaborator)

Thank you for this great write-up!

@zhaoyang-star (Contributor)

This doc is very useful. I hope it gets merged soon.

@esmeetu added the documentation (Improvements or additions to documentation) label on Mar 2, 2024
@LiuXiaoxuanPKU (Collaborator) left a review comment


Looks great! Thanks! Some minor comments.

@LiuXiaoxuanPKU merged commit 27a7b07 into vllm-project:main on Mar 4, 2024
@WoosukKwon (Collaborator)

@pian13131 This is AWESOME! Thanks for your contribution!

dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request on Mar 26, 2024
