Qualcomm AI Engine Direct - documentation for KV cache update #8134

haowhsu-quic · 2025-02-03T08:12:55Z

On behalf of @DannyYuyang-quic

Summary

- visualize KV cache update mechanism for better understanding
- asset folder for storing diagrams

pytorch-bot · 2025-02-03T08:12:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8134

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCM Infra failures during checkout of PyTorch

⏳ No Failures, 2 Pending

As of commit 2d11538 with merge base 62e49ce:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

haowhsu-quic · 2025-02-03T08:15:29Z

@pytorchbot label "release notes: qualcomm"

haowhsu-quic · 2025-02-03T08:17:04Z

Hi @cccclai, PR for documenting KV cache update mechanism. Please have a look, thank you.

cccclai

Thank you for the nice diagram!

cccclai · 2025-02-05T18:11:58Z

examples/qualcomm/oss_scripts/llama/README.md

-Prefill Mode: This is also known as batch prefill mode, where the model takes in a list of tokens as input and generates the next token along with the key-value (KV) cache for all tokens. This mode is efficient for generating the initial sequence of tokens (usually the user's prompt).
+Prefill Mode: This is also known as batch prefill mode, where the model takes in a list of tokens as input and generates the next token along with the key-value (KV) cache for all tokens. This mode is efficient for encoding the user's prompt.

 KV Cache Mode: In KV Cache mode, the model takes in a single previous token and generates the next predicted token along with its KV cache. It is efficient for generating subsequent tokens after the initial prompt.


Let's rename it to generate mode, as this term isn't very commonly used and is a bit confusing.
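
For readers unfamiliar with the two modes in the README hunk above, here is a rough, pure-Python sketch of how they fit together. It is a toy illustration only, not the Qualcomm AI Engine Direct or ExecuTorch implementation: `embed`, `attend`, `pick_token`, `prefill`, `decode_step`, and `generate` are hypothetical names, and the single-head attention with values equal to keys is an assumption made to keep the example short.

```python
import math

VOCAB = 8  # hypothetical toy vocabulary size
DIM = 4    # hypothetical head dimension


def embed(token, pos):
    # Deterministic toy embedding; stands in for learned weights plus a position signal.
    return [math.sin((token + 1) * (i + 1) + 0.1 * pos) for i in range(DIM)]


def attend(query, keys, values):
    # Scaled dot-product attention of one query over every cached key/value pair.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(DIM)]


def pick_token(hidden):
    # Toy output head: greedy choice over the vocabulary.
    logits = [sum(h * e for h, e in zip(hidden, embed(t, 0))) for t in range(VOCAB)]
    return max(range(VOCAB), key=logits.__getitem__)


def prefill(tokens):
    """Prefill mode: take the whole prompt, return the next token and the KV cache for all tokens."""
    keys = [embed(t, p) for p, t in enumerate(tokens)]
    values = [list(k) for k in keys]  # assumption: values equal keys in this toy
    hidden = attend(keys[-1], keys, values)  # last position attends to the full prompt
    return pick_token(hidden), (keys, values)


def decode_step(token, cache):
    """KV cache mode: take one previous token, append its KV entry, return the next token."""
    keys, values = cache
    kv = embed(token, len(keys))  # position is the current cache length
    keys.append(kv)
    values.append(list(kv))
    hidden = attend(kv, keys, values)
    return pick_token(hidden), cache


def generate(prompt, steps):
    # One prefill call for the prompt, then one decode step per additional token.
    token, cache = prefill(prompt)
    out = [token]
    for _ in range(steps - 1):
        token, cache = decode_step(token, cache)
        out.append(token)
    return out, cache
```

In this sketch, prefilling `[1, 2, 3]` and prefilling `[1, 2]` followed by a decode step on `3` build the same cache and pick the same next token. That equivalence is what lets the prompt be encoded in one batch and the remaining tokens be produced one at a time.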

facebook-github-bot added the CLA Signed label Feb 3, 2025

pytorch-bot bot added the release notes: qualcomm label Feb 3, 2025

Qualcomm AI Engine Direct - documentation for KV cache update

2d11538

summary - visualize KV cache update mechanism for better understanding - asset folder for storing diagrams

haowhsu-quic force-pushed the dev_kv_doc branch from 8821e3e to 2d11538 Compare February 5, 2025 16:28

cccclai reviewed Feb 5, 2025

cccclai approved these changes Feb 5, 2025

cccclai merged commit 1b11e3e into pytorch:main Feb 5, 2025
45 checks passed

haowhsu-quic deleted the dev_kv_doc branch February 7, 2025 09:21

This was referenced Feb 11, 2025

Weekly pr metrics report - 2025-02-01..2025-02-07 wdvr/pytorch#6

Open

Weekly pr metrics report - 2025-02-01..2025-02-07 wdvr/pytorch#8

Open

This was referenced Feb 24, 2025

Weekly pr metrics report - 2025-02-01..2025-02-07 wdvr/pytorch#10

Open

Weekly pr metrics report - 2025-02-01..2025-02-07 wdvr/pytorch#14

Open
