Update static attention IO manager to use "smart mask" style update #9843
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9843
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit ad4000b with merge base 1572381.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D72322014
…ytorch#9843) Summary: Pull Request resolved: pytorch#9843 Differential Revision: D72322014
  * Update the internal data pointers using the cache updates returned by the
  * model. The length of each individual update cannot exceed the max update
- * length specified during the creation, and the total length cannot exceed
- * the context length.
+ * length specified during creation, and the total length cannot exceed the
+ * cache length.
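A minimal sketch of the "smart mask" style update the comment above describes, under my own assumptions rather than the actual StaticAttentionIOManager API: cached entries are never shifted; each update is copied in at the current write position and the attention mask is simply unmasked for those positions. The names SmartMaskCache, update(), cache_len, max_update_len and head_dim are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

class SmartMaskCache {
 public:
  SmartMaskCache(size_t cache_len, size_t max_update_len, size_t head_dim)
      : cache_len_(cache_len),
        max_update_len_(max_update_len),
        head_dim_(head_dim),
        cache_(cache_len * head_dim, 0.0f),
        mask_(cache_len, true),  // true = masked out (position not yet valid)
        pos_(0) {}

  // Copy `update_len` new cache vectors returned by the model into the cache.
  // Each individual update is bounded by the max update length, and the total
  // number of cached positions is bounded by the cache length.
  void update(const float* update, size_t update_len) {
    if (update_len > max_update_len_) {
      throw std::invalid_argument("update exceeds max update length");
    }
    if (pos_ + update_len > cache_len_) {
      throw std::invalid_argument("total length exceeds cache length");
    }
    // "Smart mask" style: no shifting of existing cache data; write in place
    // at the current position and flip the mask for the new entries.
    std::memcpy(
        cache_.data() + pos_ * head_dim_,
        update,
        update_len * head_dim_ * sizeof(float));
    for (size_t i = 0; i < update_len; ++i) {
      mask_[pos_ + i] = false;  // position is now attendable
    }
    pos_ += update_len;
  }

 private:
  size_t cache_len_;
  size_t max_update_len_;
  size_t head_dim_;
  std::vector<float> cache_;
  std::vector<bool> mask_;
  size_t pos_;
};
```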
In the export_llama script this was called max_context_length and max_seq_length.
This is on purpose. If you fix the max context length and want to share the same cache between multiple methods with different input lengths (e.g. prefill + decode), you force yourself to also have different cache lengths. That makes things complicated when you need to switch back and forth between prefill and decode, as the wearable team found out when trying to use QC's implementation.
Here the cache length is fixed, and you combine it with different input lengths.
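A small sketch of the layout described above, assuming each method's attention mask covers its own inputs plus the shared fixed-length cache (mask shape roughly [input_len, cache_len + input_len]). The struct and the numbers are illustrative, not taken from the PR.

```cpp
#include <cstddef>
#include <cstdio>

struct MethodConfig {
  const char* name;
  size_t input_len;  // varies per method (prefill vs decode)
  size_t cache_len;  // fixed and shared across methods
};

int main() {
  // Both methods reuse the same cache_len, so the same cache buffers can be
  // shared when switching back and forth between prefill and decode.
  const MethodConfig methods[] = {
      {"prefill", 128, 1024},
      {"decode", 1, 1024},
  };
  for (const auto& m : methods) {
    size_t mask_cols = m.cache_len + m.input_len;
    std::printf(
        "%s: mask is %zu x %zu, cache holds %zu positions\n",
        m.name, m.input_len, mask_cols, m.cache_len);
  }
  return 0;
}
```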
Yeah, I understand. I am just highlighting the nomenclature in export_llama, not questioning why it is done this way.
Differential Revision: D72322014 Pull Request resolved: #9843