
Update static attention IO manager to use "smart mask" style update #9843


Merged 1 commit into pytorch:main on Apr 3, 2025

Conversation

sxu
Contributor

@sxu sxu commented Apr 2, 2025

Differential Revision: D72322014

@sxu sxu requested review from lucylq and jackzhxng as code owners April 2, 2025 17:35

pytorch-bot bot commented Apr 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9843

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad4000b with merge base 1572381:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 2, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72322014

@sxu sxu requested a review from billmguo April 2, 2025 17:37
sxu added a commit to sxu/executorch that referenced this pull request Apr 2, 2025
…ytorch#9843)

Summary: Pull Request resolved: pytorch#9843

Differential Revision: D72322014
@sxu sxu force-pushed the export-D72322014 branch from cc0b4ed to e4ca1d9 on April 2, 2025 17:41
@sxu sxu force-pushed the export-D72322014 branch from e4ca1d9 to ad4000b on April 2, 2025 18:28
Comment on lines 140 to +143

  * Update the internal data pointers using the cache updates returned by the
  * model. The length of each individual update cannot exceed the max update
- * length specified during the creation, and the total length cannot exceed
- * the context length.
+ * length specified during creation, and the total length cannot exceed the
+ * cache length.
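The two constraints in the doc comment above can be stated concretely. A minimal sketch, in Python for illustration (the real IO manager is C++, and `UpdateValidator` and its method names are hypothetical, not the ExecuTorch API): each individual update may not exceed the max update length fixed at creation, and the running total may not exceed the cache length.

```python
# Hypothetical sketch of the length constraints described in the doc
# comment; names are illustrative, not the actual ExecuTorch API.
class UpdateValidator:
    def __init__(self, max_update_len, cache_len):
        self.max_update_len = max_update_len  # fixed at creation
        self.cache_len = cache_len            # fixed at creation
        self.total = 0                        # total length written so far

    def check(self, update_len):
        # Each individual update is bounded by the max update length.
        if update_len > self.max_update_len:
            raise ValueError("update exceeds max update length")
        # The cumulative total is bounded by the cache length.
        if self.total + update_len > self.cache_len:
            raise ValueError("total exceeds cache length")
        self.total += update_len

v = UpdateValidator(max_update_len=4, cache_len=8)
v.check(4)  # ok: a full-size update
v.check(3)  # ok: total is now 7
try:
    v.check(2)  # individually fine (2 <= 4), but 7 + 2 > 8
except ValueError as e:
    print(e)
```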
Contributor

In the export_llama script these were called max_context_length and max_seq_length

Contributor Author

@sxu sxu Apr 3, 2025

This is on purpose: if you fix the max context length and want to share the same cache between multiple methods with different input lengths (e.g. prefill + decode), you force yourself to also have different cache lengths. That makes things complicated when you need to switch back and forth between prefill and decode, as the wearable team found out when trying to use QC's implementation.

Here the cache length is fixed, and you combine it with different input lengths.
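The design described above can be sketched as follows. This is a hypothetical Python illustration (the class and method names are invented; the real implementation is the C++ static attention IO manager): the cache length is fixed at creation, a "smart mask" update writes new entries in place and unmasks only those slots rather than shifting the cache, and methods with different input lengths (prefill vs. decode) share the same cache.

```python
# Illustrative "smart mask" style cache sketch; hypothetical names,
# not the ExecuTorch API.
class SmartMaskCache:
    def __init__(self, cache_len):
        self.cache_len = cache_len
        self.pos = 0                      # number of valid cached entries
        self.cache = [None] * cache_len   # fixed-size slot storage
        self.mask = [False] * cache_len   # True = position may be attended to

    def update(self, new_entries):
        # Write the update in place at the current position and unmask
        # just those slots; the rest of the cache is untouched.
        n = len(new_entries)
        assert self.pos + n <= self.cache_len, "total length exceeds cache length"
        self.cache[self.pos:self.pos + n] = new_entries
        self.mask[self.pos:self.pos + n] = [True] * n
        self.pos += n

cache = SmartMaskCache(cache_len=8)
cache.update(["k0", "k1", "k2", "k3", "k4"])  # prefill: input length 5
cache.update(["k5"])                          # decode: input length 1, same cache
print(cache.pos)        # 6
print(sum(cache.mask))  # 6 attendable positions
```

Because only the mask and the freshly written slots change between calls, switching back and forth between prefill and decode needs no cache reshaping.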

Contributor

Yeah, I understand. I'm just highlighting the nomenclature in export_llama, not questioning why it's done this way.

@facebook-github-bot facebook-github-bot merged commit 8606725 into pytorch:main Apr 3, 2025
88 of 90 checks passed
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
Differential Revision: D72322014

Pull Request resolved: #9843
Labels: CLA Signed, fb-exported, topic: not user facing
4 participants