Add support for quantized bmm #4047
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4047. Note: links to docs will display an error until the doc builds have completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ You can merge normally (1 unrelated failure). As of commit 49950a9 with merge base 63e8025: BROKEN TRUNK. The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D58959269
Summary: Pull Request resolved: #4047. The current quantizer only captures "fake" bmm from matmuls with specific shapes. Add support for `torch.bmm` as well. Differential Revision: D58959269
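The distinction the summary draws can be sketched without torch: a "real" bmm shows up in the exported graph as an `aten.bmm` node, while a "fake" bmm is an `aten.matmul` whose inputs both happen to be 3-D batched tensors. The `Node` class and `is_bmm_like` helper below are hypothetical illustrations, not the actual quantizer API; only the ATen operator names are taken from PyTorch's conventions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    """Hypothetical stand-in for a graph node seen by a quantizer pass."""
    target: str
    input_shapes: List[Tuple[int, ...]] = field(default_factory=list)

def is_bmm_like(node: Node) -> bool:
    # A real bmm: torch.bmm lowers to the aten.bmm operator.
    if node.target == "aten.bmm.default":
        return True
    # A "fake" bmm: a matmul whose inputs are both 3-D (batched) tensors,
    # which is the specific-shape case the quantizer already captured.
    if node.target == "aten.matmul.default":
        return bool(node.input_shapes) and all(
            len(shape) == 3 for shape in node.input_shapes
        )
    return False
```

Under this sketch, supporting quantized bmm amounts to annotating nodes for which `is_bmm_like` is true, rather than only the matmul-with-3-D-inputs case.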
Force-pushed from 59f03b6 to 92cab3c.
Summary: The current quantizer only captures "fake" bmm from matmuls with specific shapes. Add support for `torch.bmm` as well. Use a decomposition for SDPA to make sure LLaMa bmms get quantized. Reviewed By: zonglinpengmeta, hsharma35. Differential Revision: D58959269
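The SDPA point can be illustrated with a torch-free sketch: scaled dot-product attention is, once decomposed, two batched matmuls wrapped around a scale and a softmax, so decomposing SDPA exposes real bmm nodes for the quantizer to capture. The helpers below are a hypothetical nested-list re-derivation of that decomposition, not code from this PR.

```python
import math

def bmm(a, b):
    # Batched matmul over nested lists: a is [B][M][K], b is [B][K][N].
    return [[[sum(a[i][m][k] * b[i][k][n] for k in range(len(b[i])))
              for n in range(len(b[i][0]))]
             for m in range(len(a[i]))]
            for i in range(len(a))]

def softmax(row):
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sdpa_decomposed(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v, written as two explicit bmms."""
    d = len(q[0][0])
    # Transpose the last two dims of k, per batch.
    kt = [[list(col) for col in zip(*k_i)] for k_i in k]
    scores = bmm(q, kt)                                   # first bmm
    scores = [[[x / math.sqrt(d) for x in row] for row in batch]
              for batch in scores]
    attn = [[softmax(row) for row in batch] for batch in scores]
    return bmm(attn, v)                                   # second bmm
```

In the undecomposed form the graph contains a single fused SDPA call and neither bmm is visible; after decomposition both appear as ordinary batched matmuls, which is why LLaMa's attention bmms become quantizable.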
Force-pushed from 92cab3c to a83f95e.
Summary: Pull Request resolved: #4047. The current quantizer only captures "fake" bmm from matmuls with specific shapes. Add support for `torch.bmm` as well. Reviewed By: zonglinpengmeta, hsharma35. Differential Revision: D58959269
Force-pushed from a83f95e to 49950a9.
@@ -28,6 +28,7 @@ python_library(
        "compiler.py",
    ],
    deps = [
+       "fbsource//third-party/pypi/pyre-extensions:pyre-extensions",
Is this Buck target used here, or is it meant not to be exported?
OSS uses both Buck and CMake for now, I believe.
Summary: The current quantizer only captures "fake" bmm from matmuls with specific shapes. Add support for `torch.bmm` as well. Reviewed By: dulinriley, zonglinpengmeta, hsharma35. Differential Revision: D58959269
This pull request has been merged in cfbe63d.
Summary: The current quantizer only captures "fake" bmm from matmuls with specific shapes. Add support for `torch.bmm` as well. Differential Revision: D58959269