Add quantized op support to llama runner #3062
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3062
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 41abbb5 with merge base 458d743: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D56197863](https://our.internmc.facebook.com/intern/diff/D56197863) [ghstack-poisoned]
.ci/scripts/test_llama.sh
Outdated
@@ -84,7 +91,7 @@ cmake_build_llama_runner() {
-DEXECUTORCH_BUILD_CUSTOM="$CUSTOM" \
-DEXECUTORCH_BUILD_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK="$XNNPACK" \
-DEXECUTORCH_BUILD_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_QUANTIZED="$QE" \
Can't we always build this? Having so many build options feels like an additional burden for users. Maybe make it on by default?
Thanks!
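One way to read the "on by default" suggestion, as a minimal sketch: keep the flag but fall back to `ON` when the caller passes nothing, so only users who want to opt out need to know about it. `quantized_flag` is a hypothetical helper introduced here for illustration; it is not part of `test_llama.sh`, and this is not necessarily what the PR ended up doing.

```shell
#!/bin/bash
# Hypothetical helper: emit the CMake define, defaulting to ON when the
# caller passes no value (or an empty one).
quantized_flag() {
  local qe="${1:-ON}"   # ${1:-ON} falls back to ON if $1 is unset or empty
  printf -- '-DEXECUTORCH_BUILD_QUANTIZED=%s\n' "${qe}"
}

quantized_flag        # no value given: built by default
quantized_flag OFF    # explicit opt-out still works
```

With this shape a call such as `quantized_flag "${QE}"` behaves the same whether or not `QE` was exported.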
EXPORT_ARGS="${EXPORT_ARGS} -kv --use_sdpa_with_kv_cache -X -qmode 8da4w -G 128"
EXPORT_ARGS="-c stories110M.pt -p ${PARAMS} -d ${DTYPE} -n ${EXPORTED_MODEL_NAME} -kv"
if [[ "${XNNPACK}" == "ON" ]]; then
EXPORT_ARGS="${EXPORT_ARGS} -X -qmode 8da4w -G 128"
nit: does += operator work?
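For reference, bash does support `+=` for appending to a string variable, so the concatenation in the hunk above could be written more compactly. A minimal sketch, assuming the script runs under bash rather than a plain POSIX `sh`; the values assigned to `PARAMS`, `DTYPE`, and `EXPORTED_MODEL_NAME` are placeholders for illustration only:

```shell
#!/bin/bash
# Placeholder values; the real script sets these elsewhere.
PARAMS="params.json"
DTYPE="fp32"
EXPORTED_MODEL_NAME="llama2"
XNNPACK="ON"

EXPORT_ARGS="-c stories110M.pt -p ${PARAMS} -d ${DTYPE} -n ${EXPORTED_MODEL_NAME} -kv"
if [[ "${XNNPACK}" == "ON" ]]; then
  # += appends to the existing string; the leading space separates the flags.
  EXPORT_ARGS+=" -X -qmode 8da4w -G 128"
fi
printf '%s\n' "${EXPORT_ARGS}"
```

The `+=` form is a bash extension, so it would not be portable if the script were ever run with `sh`.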
@@ -91,6 +92,7 @@ add_subdirectory(runner)
if(EXECUTORCH_USE_TIKTOKEN)
# find RE2 for tokenizer
set(ABSL_ENABLE_INSTALL ON)
set(ABSL_PROPAGATE_CXX_STD ON)
oh we depend on abseil for tiktoken?
Yeah, tiktoken -> re2 -> abseil
@@ -91,6 +92,7 @@ add_subdirectory(runner)
if(EXECUTORCH_USE_TIKTOKEN)
# find RE2 for tokenizer
set(ABSL_ENABLE_INSTALL ON)
set(ABSL_PROPAGATE_CXX_STD ON)
no tests using this path yet right?
Not yet
EXPORT_ARGS="${EXPORT_ARGS} --use_sdpa_with_kv_cache"
fi
if [[ "${QE}" == "ON" ]]; then
EXPORT_ARGS="${EXPORT_ARGS} --embedding-quantize 8,1024"
thanks for adding tests!
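The flag-dependent argument assembly exercised by these tests can be sketched as a small function that maps ON/OFF switches to export flags. This is an illustration only: `build_export_args` is a hypothetical helper, the use of a bash array instead of string concatenation is a swapped-in design choice, and the condition guarding `--use_sdpa_with_kv_cache` is an assumption because it is not visible in the hunk.

```shell
#!/bin/bash
# Hypothetical helper, not part of test_llama.sh.
# Arguments: CUSTOM, XNNPACK, QE (each "ON" or "OFF").
build_export_args() {
  local custom="$1" xnnpack="$2" qe="$3"
  local args=(-kv)
  # Assumed guard: the hunk does not show which flag enables this option.
  if [[ "${custom}" == "ON" ]]; then
    args+=(--use_sdpa_with_kv_cache)
  fi
  if [[ "${xnnpack}" == "ON" ]]; then
    args+=(-X -qmode 8da4w -G 128)
  fi
  if [[ "${qe}" == "ON" ]]; then
    args+=(--embedding-quantize 8,1024)
  fi
  # Join the array elements with single spaces for display.
  printf '%s\n' "${args[*]}"
}

build_export_args ON OFF ON
```

Keeping the flags in an array makes each combination easy to check in isolation, which is the kind of coverage the quantized-embedding test path adds.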
This pull request has been merged in 1f4b631.
Summary: As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Looking at the CI job failure of `build-apple-framework` probably worth adding it back. Test Plan: See that CI job pass Reviewers: Subscribers: Tasks: Tags:
Summary: As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Looking at the CI job failure of `build-apple-framework` probably worth adding it back. Test Plan: See that CI job pass Reviewed By: shoumikhin Differential Revision: D56281923 Pulled By: larryliu0820
Summary: As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Looking at the CI job failure of `build-apple-framework` probably worth adding it back. Pull Request resolved: #3115 Test Plan: See that CI job pass Reviewed By: shoumikhin Differential Revision: D56281923 Pulled By: larryliu0820 fbshipit-source-id: e6ad411f763ff8e11d4fb1e0bc7037eb2cf69357
Stack from ghstack (oldest at bottom):
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D56197863