Qualcomm AI Engine Direct - Unify Llama2&Llama3 and Small Accuracy Improvement. #7618
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7618
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure as of commit b383d09 with merge base 3f9324c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai,
Please have a look at this PR and let me know if anything is unclear.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thanks! Sorry I missed this one.
Summary: Pull Request resolved: pytorch#7871. Forward fix for the previous PR pytorch#7618. Reviewed By: tarun292. Differential Revision: D68362771
Sorry, I need to revert this PR and re-land it because the internal failure can't be bypassed.
Hi @cccclai, regarding executorch/.github/workflows/android-perf.yml, line 276 (at commit 9a0b51c): I'm unsure whether that is the root cause of the internal failure. Also, I am working on CI for static stories llama in #7884 and will continue on it once this PR is re-landed. Thanks.
…provement. (pytorch#7618) Qualcomm AI Engine Direct - Unify Llama2 and Llama3
…provement. (#7618) Qualcomm AI Engine Direct - Unify Llama2 and Llama3
Summary
Test plan
Updated the CI to use the unified llama script.
TODO: Add more unit tests (UT) for llama.
Ensure 1B model inference speed does not drop with 128 prefill / 512 kv.

Before and after for Stories Llama (16a4w, 32 prefill / 128 kv), prompt: "Once upon a time"
Before (current mainline)
prefill:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big,
kv:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black rake and asked her mom what it was. Her mom told her it was a rake and that it helps to
hybrid:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her friends. One day, Lily's mommy told her that they were going to visit her grandma who lived far away. Lily was excited to see her grandma, but she was also a little scared because she had never been on a long trip before. When they arrived at grandma's house, Lily saw a big, scary dog. She was afraid and didn't want to go near it. But her mommy told her that the dog was friendly and just
After (With this PR)
prefill:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big,
kv:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black rake and asked her mom what it was. Her mom told her it was a rake and that it helps to
hybrid:
Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black sheet hanging on the line and she wanted to help. She grabbed the sheet and tried to hang it up
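One rough way to quantify the before/after difference above is to count how many leading words the hybrid output shares with the kv output, which is unchanged by this PR. The sketch below does that with plain whitespace splitting; the helper names `common_prefix_words` and `first_divergence` are hypothetical, the word-level comparison is an assumption (it is not a token-level or official accuracy metric), and the strings are excerpts of the outputs above, truncated shortly after the point where they diverge.

```python
def common_prefix_words(reference: str, candidate: str) -> int:
    """Count leading whitespace-separated words shared by two generations."""
    count = 0
    for ref_word, cand_word in zip(reference.split(), candidate.split()):
        if ref_word != cand_word:
            break
        count += 1
    return count


def first_divergence(reference: str, candidate: str):
    """Return (word_index, reference_word, candidate_word) at the first
    mismatch, or None if one text is a word-level prefix of the other."""
    n = common_prefix_words(reference, candidate)
    ref_words, cand_words = reference.split(), candidate.split()
    if n == min(len(ref_words), len(cand_words)):
        return None
    return n, ref_words[n], cand_words[n]


# Excerpts of the outputs above (assumption: kv output is the reference).
KV = (
    "Once upon a time, there was a little girl named Lily. She loved to play "
    "with her toys and her favorite toy was a big, red ball. One day, Lily's mom "
    "asked her to help her with the laundry. Lily was happy to help and she put "
    "all the clothes in the washing machine. After the clothes were washed, "
    "Lily's mom asked her to help her hang them up to dry. Lily saw a big, black "
    "rake and asked her mom what it was."
)
HYBRID_BEFORE = (
    "Once upon a time, there was a little girl named Lily. She loved to play "
    "with her toys and her friends. One day, Lily's mommy told her that they "
    "were going to visit her grandma who lived far away."
)
HYBRID_AFTER = (
    "Once upon a time, there was a little girl named Lily. She loved to play "
    "with her toys and her favorite toy was a big, red ball. One day, Lily's mom "
    "asked her to help her with the laundry. Lily was happy to help and she put "
    "all the clothes in the washing machine. After the clothes were washed, "
    "Lily's mom asked her to help her hang them up to dry. Lily saw a big, black "
    "sheet hanging on the line and she wanted to help."
)

if __name__ == "__main__":
    for label, hybrid in (("before", HYBRID_BEFORE), ("after", HYBRID_AFTER)):
        print(label, common_prefix_words(KV, hybrid), first_divergence(KV, hybrid))
```

Run on these excerpts, the hybrid output matches the kv output for 20 words before this PR (diverging at "favorite" vs "friends.") and for 76 words after it (diverging at "rake" vs "sheet").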