Qualcomm AI Engine Direct - Unify Llama2&Llama3 and Small Accuracy Improvement. #7618

Merged
merged 1 commit on Jan 18, 2025

Conversation

winskuo-quic
Collaborator

Summary

  • Initially, to expedite development, we created a separate llama3 script and runner instead of reusing llama2's. Maintaining multiple scripts and runners confuses community users and is unnecessary, so this PR unifies the llama2 and llama3 flows to make them easier to maintain and more readable.
  • Fix the accuracy issue in llama hybrid mode.
  • Copy the KV mode output quant_attrs and apply them to prefill mode, so that hybrid mode's results align with the single kv/prefill models. The test results below, produced with stories llama, show that after copying quant_attrs from KV mode to prefill mode, hybrid mode output is very close to the single kv/prefill models. A small sketch of the idea follows this list.
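
The rough idea, as a minimal sketch: copy the output quantization parameters recorded on the KV graph onto the corresponding outputs of the prefill graph before lowering. This is not the actual ExecuTorch implementation; the `quant_attrs` meta key, the GraphModule-based flow, and the helper names below are illustrative assumptions.

```python
# Minimal sketch, assuming the exported KV and prefill graphs are torch.fx
# GraphModules and that per-node quantization parameters live under a
# "quant_attrs" entry in node.meta (both are assumptions for illustration).
import torch


def _output_value_nodes(gm: torch.fx.GraphModule):
    # The single fx "output" node's first argument holds the collection of
    # nodes that produce the graph's outputs.
    for node in gm.graph.nodes:
        if node.op == "output":
            return list(node.args[0])
    return []


def copy_output_quant_attrs(kv_gm: torch.fx.GraphModule,
                            prefill_gm: torch.fx.GraphModule) -> None:
    """Copy output quantization attributes from the KV graph to the prefill graph.

    Afterwards, hybrid mode dequantizes logits from both sub-graphs with the
    same scale / zero-point, so its output matches the single kv/prefill models.
    """
    kv_outputs = _output_value_nodes(kv_gm)
    prefill_outputs = _output_value_nodes(prefill_gm)
    for kv_node, prefill_node in zip(kv_outputs, prefill_outputs):
        attrs = kv_node.meta.get("quant_attrs")
        if attrs is not None:
            # Overwrite the prefill output's quant params with the KV ones.
            prefill_node.meta["quant_attrs"] = dict(attrs)
```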

Test plan

  • Updated the CI to use unified llama script.

  • TODO: Add more unit tests for llama.

  • Ensure the 1B model's inference speed does not drop with 128 prefill / 512 kv (benchmark screenshot omitted).

  • Before and After for Stories Llama 16a4w 32prefill/128kv, Prompt: "Once upon a time"

    Before (Current Mainline)

    prefill:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big,

    kv:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black rake and asked her mom what it was. Her mom told her it was a rake and that it helps to

    hybrid:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her friends. One day, Lily's mommy told her that they were going to visit her grandma who lived far away. Lily was excited to see her grandma, but she was also a little scared because she had never been on a long trip before. When they arrived at grandma's house, Lily saw a big, scary dog. She was afraid and didn't want to go near it. But her mommy told her that the dog was friendly and just

    After (With this PR)

    prefill:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big,

    kv:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black rake and asked her mom what it was. Her mom told her it was a rake and that it helps to

    hybrid:
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite toy was a big, red ball. One day, Lily's mom asked her to help her with the laundry. Lily was happy to help and she put all the clothes in the washing machine. After the clothes were washed, Lily's mom asked her to help her hang them up to dry. Lily saw a big, black sheet hanging on the line and she wanted to help. She grabbed the sheet and tried to hang it up


pytorch-bot bot commented Jan 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7618

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b383d09 with merge base 3f9324c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 13, 2025
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR is mainly to:

  1. Merge llama2 and llama3 scripts.
  2. Fix hybrid mode runner issue in mainline.
  3. Apply KV mode output quant_attrs to prefill mode; after this PR, hybrid mode results are very close to the single kv/prefill modes (refer to the Summary section for the test results).

Please have a look at this PR and let me know if anything is unclear.
Thanks.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment

Thanks! Sorry I missed this one.

@cccclai cccclai merged commit 66bfd75 into pytorch:main Jan 18, 2025
45 of 47 checks passed
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Jan 23, 2025
Summary: Forward fix for the previous PR pytorch#7618

Reviewed By: tarun292

Differential Revision: D68362771
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Jan 23, 2025
Summary:
Pull Request resolved: pytorch#7871

Forward fix for the previous PR pytorch#7618

Reviewed By: tarun292

Differential Revision: D68362771
@cccclai
Contributor

cccclai commented Jan 23, 2025

Sorry, we need to revert this PR and re-land it because the internal failure can't be bypassed.

@winskuo-quic
Collaborator Author

winskuo-quic commented Jan 24, 2025

Sorry, we need to revert this PR and re-land it because the internal failure can't be bypassed.

Hi @cccclai,
No worries. I think I forgot to update the CI yml files to the new path:

python -m examples.qualcomm.oss_scripts.llama3_2.llama -- \

I'm not sure if that is the root cause of the internal failure.
Also, I am working on CI for static stories llama: #7884.
I will continue on this once it is re-landed. Thanks.

cccclai pushed a commit to cccclai/executorch-1 that referenced this pull request Jan 28, 2025
Qualcomm AI Engine Direct - Unify Llama2 and Llama3 and Small Accuracy Improvement. (pytorch#7618)
YIWENX14 pushed a commit that referenced this pull request Jan 28, 2025
Qualcomm AI Engine Direct - Unify Llama2 and Llama3 and Small Accuracy Improvement. (#7618)
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jan 30, 2025
Qualcomm AI Engine Direct - Unify Llama2 and Llama3 and Small Accuracy Improvement. (pytorch#7618)
Labels
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) · topic: not user facing