Aot compiler fix #9634
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9634

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure as of commit ccf664e with merge base 4b8ac94. The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 10408cd to ccf664e.
Great!

phi_4_mini CI tests are failing with:

There is no error message with the run, so I don't think anything is actually failing; this test is just getting killed. Running it locally on my laptop seems to pass.
Seems fine; it doesn't seem like this change should have affected the phi test. If it's consistently getting killed, it might be OOMing.
Referenced by:

…53750)

* Update ExecuTorch pin to latest viable/strict 3/28/2025 (#150308)

  From latest viable/strict: https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50

  Fixes #144480. This commit has important CI stability fixes, such as pytorch/executorch#9561 and pytorch/executorch#9634.

  Pull Request resolved: #150308
  Approved by: https://github.com/jathu, https://github.com/malfet

* Use new hash from #150722
* Update executorch.txt

---------

Co-authored-by: Mergen Nachin <mnachin@meta.com>
### Summary
Changes:
1. When initializing Llama2 for the aot_compiler, since checkpoints can only be downloaded from Hugging Face, we initialize Llama2 with uninitialized weights. The problem with this is that when running quantization, we can run into errors with the histogram if the uninitialized values are NaN. We fix this by initializing the weights with zeros if no checkpoint is provided, which ensures that the quantization step still works (a sketch of the fallback follows this list).
2. Quant Type in the AoT compiler. Among the model options available to XNNPACK, everything is quantized with per-tensor static quantization. This isn't the best option for all of the available models: transformer-based models like Llama and MobileBert would likely prefer dynamically quantized per-channel weights, whereas CNNs like MobileNet would prefer statically quantized per-channel weights. We add this kind of Quant Type to the existing model options (see the quantizer sketch below, after the summary sentence). This also helps with test timeouts: per-tensor static quantization on a model like Llama can take a long time due to the introduction of MANY q/dq nodes and the complex partitions they create, so proposing partitions takes a long time due to the constant BFS to find the largest possible partition. By specifying the more apt quantization scheme, such as dynamic per-channel quantization, we avoid this complexity.
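For the first change, here is a minimal sketch of the zero-fill fallback. It assumes a generic `nn.Module`; the `init_weights_for_export` helper name is hypothetical, and the actual aot_compiler wiring differs:

```python
# Hypothetical helper illustrating change 1: zero-fill the weights when no
# checkpoint is available so observer histograms never see NaN values.
from typing import Optional

import torch
import torch.nn as nn


def init_weights_for_export(
    model: nn.Module, checkpoint_path: Optional[str] = None
) -> nn.Module:
    """Load real weights if a checkpoint exists; otherwise zero-fill them.

    Uninitialized memory can contain NaN, which makes histogram-based
    observers fail during calibration. Zero weights are numerically
    meaningless but keep the quantization flow working for CI.
    """
    if checkpoint_path is not None:
        model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    else:
        with torch.no_grad():
            for param in model.parameters():
                param.zero_()
    return model


# Usage with a stand-in module (no checkpoint, as in the CI path):
model = init_weights_for_export(nn.Linear(8, 8))
```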
Overall, this should help with the flaky [nan, nan] errors in the quantization histogram, and it should also help with CI timing out.
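For the second change, a rough sketch of selecting a quantization recipe per model family with the PT2E XNNPACKQuantizer. The `QuantType` enum and `make_quantizer` helper here are illustrative rather than the exact aot_compiler code, and the quantizer import path has moved between PyTorch/ExecuTorch releases:

```python
# Sketch of change 2: pick a quantization recipe per model family instead of
# using per-tensor static quantization everywhere. QuantType is illustrative.
from enum import Enum

from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class QuantType(Enum):
    STATIC_PER_TENSOR = 1    # previous default for all example models
    STATIC_PER_CHANNEL = 2   # better for CNNs like MobileNet
    DYNAMIC_PER_CHANNEL = 3  # better for transformers like Llama, MobileBert


def make_quantizer(quant_type: QuantType) -> XNNPACKQuantizer:
    is_per_channel = quant_type in (
        QuantType.STATIC_PER_CHANNEL,
        QuantType.DYNAMIC_PER_CHANNEL,
    )
    is_dynamic = quant_type is QuantType.DYNAMIC_PER_CHANNEL
    config = get_symmetric_quantization_config(
        is_per_channel=is_per_channel,
        is_dynamic=is_dynamic,
    )
    return XNNPACKQuantizer().set_global(config)


# Transformer-style models get dynamic per-channel weights:
quantizer = make_quantizer(QuantType.DYNAMIC_PER_CHANNEL)
```

Dynamic quantization computes activation qparams at runtime rather than observing every activation statically, so it tends to introduce fewer q/dq nodes, which is part of why the partition-proposal step gets cheaper.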
### Test plan
OSS XNNPACK CI for all model delegation
cc @digantdesai @cbilgin