
Load external data in HQQ and RTN quantization passes#2380

Merged
xiaoyu-work merged 1 commit into microsoft:main from Lidang-Jiang:fix/onnx-quantization-external-data
Apr 8, 2026

Conversation

@Lidang-Jiang
Contributor

Summary

Fixes #2223

After the onnx-ir migration, ir.load() no longer loads external data by default. This caused OnnxHqqQuantization to produce invalid output models when the input model stores weights as external data — the original weight tensors retained stale external data references pointing to non-existent files in the output directory.

Changes:

  • Add ir.external_data.load_to_model(ir_model) after load_ir_model() in both OnnxHqqQuantization and OnnxBlockWiseRtnQuantization, consistent with the pattern already used in extract_adapters.py (line 104)
  • Add regression tests for both passes with external data models
Before

```
Created model with external data (12.0 MB)
Input model validation: PASSED

--- olive quantize --algorithm hqq ---
Exit code: 0
Output model validation: FAILED
Data of TensorProto ( tensor name: W1) should be stored in /tmp/tmpkh7vp8nn/quantized_hqq/model.onnx.data, but it is not regular file.

--- olive quantize --algorithm rtn ---
Exit code: 0
Output model validation: PASSED
```

After

```
Created model with external data (12.0 MB)
Input model validation: PASSED

--- olive quantize --algorithm hqq ---
Exit code: 0
Output model validation: PASSED

--- olive quantize --algorithm rtn ---
Exit code: 0
Output model validation: PASSED
```

Unit tests (13 passed)

```
test_hqq_quantization_pass PASSED
test_hqq_quantization_pass_produces_valid_output_when_model_has_external_data PASSED
test_rtn_quantization_pass_matmul[True] PASSED
test_rtn_quantization_pass_matmul[False] PASSED
test_rtn_quantization_pass_gather[True] PASSED
test_rtn_quantization_pass_gather[False] PASSED
test_rtn_quantization_pass_produces_valid_output_when_model_has_external_data PASSED
test_rtn_quantization_with_exclusion PASSED
test_rtn_quantization_gather_8bit[True] PASSED
test_rtn_quantization_gather_8bit[False] PASSED
test_rtn_quantization_gather_quantize_axis_forced_to_last_dim PASSED
test_rtn_quantization_shared_gather_weights PASSED
test_rtn_quantization_removes_unused_initializers PASSED

======================== 13 passed, 2 warnings in 2.78s ========================
```

Test plan

  • E2E: olive quantize --algorithm hqq on model with external data produces valid output
  • E2E: olive quantize --algorithm rtn on model with external data produces valid output
  • Unit test: test_hqq_quantization_pass_produces_valid_output_when_model_has_external_data
  • Unit test: test_rtn_quantization_pass_produces_valid_output_when_model_has_external_data
  • All 13 existing + new tests pass
  • ruff check clean on modified files

After the onnx-ir migration, `ir.load()` no longer loads external data
by default. This caused HQQ quantization to produce invalid output
models when the input model stores weights as external data, because
the original weight tensors retained stale external data references
that pointed to non-existent files in the output directory.

Fix by calling `ir.external_data.load_to_model()` after `load_ir_model()`
in both HQQ and RTN quantization passes, consistent with the pattern
already used in `extract_adapters.py`.

Fixes microsoft#2223

Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
@Lidang-Jiang
Contributor Author

@microsoft-github-policy-service agree [company="{your company}"]

@Lidang-Jiang
Contributor Author

@microsoft-github-policy-service agree

@jambayk requested a review from xiaoyu-work, April 7, 2026 21:14
@xiaoyu-work
Collaborator

@justinchuby can you confirm this logic? this PR is for #2223

Contributor

@justinchuby left a comment


Lgtm if you want all weights in memory

@xiaoyu-work merged commit 0c665dc into microsoft:main Apr 8, 2026
15 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug] ONNX Quantization Broken Since v0.9.2 Due to Incorrect Weight Data Saving

3 participants