Skip to content

add more ORT_ENFORCE when building the tree structure of TreeEnsemble operators#27677

Open
xadupre wants to merge 17 commits intomainfrom
xadupre/tree110413
Open

add more ORT_ENFORCE when building the tree structure of TreeEnsemble operators#27677
xadupre wants to merge 17 commits intomainfrom
xadupre/tree110413

Conversation

@xadupre
Copy link
Copy Markdown
Member

@xadupre xadupre commented Mar 16, 2026

Description

The building of the tree was implemented assuming the onnx models has been checked before.
This PR assumes it is not and verifies some indices are not outside boundaries.

The following validations were added or improved:

  • Bounds and range checks for target_ids/weights mapping when building leaf weight structures, with actionable diagnostic messages including index and container sizes.
  • Array size relationships (target_class_ids, target_class_nodeids, target_class_treeids, weights) validated once before iteration rather than per-loop.
  • Non-negative nodes_featureids enforced for non-leaf nodes.
  • base_values and base_values_as_tensor sizes validated against n_targets_or_classes for both regressors and classifiers, with correct attribute names in error messages.
  • Unit tests added to cover the new ORT_ENFORCE checks: out-of-range target_ids, negative featureids, and wrong-sized base_values for regressors and classifiers (binary and multi-class).

Motivation and Context

Security issues.

@xadupre xadupre requested a review from Copilot March 16, 2026 15:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens TreeEnsemble initialization by adding ORT_ENFORCE bounds checks around target/weight indexing to prevent out-of-bounds access when models have not been pre-validated, and adjusts a couple of ML operator tests accordingly.

Changes:

  • Added additional ORT_ENFORCE checks in TreeEnsembleCommon::Init to validate indices and target id ranges before dereferencing.
  • Updated TreeRegressor/TreeEnsembler CPU tests to use in-range target_ids / leaf_targetids.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h Adds extra index/range enforcement when mapping leaf weights to targets/classes.
onnxruntime/test/providers/cpu/ml/treeregressor_test.cc Adjusts test data to keep target_ids within the allowed target range.
onnxruntime/test/providers/cpu/ml/tree_ensembler_test.cc Adjusts test data to keep leaf_targetids within the allowed target range.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 16, 2026

@xadupre I've opened a new pull request, #27678, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 16, 2026

@xadupre I've opened a new pull request, #27679, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

@yuslepukhin
Copy link
Copy Markdown
Member

There are suggestions to consider and CI to fix.

xadupre and others added 2 commits March 17, 2026 14:21
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@xadupre
Copy link
Copy Markdown
Member Author

xadupre commented Mar 17, 2026

There are suggestions to consider and CI to fix.

done

yuslepukhin
yuslepukhin previously approved these changes Mar 18, 2026
Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, resolve the ICM ticket accordingly.

@yuslepukhin yuslepukhin dismissed their stale review March 19, 2026 20:56

revoking review

@yuslepukhin
Copy link
Copy Markdown
Member

Please, resolve comments for tests

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens CPU ML TreeEnsemble tree-building against malformed ONNX model attributes by adding additional runtime validations (primarily via ORT_ENFORCE) and adjusting existing tests to comply with the stricter checks.

Changes:

  • Add bounds/range validation for target_ids/weights mapping when building leaf weight structures.
  • Enforce non-negative nodes_featureids for non-leaf nodes (while normalizing leaf feature ids).
  • Update/add unit tests to reflect and validate the new failure behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
onnxruntime/test/providers/cpu/ml/treeregressor_test.cc Fixes an existing test’s invalid target_ids for n_targets=1 and adds new negative tests covering out-of-range target_ids and negative featureids.
onnxruntime/test/providers/cpu/ml/tree_ensembler_test.cc Fixes leaf_targetids in n_targets=1 coverage to conform to new validations.
onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h Adds new ORT_ENFORCE validations during tree construction (target id range, feature id validity) and moves n_targets_or_classes_ initialization earlier.
onnxruntime/core/providers/cpu/ml/tree_ensemble_attribute.h Adds validation of base_values / base_values_as_tensor sizes vs n_targets_or_classes for regressor/classifier attribute parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@xadupre
Copy link
Copy Markdown
Member Author

xadupre commented Mar 26, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

…perators (#27678)

### Description

Adds informative error messages to the three `ORT_ENFORCE` index-bounds
checks in `tree_ensemble_common.h` that validate `target_class_ids`,
`target_class_weights`, and `target_class_weights_as_tensor` accesses.
Each message now surfaces the failing index `i` and the container size,
e.g.:

```
Index i=5 is out of bounds for target_class_ids (size=3)
Index i=5 is out of bounds for target_class_weights (size=3)
Index i=5 is out of bounds for target_class_weights_as_tensor (size=3)
```

### Motivation and Context

The original bare `ORT_ENFORCE(i < ...)` calls produced opaque failures
with no indication of which attribute was malformed or what the actual
index/size values were. This made it difficult to diagnose invalid or
adversarial models. Addresses feedback from
[#27677](#27677 (comment)).

<!-- START COPILOT CODING AGENT TIPS -->
---

🔒 GitHub Advanced Security automatically protects Copilot coding agent
pull requests. You can protect all pull requests by enabling Advanced
Security for your repositories. [Learn more about Advanced
Security.](https://gh.io/cca-advanced-security)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Copilot AI and others added 2 commits March 26, 2026 17:22
…teration (#27679)

### Description

Addresses review feedback on #27677. The three per-iteration
`ORT_ENFORCE` bounds checks inside the weight assignment loop in
`tree_ensemble_common.h` have been moved to a single pre-loop validation
block.

**Before:** on every iteration, the loop checked `i <
target_class_ids.size()`, `i < target_class_weights.size()`, and `i <
target_class_weights_as_tensor.size()`.

**After:** since `indices` is built from `target_class_nodeids` with
consecutive indices `[0, limit)`, `i = indices[indi].second` is always
in range — so size relationships are validated once upfront with `==
limit` equality checks:

```cpp
limit = attributes.target_class_nodeids.size();
ORT_ENFORCE(attributes.target_class_ids.size() == limit,
            "target_class_ids size (", attributes.target_class_ids.size(),
            ") must match target_class_nodeids size (", limit, ")");
ORT_ENFORCE(attributes.target_class_weights_as_tensor.empty() ||
                attributes.target_class_weights_as_tensor.size() == limit, ...);
ORT_ENFORCE(!attributes.target_class_weights_as_tensor.empty() ||
                attributes.target_class_weights.size() == limit, ...);
for (indi = 0; indi < limit; ++indi) {
    // no per-element bounds checks needed
}
```

### Motivation and Context

The per-iteration checks made the control flow harder to reason about
and repeated redundant work on every loop iteration. Validating the
authoritative array lengths once — before the loop — makes it clear
which arrays must be parallel and eliminates repeated checks.

Original PR: #27677

<!-- START COPILOT CODING AGENT TIPS -->
---

✨ Let Copilot coding agent [set things up for
you](https://github.com/microsoft/onnxruntime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot)
— coding agent works faster and does higher quality work when set up for
your repo.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants