add more ORT_ENFORCE when building the tree structure of TreeEnsemble operators#27677
add more ORT_ENFORCE when building the tree structure of TreeEnsemble operators#27677
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens TreeEnsemble initialization by adding ORT_ENFORCE bounds checks around target/weight indexing to prevent out-of-bounds access when models have not been pre-validated, and adjusts a couple of ML operator tests accordingly.
Changes:
- Added additional
ORT_ENFORCEchecks inTreeEnsembleCommon::Initto validate indices and target id ranges before dereferencing. - Updated TreeRegressor/TreeEnsembler CPU tests to use in-range
target_ids/leaf_targetids.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h | Adds extra index/range enforcement when mapping leaf weights to targets/classes. |
| onnxruntime/test/providers/cpu/ml/treeregressor_test.cc | Adjusts test data to keep target_ids within the allowed target range. |
| onnxruntime/test/providers/cpu/ml/tree_ensembler_test.cc | Adjusts test data to keep leaf_targetids within the allowed target range. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
|
There are suggestions to consider and CI to fix. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
done |
yuslepukhin
left a comment
There was a problem hiding this comment.
Please, resolve the ICM ticket accordingly.
|
Please, resolve comments for tests |
…xadupre/tree110413
…xadupre/tree110413
…xadupre/tree110413
There was a problem hiding this comment.
Pull request overview
This PR hardens CPU ML TreeEnsemble tree-building against malformed ONNX model attributes by adding additional runtime validations (primarily via ORT_ENFORCE) and adjusting existing tests to comply with the stricter checks.
Changes:
- Add bounds/range validation for
target_ids/weights mapping when building leaf weight structures. - Enforce non-negative
nodes_featureidsfor non-leaf nodes (while normalizing leaf feature ids). - Update/add unit tests to reflect and validate the new failure behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/ml/treeregressor_test.cc | Fixes an existing test’s invalid target_ids for n_targets=1 and adds new negative tests covering out-of-range target_ids and negative featureids. |
| onnxruntime/test/providers/cpu/ml/tree_ensembler_test.cc | Fixes leaf_targetids in n_targets=1 coverage to conform to new validations. |
| onnxruntime/core/providers/cpu/ml/tree_ensemble_common.h | Adds new ORT_ENFORCE validations during tree construction (target id range, feature id validity) and moves n_targets_or_classes_ initialization earlier. |
| onnxruntime/core/providers/cpu/ml/tree_ensemble_attribute.h | Adds validation of base_values / base_values_as_tensor sizes vs n_targets_or_classes for regressor/classifier attribute parsing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot open a new pull request to apply changes based on the comments in this thread |
…perators (#27678) ### Description Adds informative error messages to the three `ORT_ENFORCE` index-bounds checks in `tree_ensemble_common.h` that validate `target_class_ids`, `target_class_weights`, and `target_class_weights_as_tensor` accesses. Each message now surfaces the failing index `i` and the container size, e.g.: ``` Index i=5 is out of bounds for target_class_ids (size=3) Index i=5 is out of bounds for target_class_weights (size=3) Index i=5 is out of bounds for target_class_weights_as_tensor (size=3) ``` ### Motivation and Context The original bare `ORT_ENFORCE(i < ...)` calls produced opaque failures with no indication of which attribute was malformed or what the actual index/size values were. This made it difficult to diagnose invalid or adversarial models. Addresses feedback from [#27677](#27677 (comment)). <!-- START COPILOT CODING AGENT TIPS --> --- 🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. [Learn more about Advanced Security.](https://gh.io/cca-advanced-security) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
…teration (#27679) ### Description Addresses review feedback on #27677. The three per-iteration `ORT_ENFORCE` bounds checks inside the weight assignment loop in `tree_ensemble_common.h` have been moved to a single pre-loop validation block. **Before:** on every iteration, the loop checked `i < target_class_ids.size()`, `i < target_class_weights.size()`, and `i < target_class_weights_as_tensor.size()`. **After:** since `indices` is built from `target_class_nodeids` with consecutive indices `[0, limit)`, `i = indices[indi].second` is always in range — so size relationships are validated once upfront with `== limit` equality checks: ```cpp limit = attributes.target_class_nodeids.size(); ORT_ENFORCE(attributes.target_class_ids.size() == limit, "target_class_ids size (", attributes.target_class_ids.size(), ") must match target_class_nodeids size (", limit, ")"); ORT_ENFORCE(attributes.target_class_weights_as_tensor.empty() || attributes.target_class_weights_as_tensor.size() == limit, ...); ORT_ENFORCE(!attributes.target_class_weights_as_tensor.empty() || attributes.target_class_weights.size() == limit, ...); for (indi = 0; indi < limit; ++indi) { // no per-element bounds checks needed } ``` ### Motivation and Context The per-iteration checks made the control flow harder to reason about and repeated redundant work on every loop iteration. Validating the authoritative array lengths once — before the loop — makes it clear which arrays must be parallel and eliminates repeated checks. Original PR: #27677 <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/microsoft/onnxruntime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
…xadupre/tree110413
…rators Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/3d879480-fc50-47de-806f-0cb316b902f5 Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
…xruntime into xadupre/tree110413
Description
The building of the tree was implemented assuming the onnx models has been checked before.
This PR assumes it is not and verifies some indices are not outside boundaries.
The following validations were added or improved:
target_ids/weights mapping when building leaf weight structures, with actionable diagnostic messages including index and container sizes.target_class_ids,target_class_nodeids,target_class_treeids, weights) validated once before iteration rather than per-loop.nodes_featureidsenforced for non-leaf nodes.base_valuesandbase_values_as_tensorsizes validated againstn_targets_or_classesfor both regressors and classifiers, with correct attribute names in error messages.ORT_ENFORCEchecks: out-of-rangetarget_ids, negativefeatureids, and wrong-sizedbase_valuesfor regressors and classifiers (binary and multi-class).Motivation and Context
Security issues.