[TensorRT] Fix DDS output bug during engine update #26272
tianleiwu merged 3 commits into microsoft:main
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Pull Request Overview
This PR fixes a bug in the TensorRT Execution Provider where DDS (data-dependent shape) output tensors were not properly bound after an engine update, causing execution failures in dynamic-shape inference scenarios.
- Clears the `dds_output_allocator_map` when the TensorRT engine is recreated, to prevent stale mappings
- Ensures proper output tensor binding during engine updates with different input shapes
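For illustration, here is a minimal self-contained C++ sketch of the failure mode and the fix. This is not the actual EP source; `OutputAllocator`, `IsKnownDDSOutput`, and `RebuildEngineForNewShapes` are hypothetical names standing in for the EP's internals:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the EP's per-output allocator wrapper (hypothetical type).
struct OutputAllocator {};

using DDSOutputAllocatorMap =
    std::unordered_map<std::string, std::unique_ptr<OutputAllocator>>;

// A DDS output only gets a fresh allocator registered on the execution
// context when it is not already tracked in the map.
bool IsKnownDDSOutput(const DDSOutputAllocatorMap& map, const std::string& name) {
  return map.find(name) != map.end();
}

// Hypothetical engine-update path: the old allocator entries are tied to the
// previous engine/context. Clearing the map forces each DDS output to be
// re-registered on the new context; without the clear, IsKnownDDSOutput()
// still returns true, binding is skipped, and enqueueV3 fails with the
// "Neither address or allocator is set" error quoted in the description.
void RebuildEngineForNewShapes(DDSOutputAllocatorMap& dds_output_allocator_map) {
  // ... recreate the TensorRT engine and execution context here ...
  dds_output_allocator_map.clear();
}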
Thanks for fixing this issue.
Where should I put the repro script? I didn't find a dedicated Python test script for the TensorRT EP.
Added a C++ test case: TensorrtExecutionProviderTest.DDSOutputTest
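For reference, a rough standalone repro in the same spirit, written against the public ORT C++ API rather than the internal test framework (a sketch only; it assumes `model_dds.onnx` was produced by the Python script in the description, and leaves the TensorRT provider options at their defaults):

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "dds_repro");
  Ort::SessionOptions so;
  OrtTensorRTProviderOptions trt_options{};  // default TRT EP options
  so.AppendExecutionProvider_TensorRT(trt_options);
  Ort::Session session(env, ORT_TSTR("model_dds.onnx"), so);

  Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  const char* input_names[] = {"data"};
  const char* output_names[] = {"output"};

  // Two runs with different input shapes: the second one forces an engine
  // update, which is what exposed the unbound DDS output before the fix.
  for (auto dims : {std::array<int64_t, 2>{3, 4}, std::array<int64_t, 2>{5, 6}}) {
    std::vector<float> data(static_cast<size_t>(dims[0] * dims[1]), 1.0f);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), dims.data(), dims.size());
    session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);
  }
  return 0;
}
```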
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
### Description
Fix a bug in the TRT Execution Provider where the DDS output tensor was
not bound after an engine update.
### Motivation and Context
The `dds_output_allocator_map` is not cleared on engine update, so a DDS output is mis-recognized as already known and its output allocation is not bound.
Script to reproduce the issue:
```python
# Create an ONNX model: data -> NonZero -> Transpose -> GatherND -> output,
# then run it twice with different input shapes to trigger an engine update.
def create_model():
    import onnx
    from onnx import helper, TensorProto

    input = helper.make_tensor_value_info("data", TensorProto.FLOAT, ["d1", "d2"])
    output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["nzr"])
    nonzeros_node = helper.make_node("NonZero", ["data"], ["nonzeros"], "nonzeros_node")
    transpose_node = helper.make_node(
        "Transpose", ["nonzeros"], ["nonzeros_t"], "transpose_node"
    )
    gathernd_node = helper.make_node(
        "GatherND", ["data", "nonzeros_t"], ["output"], "gathernd_node"
    )
    value_info = [
        helper.make_tensor_value_info("nonzeros", TensorProto.INT64, [2, "nzr"]),
        helper.make_tensor_value_info("nonzeros_t", TensorProto.INT64, ["nzr", 2]),
    ]
    graph = helper.make_graph(
        [nonzeros_node, transpose_node, gathernd_node],
        "test_graph",
        [input],
        [output],
        value_info=value_info,
    )
    model = helper.make_model(graph)
    onnx.save(model, "model_dds.onnx")


def run_model():
    import onnxruntime as ort
    import numpy as np

    sess = ort.InferenceSession(
        "model_dds.onnx",
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    print("Running with data shape (3,4)")
    data = np.random.randn(3, 4).astype(np.float32)
    sess.run(None, {"data": data})
    print("Running with data shape (5,6)")
    data = np.random.randn(5, 6).astype(np.float32)
    sess.run(None, {"data": data})


create_model()
run_model()
```
Before the change:
> IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter check failed, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex). Neither address or allocator is set for output tensor scores. Call setOutputTensorAddress, setTensorAddress or setOutputAllocator before enqueue/execute.) ... Status Message: TensorRT EP execution context enqueue failed.
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:
- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>
Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.