[TensorRT] Fix DDS output bug during engine update #26272
tianleiwu merged 3 commits into microsoft:main
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Pull Request Overview
This PR fixes a bug in the TensorRT Execution Provider where DDS (data-dependent shape) output tensors were not properly bound after an engine update, causing execution failures in dynamic-shape inference scenarios.
- Clears the `dds_output_allocator_map` when the TensorRT engine is recreated, to prevent stale mappings
- Ensures proper output tensor binding during engine updates with different input shapes
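For illustration, here is a minimal self-contained C++ sketch of the failure mode and the fix. This is not the actual EP source; `OutputAllocator`, `IsKnownDDSOutput`, and `RebuildEngineForNewShapes` are hypothetical names standing in for the EP's internals:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the EP's per-output allocator wrapper (hypothetical type).
struct OutputAllocator {};

using DDSOutputAllocatorMap =
    std::unordered_map<std::string, std::unique_ptr<OutputAllocator>>;

// A DDS output only gets a fresh allocator registered on the execution
// context when it is not already tracked in the map.
bool IsKnownDDSOutput(const DDSOutputAllocatorMap& map, const std::string& name) {
  return map.find(name) != map.end();
}

// Hypothetical engine-update path: the old allocator entries are tied to the
// previous engine/context. Clearing the map forces each DDS output to be
// re-registered on the new context; without the clear, IsKnownDDSOutput()
// still returns true, binding is skipped, and enqueueV3 fails with the
// "Neither address or allocator is set" error quoted in the description.
void RebuildEngineForNewShapes(DDSOutputAllocatorMap& dds_output_allocator_map) {
  // ... recreate the TensorRT engine and execution context here ...
  dds_output_allocator_map.clear();
}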
Thanks for fixing this issue.
Where should I put the repro script? I didn't find a dedicated Python test script for the TensorRT EP.
Added a C++ test case: TensorrtExecutionProviderTest.DDSOutputTest
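For reference, a rough standalone repro in the same spirit, written against the public ORT C++ API rather than the internal test framework (a sketch only; it assumes `model_dds.onnx` was produced by the Python script in the description, and leaves the TensorRT provider options at their defaults):

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "dds_repro");
  Ort::SessionOptions so;
  OrtTensorRTProviderOptions trt_options{};  // default TRT EP options
  so.AppendExecutionProvider_TensorRT(trt_options);
  Ort::Session session(env, ORT_TSTR("model_dds.onnx"), so);

  Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  const char* input_names[] = {"data"};
  const char* output_names[] = {"output"};

  // Two runs with different input shapes: the second one forces an engine
  // update, which is what exposed the unbound DDS output before the fix.
  for (auto dims : {std::array<int64_t, 2>{3, 4}, std::array<int64_t, 2>{5, 6}}) {
    std::vector<float> data(static_cast<size_t>(dims[0] * dims[1]), 1.0f);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), dims.data(), dims.size());
    session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);
  }
  return 0;
}
```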
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
### Description
Fix a bug in the TRT Execution Provider where the DDS output tensor was
not bound after an engine update.
### Motivation and Context
The `dds_output_allocator_map` is not cleared on engine update, so a DDS output is mis-recognized as already known and its output allocation is not bound.
Script to reproduce the issue:
```python
# Create an ONNX model: data -> NonZero -> Transpose -> GatherND -> output,
# then run it twice with different input shapes to trigger an engine update.
def create_model():
    import onnx
    from onnx import helper, TensorProto

    input = helper.make_tensor_value_info("data", TensorProto.FLOAT, ["d1", "d2"])
    output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["nzr"])
    nonzeros_node = helper.make_node("NonZero", ["data"], ["nonzeros"], "nonzeros_node")
    transpose_node = helper.make_node(
        "Transpose", ["nonzeros"], ["nonzeros_t"], "transpose_node"
    )
    gathernd_node = helper.make_node(
        "GatherND", ["data", "nonzeros_t"], ["output"], "gathernd_node"
    )
    value_info = [
        helper.make_tensor_value_info("nonzeros", TensorProto.INT64, [2, "nzr"]),
        helper.make_tensor_value_info("nonzeros_t", TensorProto.INT64, ["nzr", 2]),
    ]
    graph = helper.make_graph(
        [nonzeros_node, transpose_node, gathernd_node],
        "test_graph",
        [input],
        [output],
        value_info=value_info,
    )
    model = helper.make_model(graph)
    onnx.save(model, "model_dds.onnx")


def run_model():
    import onnxruntime as ort
    import numpy as np

    sess = ort.InferenceSession(
        "model_dds.onnx",
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    print("Running with data shape (3,4)")
    data = np.random.randn(3, 4).astype(np.float32)
    sess.run(None, {"data": data})
    print("Running with data shape (5,6)")
    data = np.random.randn(5, 6).astype(np.float32)
    sess.run(None, {"data": data})


create_model()
run_model()
```
Before the change:
> IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter check failed, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex). Neither address or allocator is set for output tensor scores. Call setOutputTensorAddress, setTensorAddress or setOutputAllocator before enqueue/execute.) ... Status Message: TensorRT EP execution context enqueue failed.
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:
- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>
Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.