Fix TRT custom op issue #12283
Conversation
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,onnxruntime-python-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
ORT_MINIMAL_BUILD failed.
@stevenlix,
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,onnxruntime-python-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed
Azure Pipelines successfully started running 10 pipeline(s).
Azure Pipelines successfully started running 6 pipeline(s).
@@ -743,7 +743,11 @@ struct ProviderHostImpl : ProviderHost {
   void GraphViewer__operator_delete(GraphViewer* p) override { delete p; }
   std::unique_ptr<Model> GraphViewer__CreateModel(const GraphViewer* graph_viewer, const logging::Logger& logger) override {
     return std::make_unique<Model>(graph_viewer->Name(), true, ModelMetaData(), PathString(),
+#if !defined(ORT_MINIMAL_BUILD)
+                                   IOnnxRuntimeOpSchemaRegistryList({graph_viewer->GetSchemaRegistry()}), graph_viewer->DomainToVersionMap(),
please remove extra leading spaces
How many spaces should I keep? These are function arguments and are aligned with the args on Ln745.
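For illustration, the alignment question is roughly the following (a sketch with hypothetical names, not the committed code): continuation arguments are conventionally aligned with the opening parenthesis of the call, while preprocessor directives are conventionally placed at column 0, so the argument lines inside the #if block inherit the deep indentation of the surrounding call:

auto model = MakeModel(first_arg, second_arg,
#if !defined(SOME_FLAG)
                       full_featured_arg,   // aligned with the args above
#else
                       minimal_arg,         // same alignment in the fallback branch
#endif
                       last_arg);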
@@ -743,7 +743,11 @@ struct ProviderHostImpl : ProviderHost {
   void GraphViewer__operator_delete(GraphViewer* p) override { delete p; }
   std::unique_ptr<Model> GraphViewer__CreateModel(const GraphViewer* graph_viewer, const logging::Logger& logger) override {
     return std::make_unique<Model>(graph_viewer->Name(), true, ModelMetaData(), PathString(),
+#if !defined(ORT_MINIMAL_BUILD)
+                                   IOnnxRuntimeOpSchemaRegistryList({graph_viewer->GetSchemaRegistry()}), graph_viewer->DomainToVersionMap(),
+#else
+                                   IOnnxRuntimeOpSchemaRegistryList(), graph_viewer->DomainToVersionMap(),
please remove extra leading spaces
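Reading the two hunks together, the patched function plausibly reads as follows (a reconstruction from the hunks above; the closing #endif and the trailing constructor arguments are inferred, since the collapsed diff does not show them):

std::unique_ptr<Model> GraphViewer__CreateModel(const GraphViewer* graph_viewer, const logging::Logger& logger) override {
  return std::make_unique<Model>(graph_viewer->Name(), true, ModelMetaData(), PathString(),
#if !defined(ORT_MINIMAL_BUILD)
                                 // Full build: forward the graph's schema registry so custom op
                                 // schemas survive the GraphViewer -> Model conversion.
                                 IOnnxRuntimeOpSchemaRegistryList({graph_viewer->GetSchemaRegistry()}), graph_viewer->DomainToVersionMap(),
#else
                                 // Minimal build: schema registries are compiled out.
                                 IOnnxRuntimeOpSchemaRegistryList(), graph_viewer->DomainToVersionMap(),
#endif
                                 /* remaining arguments unchanged */ logger);
}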
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,onnxruntime-python-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed
Azure Pipelines successfully started running 10 pipeline(s).
Azure Pipelines successfully started running 6 pipeline(s).
* Pass schema registry on CreateModel.
* Fix ORT_MINIMAL_BUILD.
* Fix build issue.
* update package version
* Prevent unbounded growth of command allocator memory (#12114)
* Update supported ops md for NNAPI/CoreML EP (#12245)
  * update supported ops md
  * address pr comments
  * address pr comments
  * wording
* Change native folder name for java macos arm64 (#12335)
* Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280)
  Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
  - [Release notes](https://github.com/caolan/async/releases)
  - [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
  - [Commits](caolan/async@v2.6.3...v2.6.4)
  updated-dependencies:
  - dependency-name: async
    dependency-type: indirect
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [js/rn] upgrade dependencies for e2e test (#11863)
  * use JDK11 only for gradle
  * expand variable
* [js/rn] upgrade package react-native@^0.69.1 (#12155)
  * upgrade compile sdk to v31
  * update ios version requirement
  * update pod path for onnxruntime-react-native
* add missing build_java in Android testing stage. (#12187)
* Use specific Android NDK version in CI builds. (#12350)
  Current builds use an NDK version that happens to be on the build machine. The build machine environment may change in ways that are outside of our control. This change installs a specific version of NDK (the current LTS version 25.0.8775105) and uses it.
* Remove preview keyword from DirectML pacakge (#12368)
  Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
* Scope CreateFileMapping2 to valid API partitions (#12374)
* Fix TRT custom op issue (#12283)
  * Pass schema registry on CreateModel.
  * Fix ORT_MINIMAL_BUILD.
  * Fix build issue.
* Manually add optimization flag for Android Release builds. (#12390)
  With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration. More details here: android/ndk#1740
  Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21. This change is a workaround to manually add `-O3` for "Release" Android builds.
* resolve conflicts in tensorRT related changes
* Enable support of multi-level nested control flow ops model for TRT EP (#12147)
  * Make multiple-level nested control flow op model work
  * find correct input index
  * find correct input index (cont.)
  * enable nested layer unit tests for TRT EP
  * add comment
  * add Scan op to current workaround support of control flow op

Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: sumitsays <sumitagarwal330@gmail.com>
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Justin Stoecker <justoeck@microsoft.com>
Co-authored-by: Yateng Hong <yatengh@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Description:
Fix issue #12282: the TRT EP fails to create a model session with a CUDA custom op.
Motivation and Context
Why is this change required? What problem does it solve?
When executing a model with a custom op, ORT crashes in the TRT EP's GetSupportedList. Inside this function, a graph builder is created first, and then subgraphs are constructed:
onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
Lines 721 to 722 in f2533d3
However, the model created from the graph viewer loses the schema registry information for custom ops, which causes an exception at the line below:
onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
Line 763 in f2533d3
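To make the failure mode concrete, here is a minimal sketch of the triggering scenario from the user's side, assuming the standard ORT C++ API of that era; the model file name is a placeholder, and the custom op object itself (an Ort::CustomOpBase subclass with a CUDA kernel) is elided:

#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_custom_op_repro");
  Ort::SessionOptions session_options;

  // Register a custom op domain; the op's schema lives in the session's
  // schema registry. The pointer below stands in for &your_op_instance.
  const OrtCustomOp* my_custom_op = nullptr;  // placeholder
  Ort::CustomOpDomain domain("my.domain");
  domain.Add(my_custom_op);
  session_options.Add(domain);

  // Enable the TensorRT EP. Before this fix, session creation crashed in the
  // TRT EP's GetSupportedList, because the sub-model rebuilt from the graph
  // viewer carried no schema registry and the custom op schema was unknown.
  OrtTensorRTProviderOptions trt_options{};
  trt_options.device_id = 0;
  session_options.AppendExecutionProvider_TensorRT(trt_options);

  Ort::Session session(env, ORT_TSTR("model_with_custom_op.onnx"), session_options);
  return 0;
}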
The fix is to pass the schema registries inside GraphViewer__CreateModel.
If it fixes an open issue, please link to the issue here.
TRT EP failed to create model session with CUDA custom op #12282