Refactor TensorRT EP code to better handle dynamic shape subgraphs #4504
Conversation
looks like there are some build errors on Windows TensorRT CI
@@ -67,9 +67,13 @@
ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioning
ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT
By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1 and FP16 mode is disabled.
ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching
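
For context, a hypothetical usage sketch in C++: the provider reads these variables from the process environment, so they would need to be set before the session is created (the exact read point is an assumption; POSIX setenv shown):

    #include <cstdlib>

    int main() {
      // Assumption: the TensorRT EP reads these at session-creation time,
      // so they must be exported before constructing the InferenceSession.
      setenv("ORT_TENSORRT_FP16_ENABLE", "1", /*overwrite=*/1);
      setenv("ORT_TENSORRT_ENGINE_CACHE_ENABLE", "1", /*overwrite=*/1);
      // ... create the ONNX Runtime session with the TensorRT EP here ...
      return 0;
    }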
I think we need some more documentation on the engine caching (why it's needed, how it works, when you would use it, and what some of the pitfalls and limitations are).
Examples might be:
- if you enabled fp16 and serialized engines, you need to enable fp16 when deploying/running them.
- engines are built specifically for the underlying hardware and aren't portable.
- caveats about input shape changes.
(See the sketch after this thread.)
Good point. I've added more explanations in the doc.
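
To make the pitfalls above concrete, here is a minimal C++ sketch of cache keying; it is not the PR's actual code, and EngineCachePath and TryLoadCachedEngine are hypothetical helpers. The idea is that anything a serialized engine depends on (the GPU it was built on, FP16 mode) should be part of the cache key, so a foreign or stale cache is never loaded.

    #include <cuda_runtime_api.h>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical helper: encode in the cache file name everything the
    // serialized engine depends on. Engines are specific to the GPU they were
    // built on and to build options such as FP16, so both go into the key.
    std::string EngineCachePath(const std::string& model_hash, bool fp16_enabled) {
      int device = 0;
      cudaGetDevice(&device);
      cudaDeviceProp prop{};
      cudaGetDeviceProperties(&prop, device);
      std::ostringstream name;
      name << "trt_" << model_hash
           << "_sm" << prop.major << prop.minor        // compute capability
           << (fp16_enabled ? "_fp16" : "_fp32");      // precision mode
      return name.str() + ".engine";
    }

    // Hypothetical helper: a cache miss (no file for this exact key) simply
    // means the engine has to be built from scratch for this configuration.
    bool TryLoadCachedEngine(const std::string& path, std::vector<char>& blob) {
      std::ifstream file(path, std::ios::binary | std::ios::ate);
      if (!file) return false;
      blob.resize(static_cast<size_t>(file.tellg()));
      file.seekg(0);
      file.read(blob.data(), static_cast<std::streamsize>(blob.size()));
      return file.good();
    }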
}
cudaDeviceSynchronize();
for (const auto& binding_index : binding_buffers_to_freeup) {
are we leaking memory if the enqueueV2() fails above?
For a FAIL status, would the session run quit or continue to try other EPs? If the latter, we may need to free the buffers when FAIL occurs.
the session doesn't get destroyed automatically.
can enqueue failure be intermittent? (can there be a success after failure?)
in any case, a user could issue Run() again, or do the fallback manually (create a new session), so it does seem like it could cause a leak.
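
One way to address this, as a sketch only: free the EP-allocated binding buffers whether or not enqueueV2() succeeds, so a failed Run() that the caller retries cannot leak device memory. The function name and signature are assumptions; buffers and binding_buffers_to_freeup follow the snippet above.

    #include <NvInfer.h>
    #include <cuda_runtime_api.h>
    #include <vector>

    // Sketch: run the engine, then release the temporary binding buffers on
    // both the success and the failure path before reporting the result.
    bool EnqueueAndFreeBuffers(nvinfer1::IExecutionContext* trt_context,
                               std::vector<void*>& buffers,
                               const std::vector<int>& binding_buffers_to_freeup,
                               cudaStream_t stream) {
      const bool ok = trt_context->enqueueV2(buffers.data(), stream, nullptr);
      // Synchronize and free even when enqueueV2() fails: the caller may map
      // the failure to a FAIL status and issue Run() again, and bailing out
      // without cudaFree would leak these allocations.
      cudaDeviceSynchronize();
      for (const auto& binding_index : binding_buffers_to_freeup) {
        cudaFree(buffers[binding_index]);
        buffers[binding_index] = nullptr;
      }
      return ok;  // the caller translates false into a FAIL status
    }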
TensorRT requires shape tensor inputs to be deterministic when an engine is built, so for dynamic shape subgraphs the engine must be built at run time rather than at compile phase. This PR moves all engine builds into compute(), except for static shape engines.
Even when a model's inputs are static, the inputs at subgraph level could still be dynamic, and the engine for such a subgraph may need to be rebuilt at run time.
In some cases it takes TRT a long time to build an engine because of engine optimization. In this PR the engine of a static shape subgraph is serialized/cached when it is first built, and will be deserialized/loaded to save build time for new sessions. The dynamic shape case will be addressed later due to its extra complexity, since the engine is created dynamically at run time. Thanks @jywu-msft for drafting the code.
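
The serialize/deserialize cycle described above maps onto the public TensorRT API roughly as follows; this is a minimal sketch with placeholder file handling, not the PR's implementation:

    #include <NvInfer.h>
    #include <fstream>
    #include <string>
    #include <vector>

    // Cache an engine after the first (expensive) build.
    void SaveEngine(nvinfer1::ICudaEngine& engine, const std::string& path) {
      nvinfer1::IHostMemory* blob = engine.serialize();
      std::ofstream file(path, std::ios::binary);
      file.write(static_cast<const char*>(blob->data()),
                 static_cast<std::streamsize>(blob->size()));
      blob->destroy();
    }

    // Reload a cached engine in a new session; returns nullptr on a cache
    // miss, in which case the caller falls back to building the engine.
    nvinfer1::ICudaEngine* LoadEngine(nvinfer1::IRuntime& runtime,
                                      const std::string& path) {
      std::ifstream file(path, std::ios::binary | std::ios::ate);
      if (!file) return nullptr;
      std::vector<char> blob(static_cast<size_t>(file.tellg()));
      file.seekg(0);
      file.read(blob.data(), static_cast<std::streamsize>(blob.size()));
      return runtime.deserializeCudaEngine(blob.data(), blob.size());
    }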