
Refactor TensorRT EP code to better handle dynamic shape subgraphs #4504

Merged: 9 commits merged into master on Jul 15, 2020

Conversation

@stevenlix (Contributor) commented on Jul 14, 2020

  1. Build dynamic shape engines at run time
    TensorRT requires shape tensor inputs to be deterministic when the engine is built, so for dynamic shape subgraphs the engine must be built at run time (rather than in the compile phase). This PR moves all engine builds into compute(), except for static shape engines.
  2. Rebuild the TensorRT engine at run time if the range of dynamic shape inputs (either shape tensors or execution tensors) changes
    Even if the model's inputs are static, the inputs at the subgraph level could still be dynamic, and the engine for the subgraph may need to be rebuilt at run time.
  3. Allocate a dummy buffer for empty shape tensors, which is required by TensorRT
  4. Serialize the TensorRT engine to save build time
    In some cases it takes TensorRT a long time to build an engine because of engine optimization. In this PR the engine of a static shape subgraph is serialized/cached when it is first built, and will be deserialized/loaded to save build time in new sessions (see the sketch after this list). The dynamic shape case will be addressed later due to its extra complexity, since that engine is created dynamically at run time. Thanks to @jywu-msft for drafting the code.
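For context on point 4, engine caching generally follows the standard TensorRT serialize/deserialize pattern. Below is a minimal sketch of that pattern only, not the PR's actual code; the cache path, the logger class, and the helper names are illustrative, and it assumes TensorRT 7-era APIs (IHostMemory::destroy() and the three-argument deserializeCudaEngine()).

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <NvInfer.h>

// Minimal logger; a real EP would route messages through its own logging.
class SketchLogger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) override {
    if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
  }
};

// Serialize an already-built engine into a cache file on disk.
void SaveEngineCache(nvinfer1::ICudaEngine& engine, const std::string& path) {
  nvinfer1::IHostMemory* blob = engine.serialize();
  std::ofstream file(path, std::ios::binary);
  file.write(static_cast<const char*>(blob->data()), blob->size());
  blob->destroy();
}

// Load a cached engine if present; returns nullptr when the cache is missing.
nvinfer1::ICudaEngine* LoadEngineCache(nvinfer1::IRuntime& runtime,
                                       const std::string& path) {
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file) return nullptr;
  std::vector<char> blob(static_cast<size_t>(file.tellg()));
  file.seekg(0);
  file.read(blob.data(), blob.size());
  return runtime.deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}

// Example wiring (hypothetical cache file name):
//   SketchLogger logger;
//   nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
//   nvinfer1::ICudaEngine* engine = LoadEngineCache(*runtime, "subgraph_0.engine");
//   if (engine == nullptr) { /* build the engine, then SaveEngineCache(...) */ }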

@stevenlix stevenlix requested a review from a team as a code owner July 14, 2020 06:52
@stevenlix stevenlix requested a review from jywu-msft July 14, 2020 06:53
@jywu-msft (Member) commented:

Looks like there are some build errors on the Windows TensorRT CI.

@@ -67,9 +67,13 @@ ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioni

ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT

By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1 and FP16 mode is disabled.
ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching
Member:

I think we need some more documentation on the engine caching: why it is needed, how it works, when you would use it, and what some of the pitfalls and limitations are. Examples might be:
- if you enabled FP16 and serialized engines, you need to enable FP16 when deploying/running them.
- engines are built specifically for the underlying hardware and aren't portable.
- caveats about input shape changes.

Contributor (Author):

Good point. I've added more explanations in the doc.
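Since the thread above concerns the documented ORT_TENSORRT_* environment variables, here is a hedged usage sketch showing how a user might enable engine caching and FP16 before creating a session with the TensorRT EP. The model path is a placeholder, the include paths may differ depending on how ONNX Runtime is installed, and setenv() is POSIX (use _putenv_s on Windows).

#include <cstdlib>
#include <onnxruntime_cxx_api.h>
#include <tensorrt_provider_factory.h>  // header location may vary by install

int main() {
  // Set before the session is created so the TensorRT EP picks them up.
  setenv("ORT_TENSORRT_ENGINE_CACHE_ENABLE", "1", 1);  // cache built engines
  setenv("ORT_TENSORRT_FP16_ENABLE", "1", 1);          // must also be set when a cached FP16 engine is reused

  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_engine_cache_demo");
  Ort::SessionOptions options;
  OrtSessionOptionsAppendExecutionProvider_Tensorrt(options, /*device_id=*/0);
  Ort::Session session(env, "model.onnx", options);  // placeholder model path
  return 0;
}

As the reviewer points out, a cached engine is tied to its build-time settings and to the underlying GPU, so the same flags (e.g. FP16) and hardware must be used when the cache is loaded.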

}

cudaDeviceSynchronize();
for (const auto& binding_index : binding_buffers_to_freeup) {
Member:

Are we leaking memory if enqueueV2() fails above?

Contributor (Author):

For a FAIL status, would the session run quit or continue to try other EPs? For the latter, we may need to free the buffers when FAIL occurs.

Member:

The session doesn't get destroyed automatically. Can enqueue failure be intermittent (can there be a success after a failure)? In any case, a user could issue Run() again, or do the fallback manually (create a new session), so it does seem like it could cause a leak.
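To illustrate the leak being discussed, here is a sketch (not the PR's code) of one way to make the cleanup unconditional: a small scope guard frees the binding buffers on every exit path, so an enqueueV2() failure cannot leak them. It assumes the buffers were allocated with cudaMalloc; the guard type and function names are hypothetical.

#include <vector>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Frees a set of device buffers when the enclosing scope exits.
struct BindingBufferGuard {
  std::vector<void*>& buffers;
  ~BindingBufferGuard() {
    for (void* ptr : buffers) {
      if (ptr != nullptr) cudaFree(ptr);
    }
  }
};

bool RunSubgraph(nvinfer1::IExecutionContext& trt_context,
                 std::vector<void*>& binding_buffers, cudaStream_t stream) {
  BindingBufferGuard guard{binding_buffers};  // runs even on early return
  if (!trt_context.enqueueV2(binding_buffers.data(), stream, nullptr)) {
    return false;  // the guard still releases the buffers
  }
  cudaStreamSynchronize(stream);
  return true;
}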

@jywu-msft jywu-msft merged commit 0ebe2fa into master Jul 15, 2020
@jywu-msft jywu-msft deleted the stevenlix/trt70 branch July 15, 2020 09:35