
Refactor TensorRT EP code to better handle dynamic shape subgraphs #4504

Merged: 9 commits, Jul 15, 2020
4 changes: 4 additions & 0 deletions cmake/CMakeLists.txt
@@ -914,6 +914,10 @@ if (onnxruntime_USE_TENSORRT)
 set(onnxruntime_DELAYLOAD_FLAGS "${onnxruntime_DELAYLOAD_FLAGS} /DELAYLOAD:nvinfer.dll /DELAYLOAD:nvinfer_plugin.dll")
 else()
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecated-declarations")
+# needs to link with stdc++fs in Linux
+if (NOT APPLE)
+  list(APPEND onnxruntime_EXTERNAL_LIBRARIES stdc++fs)
+endif()
 endif()
endif()

1 change: 1 addition & 0 deletions cmake/onnxruntime_providers.cmake
@@ -329,6 +329,7 @@ if (onnxruntime_USE_TENSORRT)
 include_directories(${ONNXRUNTIME_ROOT}/../cmake/external/onnx)
 set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
 if (WIN32)
+add_definitions(-D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1)
 set(OLD_CMAKE_CUDA_FLAGS ${CMAKE_CUDA_FLAGS})
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4996 /wd4244 /wd4267 /wd4099 /wd4551 /wd4505 /wd4515 /wd4706 /wd4456 /wd4324 /wd4701 /wd4804 /wd4702")
 if (CMAKE_BUILD_TYPE STREQUAL "Debug")
14 changes: 12 additions & 2 deletions docs/execution_providers/TensorRT-ExecutionProvider.md
@@ -67,9 +67,13 @@ ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioning

 ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT
 
-By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1 and FP16 mode is disabled.
+ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching

Review comment (Member):
I think we need some more documentation on the engine caching: why it is needed, how it works, when you would use it, and what some of the pitfalls and limitations are. Examples might be:
- if you enabled FP16 and serialized engines, you need to enable FP16 when deploying/running them.
- engines are built specifically for the underlying hardware and aren't portable.
- caveats about input shape changes.

Reply (Contributor Author):
Good point. I've added more explanations in the doc.

-One can override these defaults by setting environment variables ORT_TENSORRT_MAX_WORKSPACE_SIZE, ORT_TENSORRT_MAX_PARTITION_ITERATIONS, ORT_TENSORRT_MIN_SUBGRAPH_SIZE and ORT_TENSORRT_FP16_ENABLE.
+ORT_TENSORRT_ENGINE_CACHE_PATH: Specify path for TensorRT engine files if ORT_TENSORRT_ENGINE_CACHE_ENABLE is 1
 
+By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1, FP16 mode is disabled and TensorRT engine caching is disabled.
+
+One can override these defaults by setting environment variables ORT_TENSORRT_MAX_WORKSPACE_SIZE, ORT_TENSORRT_MAX_PARTITION_ITERATIONS, ORT_TENSORRT_MIN_SUBGRAPH_SIZE, ORT_TENSORRT_FP16_ENABLE, ORT_TENSORRT_ENGINE_CACHE_ENABLE and ORT_TENSORRT_ENGINE_CACHE_PATH.
e.g. on Linux

### override default max workspace size to 2GB
@@ -83,3 +87,9 @@ export ORT_TENSORRT_MIN_SUBGRAPH_SIZE=5
 
 ### Enable FP16 mode in TensorRT
 export ORT_TENSORRT_FP16_ENABLE=1
+
+### Enable TensorRT engine caching
+export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
+
+### Specify TensorRT engine cache path
+export ORT_TENSORRT_ENGINE_CACHE_PATH="cache"
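Since the TensorRT execution provider reads these ORT_TENSORRT_* settings from the process environment, they can also be set from Python before the session is created. The snippet below is a minimal sketch, not part of this PR: `make_trt_session` and the `model.onnx` path are illustrative names, and actually running the session requires an onnxruntime-gpu build with TensorRT support.

```python
import os

# The TensorRT EP picks up its ORT_TENSORRT_* settings from the process
# environment when it builds an engine, so set them before creating a session.
os.environ["ORT_TENSORRT_MAX_WORKSPACE_SIZE"] = str(2 * 1024**3)  # 2 GB
os.environ["ORT_TENSORRT_FP16_ENABLE"] = "1"
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"
os.environ["ORT_TENSORRT_ENGINE_CACHE_PATH"] = "cache"

def make_trt_session(model_path):
    """Create an ORT session that prefers TensorRT, falling back to CUDA/CPU."""
    # Imported here so the configuration above is in place first; requires an
    # onnxruntime-gpu build compiled with TensorRT support.
    import onnxruntime as ort
    return ort.InferenceSession(
        model_path,  # e.g. "model.onnx" (placeholder path)
        providers=["TensorrtExecutionProvider",
                   "CUDAExecutionProvider",
                   "CPUExecutionProvider"],
    )
```

Note that the cached engine is tied to the build-time settings: an engine serialized with FP16 enabled must also be loaded with ORT_TENSORRT_FP16_ENABLE set.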