Upgrade TensorRT to version 7.0.0.11 (microsoft#2973)
* update onnx-tensorrt submodule to trt7 branch

* add fp16 option for TRT7 (see the FP16 sketch after this list)

* switch to master branch of onnx-tensorrt

* update submodule

* update to TensorRT 7.0.0.11

* update to onnx-tensorrt for TensorRT 7.0

* switch to private branch due to issues in master branch

* remove trt_onnxify

* disable warnings c4804 for TensorRT parser

* disable warnings c4702 for TensorRT parser

* add back sanity check of shape tensor input in the parser

* disable some warnings for TensorRT 7

* change fp16 threshold for TensorRT

* update onnx-tensorrt parser

* fix cycle issue in faster-rcnn and add cycle detection in GetCapability (see the cycle-detection sketch after this list)

* Update TensorRT container to v20.01

* Update TensorRT image name

* Update linux-multi-gpu-tensorrt-ci-pipeline.yml

* Update linux-gpu-tensorrt-ci-pipeline.yml

* disable rnn tests for TensorRT

* disable rnn tests for TensorRT

* disable some unit tests for TensorRT

* update onnx-tensorrt submodule

* update build scripts for TensorRT

* formatting the code

* Update TensorRT-ExecutionProvider.md

* Update BUILD.md

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* Update win-gpu-tensorrt-ci-pipeline.yml

* use GetEnvironmentVar function to get environment variables and switch to Win-GPU-2019 agent pool for the Windows CI build

* change tensorrt path

* change tensorrt path

* fix win ci build issue

* update code based on the reviews

* fix build issue

* roll back to CUDA 10.0

* add RemoveCycleTest for TensorRT

* fix windows ci build issues

* fix ci build issues

* fix file permission

* fix out-of-range issue for max_workspace_size_env
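For the "add fp16 option for TRT7" item above: TensorRT 7 exposes precision options such as FP16 through nvinfer1::IBuilderConfig rather than the older IBuilder setters. The sketch below shows that mechanism; it compiles against the TensorRT 7 headers but is illustrative of the API only, not this commit's actual provider code.

// Illustrative sketch of enabling FP16 with the TensorRT 7 API
// (not the execution provider's actual code).
#include <NvInfer.h>

void EnableFp16IfSupported(nvinfer1::IBuilder& builder,
                           nvinfer1::IBuilderConfig& config) {
  // Only request FP16 kernels when the GPU has fast FP16 paths;
  // otherwise keep the default FP32 behavior.
  if (builder.platformHasFastFp16()) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
  }
}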
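For the cycle-detection item: when GetCapability fuses a set of nodes into one TensorRT partition, the fused super-node can close a cycle through nodes left to other execution providers (the faster-rcnn failure mode this commit fixes). Below is a minimal, self-contained sketch of such a check, using plain integer adjacency lists instead of onnxruntime's Graph types — it illustrates the idea, not the provider's implementation.

// Would merging the node set `in_subgraph` into a single super-node
// introduce a cycle in an (otherwise acyclic) directed graph?
#include <functional>
#include <vector>

bool MergeCreatesCycle(const std::vector<std::vector<int>>& graph,
                       const std::vector<bool>& in_subgraph) {
  const int n = static_cast<int>(graph.size());
  const int merged = n;  // id of the super-node replacing the subgraph
  std::vector<std::vector<int>> g(n + 1);
  for (int u = 0; u < n; ++u) {
    int src = in_subgraph[u] ? merged : u;
    for (int v : graph[u]) {
      int dst = in_subgraph[v] ? merged : v;
      if (src != dst) g[src].push_back(dst);  // drop subgraph-internal edges
    }
  }
  // Standard three-color DFS: an edge back to a "gray" node means a cycle.
  enum Color { kWhite, kGray, kBlack };
  std::vector<Color> color(n + 1, kWhite);
  std::function<bool(int)> dfs = [&](int u) {
    color[u] = kGray;
    for (int v : g[u]) {
      if (color[v] == kGray) return true;            // back edge -> cycle
      if (color[v] == kWhite && dfs(v)) return true;
    }
    color[u] = kBlack;
    return false;
  };
  for (int u = 0; u <= n; ++u)
    if (color[u] == kWhite && dfs(u)) return true;
  return false;
}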
stevenlix authored Feb 12, 2020
1 parent 273868e commit da653cc
Showing 23 changed files with 452 additions and 176 deletions.
6 changes: 3 additions & 3 deletions .gitmodules
@@ -37,9 +37,6 @@
 [submodule "cmake/external/wil"]
 	path = cmake/external/wil
 	url = https://github.com/microsoft/wil
-[submodule "cmake/external/onnx-tensorrt"]
-	path = cmake/external/onnx-tensorrt
-	url = https://github.com/onnx/onnx-tensorrt.git
 [submodule "cmake/external/json"]
 	path = cmake/external/json
 	url = https://github.com/nlohmann/json
@@ -49,3 +46,6 @@
 [submodule "cmake/external/FeaturizersLibrary"]
 	path = cmake/external/FeaturizersLibrary
 	url = https://github.com/microsoft/FeaturizersLibrary.git
+[submodule "cmake/external/onnx-tensorrt"]
+	path = cmake/external/onnx-tensorrt
+	url = https://github.com/stevenlix/onnx-tensorrt.git
4 changes: 2 additions & 2 deletions BUILD.md
@@ -166,12 +166,12 @@ See more information on the TensorRT Execution Provider [here](./docs/execution_providers/TensorRT-ExecutionProvider.md)
 #### Pre-Requisites
 * Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn)
-   * The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.1 and cuDNN 7.6.
+   * The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.2 and cuDNN 7.6.5.
    * The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home` parameter. The CUDA path should contain `bin`, `include` and `lib` directories.
    * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found.
    * The path to the cuDNN installation (path to folder that contains libcudnn.so) must be provided via the cuDNN_PATH environment variable, or the `--cudnn_home` parameter.
 * Install [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download)
-   * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 6.0.1.5.
+   * The TensorRT execution provider for ONNX Runtime is built on TensorRT 7.x and is tested with TensorRT 7.0.0.11.
    * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter.
 
 #### Build Instructions
2 changes: 1 addition & 1 deletion cmake/CMakeLists.txt
@@ -744,7 +744,7 @@ endif()
 
 if (onnxruntime_USE_TENSORRT)
   if (WIN32)
-    set(onnxruntime_DELAYLOAD_FLAGS "${onnxruntime_DELAYLOAD_FLAGS} /DELAYLOAD:nvinfer.dll")
+    set(onnxruntime_DELAYLOAD_FLAGS "${onnxruntime_DELAYLOAD_FLAGS} /DELAYLOAD:nvinfer.dll /DELAYLOAD:nvinfer_plugin.dll")
   else()
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecated-declarations")
   endif()
5 changes: 1 addition & 4 deletions cmake/onnxruntime_providers.cmake
@@ -207,7 +207,7 @@ if (onnxruntime_USE_TENSORRT)
   set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
   if (WIN32)
     set(OLD_CMAKE_CUDA_FLAGS ${CMAKE_CUDA_FLAGS})
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4996 /wd4244 /wd4267 /wd4099 /wd4551 /wd4505 /wd4515 /wd4706 /wd4456 /wd4324 /wd4701")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4996 /wd4244 /wd4267 /wd4099 /wd4551 /wd4505 /wd4515 /wd4706 /wd4456 /wd4324 /wd4701 /wd4804 /wd4702")
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
       set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4805")
     endif()
@@ -228,15 +228,12 @@
     unset(OLD_CMAKE_CXX_FLAGS)
     unset(OLD_CMAKE_CUDA_FLAGS)
     set_target_properties(nvonnxparser PROPERTIES LINK_FLAGS "/ignore:4199")
-    set_target_properties(trt_onnxify PROPERTIES LINK_FLAGS "/ignore:4199")
-    target_compile_definitions(trt_onnxify PRIVATE ONNXIFI_BUILD_LIBRARY=1)
     target_sources(onnx2trt PRIVATE ${ONNXRUNTIME_ROOT}/test/win_getopt/mb/getopt.cc)
     target_sources(getSupportedAPITest PRIVATE ${ONNXRUNTIME_ROOT}/test/win_getopt/mb/getopt.cc)
     target_include_directories(onnx2trt PRIVATE ${ONNXRUNTIME_ROOT}/test/win_getopt/mb/include)
     target_include_directories(getSupportedAPITest PRIVATE ${ONNXRUNTIME_ROOT}/test/win_getopt/mb/include)
     target_compile_options(nvonnxparser_static PRIVATE /FIio.h /wd4100)
     target_compile_options(nvonnxparser PRIVATE /FIio.h /wd4100)
-    target_compile_options(trt_onnxify PRIVATE /FIio.h /wd4100)
     target_compile_options(onnx2trt PRIVATE /FIio.h /wd4100)
     target_compile_options(getSupportedAPITest PRIVATE /FIio.h /wd4100)
   endif()
11 changes: 8 additions & 3 deletions docs/execution_providers/TensorRT-ExecutionProvider.md
@@ -46,17 +46,19 @@ For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tun
 When/if using [onnxruntime_perf_test](../../onnxruntime/test/perftest#onnxruntime-performance-test), use the flag `-e tensorrt`
 
 ## Configuring environment variables
-There are three environment variables for TensorRT execution provider.
+There are four environment variables for TensorRT execution provider.
 
 ORT_TENSORRT_MAX_WORKSPACE_SIZE: maximum workspace size for TensorRT engine.
 
 ORT_TENSORRT_MAX_PARTITION_ITERATIONS: maximum number of iterations allowed in model partitioning for TensorRT. If target model can't be successfully partitioned when the maximum number of iterations is reached, the whole model will fall back to other execution providers such as CUDA or CPU.
 
 ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioning. Subgraphs with smaller size will fall back to other execution providers.
 
-By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000 and min subgraph size = 1.
+ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT
+
+By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1 and FP16 mode is disabled.
 
-One can override these defaults by setting environment variables ORT_TENSORRT_MAX_WORKSPACE_SIZE, ORT_TENSORRT_MAX_PARTITION_ITERATIONS and ORT_TENSORRT_MIN_SUBGRAPH_SIZE.
+One can override these defaults by setting environment variables ORT_TENSORRT_MAX_WORKSPACE_SIZE, ORT_TENSORRT_MAX_PARTITION_ITERATIONS, ORT_TENSORRT_MIN_SUBGRAPH_SIZE and ORT_TENSORRT_FP16_ENABLE.
 e.g. on Linux
 
 ### override default max workspace size to 2GB
@@ -67,3 +69,6 @@ export ORT_TENSORRT_MAX_PARTITION_ITERATIONS=10
 
 ### override default minimum subgraph node size to 5
 export ORT_TENSORRT_MIN_SUBGRAPH_SIZE=5
+
+### Enable FP16 mode in TensorRT
+export ORT_TENSORRT_FP16_ENABLE=1
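Tying this doc change to the "use GetEnvironmentVar" and "fix out-of-range issue for max_workspace_size_env" commits above: such variables need defensive parsing. The sketch below shows one way to do that — only the ORT_TENSORRT_* names and defaults come from the doc text; the helper and main() are assumptions, not onnxruntime's implementation (which uses its own GetEnvironmentVar).

// Illustrative sketch only.
#include <cstdlib>
#include <limits>
#include <string>

// Parses a non-negative integer env var; falls back to `def` when the
// variable is unset, malformed, negative, or out of range -- the class of
// bug the max_workspace_size_env fix guards against.
size_t GetSizeEnvOr(const char* name, size_t def) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0' || *raw == '-') return def;
  try {
    unsigned long long v = std::stoull(raw);  // throws on junk or overflow
    if (v > std::numeric_limits<size_t>::max()) return def;
    return static_cast<size_t>(v);
  } catch (...) {
    return def;
  }
}

int main() {
  size_t max_workspace =
      GetSizeEnvOr("ORT_TENSORRT_MAX_WORKSPACE_SIZE", 1ULL << 30);  // 1 GB
  size_t max_iterations =
      GetSizeEnvOr("ORT_TENSORRT_MAX_PARTITION_ITERATIONS", 1000);
  size_t min_subgraph_size = GetSizeEnvOr("ORT_TENSORRT_MIN_SUBGRAPH_SIZE", 1);
  bool fp16_enable = GetSizeEnvOr("ORT_TENSORRT_FP16_ENABLE", 0) != 0;
  // A provider would feed these into engine creation; fp16_enable would map
  // to a builder-config flag as in the FP16 sketch earlier on this page.
  (void)max_workspace; (void)max_iterations;
  (void)min_subgraph_size; (void)fp16_enable;
  return 0;
}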