Add multithreading test and put a lock on nvinfer1::createInferRuntime() for TRT EP #10714

Merged: 9 commits merged into master from multi_threads_trt_test on Mar 16, 2022

Conversation

chilo-ms (Contributor) commented on Mar 1, 2022:

Added a multithreading test and put a lock around the nvinfer1::createInferRuntime() call.
Without the lock, the test can hit a segfault or fail due to race conditions.

chilo-ms requested review from stevenlix and jywu-msft on March 1, 2022 20:00
static OrtMutex singleton;

// Acquire a lock only when force_sequential_engine_build_ is true;
return force_sequential_engine_build_ ? std::unique_lock<OrtMutex>(singleton) : std::unique_lock<OrtMutex>();
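A default-constructed std::unique_lock owns no mutex, so the second branch above returns a lock that never locks anything. A minimal self-contained sketch of the pattern, with std::mutex standing in for OrtMutex and a bool parameter standing in for the force_sequential_engine_build_ member:

#include <mutex>

std::unique_lock<std::mutex> GetEngineBuildLock(bool serialize) {
  static std::mutex singleton;  // one mutex shared by all callers
  // Locks only when serialize is true; otherwise the returned unique_lock
  // holds no mutex, so no locking/unlocking ever happens.
  return serialize ? std::unique_lock<std::mutex>(singleton)
                   : std::unique_lock<std::mutex>();
}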
chilo-ms (Contributor, Author):

I was wondering whether we should use a real unique_lock all the time, since the engine build API is not thread safe.
(Even though the current multithreading test doesn't fail here, it's much safer to always hold the lock.)

jywu-msft (Member) commented on Mar 1, 2022:

Yes, I think we should always lock when creating an engine. At the very least, we don't want multiple threads building at the same time, as that can perturb the tactic timings, which can impact performance.
BTW, I noticed the buildEngineWithConfig() API is deprecated starting with TRT 8.0; it's replaced by buildSerializedNetwork().
At some point we need to make a pass through the TRT API calls and update any deprecated ones. +@stevenlix FYI
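For reference, a hedged sketch of the old and new build paths (error handling and object destruction are elided; the logger parameter is assumed to be an existing nvinfer1::ILogger, e.g. the one returned by GetTensorrtLogger() in this file):

#include "NvInfer.h"

nvinfer1::ICudaEngine* BuildEngineTrt8(nvinfer1::ILogger& logger) {
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  auto flags = 1U << static_cast<uint32_t>(
      nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);
  nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
  // ... populate the network (e.g. via the ONNX parser) and set config options ...

  // TRT < 8.0 (deprecated): builds the engine directly.
  // nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

  // TRT >= 8.0: build a serialized plan, then deserialize it with a runtime.
  nvinfer1::IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);
  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
  return runtime->deserializeCudaEngine(plan->data(), plan->size());
}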

CreateBaseModel(model_name, graph_name, dims);

for (int i = 0; i < num_thread; ++i)
  threads.push_back(std::thread(RunInference, model_name, sess_log_id));
Member:

Each thread is creating AND running the inference.
We also need to test the case where you create a single session and have multiple threads run inference on the same session.
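A rough sketch of that case, using the public C++ API for brevity rather than the internal InferenceSession the test uses; the function name and the elided Run() call are illustrative:

#include <thread>
#include <vector>
#include <onnxruntime_cxx_api.h>

void RunSingleSessionMultiThreadInference(const char* model_path, int num_threads) {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_mt_test");
  Ort::SessionOptions so;
  Ort::Session session(env, model_path, so);  // one session shared by all threads

  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&session]() {
      // Each thread calls Run() concurrently on the same session.
      // ... build inputs and call session.Run(...) here ...
    });
  }
  for (auto& t : threads) t.join();
}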

chilo-ms (Contributor, Author):

The single-session, multi-thread inference case has been added.

@@ -87,6 +88,97 @@ void CreateBaseModel(std::string model_name, std::string graph_name, std::vector
status = onnxruntime::Model::Save(model, model_name);
}

void RunInference(std::string model_name, std::string sess_log_id) {
Member:

I think we need to clarify with the naming that each thread is creating its own session and only one thread runs inference per session.
We need a multiple-run, single-session test as well.

chilo-ms (Contributor, Author):

Renamed the function.

Otherwise, the constructed unique_lock is not associated with any mutex therefore no locking/unlocking will happen.
Get a unique_lock object to control the concurrency behavior.
Every api call not in the thread-safe operations(https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#threading)
shoud be protected by a lock for multi-threading situation.
Member:

Typo: shoud.
Also, rewrite "multi-threading situation" as "when invoked by multiple threads concurrently".

chilo-ms (Contributor, Author):

Modified.

*/
std::unique_lock<OrtMutex> GetEngineBuildLock() const;
std::unique_lock<OrtMutex> GetLock() const;
Member:

GetLock is a bit too generic a name. Maybe GetApiLock() or something like that, for enforcing serial ordering of TensorRT operations?

chilo-ms (Contributor, Author):

Modified.

@@ -87,6 +88,190 @@ void CreateBaseModel(std::string model_name, std::string graph_name, std::vector
status = onnxruntime::Model::Save(model, model_name);
}

void RunSession(InferenceSession& session_object,
Member:

We might consider migrating these to inference_session_test in the future; these tests are basically generic and could be applied to all providers.
But probably don't do it in this PR, before testing whether all EPs can pass these tests.

@@ -250,6 +250,11 @@ TensorrtLogger& GetTensorrtLogger() {
return trt_logger;
}

std::unique_lock<OrtMutex> TensorrtExecutionProvider::GetApiLock() const {
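The body is cut off in this excerpt; given the snippet quoted at the top of the thread and the decision to always lock, it presumably reduces to something like this sketch:

std::unique_lock<OrtMutex> TensorrtExecutionProvider::GetApiLock() const {
  // Function-local static: a single mutex shared by every provider
  // instance (and therefore every session) in the process.
  static OrtMutex singleton;
  return std::unique_lock<OrtMutex>(singleton);
}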
Member:

We need to be careful if we expand usage of this beyond the current few locking calls, since it's a singleton and shared across all sessions.

Contributor:

If we keep the lock on all the time, we should remove the TRT option force_sequential_engine_build, which previously provided a way to turn the lock off.

Member:

Be careful when removing options from structs and APIs, though.

chilo-ms (Contributor, Author) commented on Mar 10, 2022:

Legacy OrtTensorRTProviderOptions struct: I don't think we can remove this field from the struct and build ORT directly. A user application built with the old struct may fail to access the force_sequential_engine_build field. Can we disable the field instead?
New OrtTensorRTProviderOptionsV2: I think we can remove this field since it's an opaque struct. We probably need to emit a warning when the user sets force_sequential_engine_build through CreateTensorRTProviderOptions().

Member:

The option has become a no-op, so it's not as critical, I think.
We should remove it from our documentation, though.

jywu-msft previously approved these changes on Mar 4, 2022

jywu-msft (Member):

@stevenlix do you have any comments for this PR?

chilo-ms (Contributor, Author):

I tested it by running a BERT model and didn't see a performance drop, so we can merge this PR into release 1.11.

@@ -1537,7 +1538,7 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector<Node*>& fuse

// Build engine
{
auto lock = GetEngineBuildLock();
auto lock = GetApiLock();
chilo-ms (Contributor, Author):

I was wondering, do we need a lock here, since the whole inference is already protected by a lock?

std::lock_guard<OrtMutex> lock(*(trt_state->tensorrt_mu_ptr));

@@ -1537,7 +1538,6 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector<Node*>& fuse

// Build engine
{
auto lock = GetEngineBuildLock();
Member:

Is the lock supposed to be removed here?

chilo-ms (Contributor, Author) commented on Mar 15, 2022:

Thanks for the reminder and explanation. Yes, we shouldn't remove this lock here.
The inference lock std::lock_guard lock(*(trt_state->tensorrt_mu_ptr)) at line 1290 is per provider instance, while the API lock here is a global singleton.
Consider a process where multiple threads each create a session and run inference: engine builds from different threads in different sessions could overlap. So we still need a global singleton lock to prevent overlapping engine builds.
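A self-contained sketch of how the two locks nest, with std::mutex standing in for OrtMutex and illustrative names:

#include <mutex>

std::mutex per_provider_mu;  // analogue of *(trt_state->tensorrt_mu_ptr)

std::unique_lock<std::mutex> GetApiLock() {
  static std::mutex api_mu;  // analogue of the global singleton, process-wide
  return std::unique_lock<std::mutex>(api_mu);
}

void ComputeFunc() {
  // Per-provider lock: serializes inference within ONE session/provider.
  std::lock_guard<std::mutex> inference_lock(per_provider_mu);
  // ... run inference; an engine rebuild may be triggered here ...
  {
    // Global lock: prevents engine builds in DIFFERENT sessions from
    // overlapping, which the per-provider lock alone cannot do.
    auto lock = GetApiLock();
    // ... build engine ...
  }
}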

chilo-ms merged commit 42d7112 into master on Mar 16, 2022
chilo-ms deleted the multi_threads_trt_test branch on March 16, 2022 16:19
chilo-ms added a commit that referenced this pull request Mar 16, 2022
…e() for TRT EP (#10714)

* Add multithread unit test and put lock on library call

* update code

* remove debug code

* add comment

* add one session multi-threads inference

* Put lock for build engine all the time

* Update naming and comment

* remove unnecessary lock

* Revert "remove unnecessary lock"

This reverts commit 9c2317b.
chilo-ms added a commit that referenced this pull request Mar 18, 2022
lavanyax pushed a commit to intel/onnxruntime that referenced this pull request Mar 29, 2022
…e() for TRT EP (microsoft#10714)
