Add multithreading test and put a lock on nvinfer1::createInferRuntime() for TRT EP #10714
Conversation
static OrtMutex singleton;

// Acquire a lock only when force_sequential_engine_build_ is true;
return force_sequential_engine_build_ ? std::unique_lock<OrtMutex>(singleton) : std::unique_lock<OrtMutex>();
I was wondering whether we should take the unique_lock all the time, since the engine build API is not thread-safe.
(Even though the current multithreading test doesn't fail here, it's much safer to hold the lock all the time.)
Yes, I think we should always lock when creating the engine. At the very least, we don't want multiple threads building at the same time, as that can perturb the tactic timings, which can impact performance.
By the way, I noticed the buildEngineWithConfig() API is deprecated starting with TRT 8.0; it's replaced by buildSerializedNetwork().
At some point we need to make a pass through the TRT API calls and update any deprecated ones. +@stevenlix FYI
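For context, a minimal sketch of what the always-on lock plus the non-deprecated TRT 8 build API could look like. This is illustrative, not the actual ORT code: it uses std::mutex in place of OrtMutex, and BuildSerializedEngine is a hypothetical helper name.

```cpp
#include <mutex>
#include "NvInfer.h"

// Process-wide lock: taken unconditionally, so at most one thread can be
// building an engine at any time, regardless of provider options.
static std::unique_lock<std::mutex> GetApiLock() {
  static std::mutex singleton;
  return std::unique_lock<std::mutex>(singleton);
}

// buildSerializedNetwork() replaces buildEngineWithConfig(), which is
// deprecated starting with TensorRT 8.0.
nvinfer1::IHostMemory* BuildSerializedEngine(nvinfer1::IBuilder& builder,
                                             nvinfer1::INetworkDefinition& network,
                                             nvinfer1::IBuilderConfig& config) {
  auto lock = GetApiLock();  // serialize builds across threads and sessions
  return builder.buildSerializedNetwork(network, config);
}
```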
CreateBaseModel(model_name, graph_name, dims);

for (int i = 0; i < num_thread; ++i)
  threads.push_back(std::thread(RunInference, model_name, sess_log_id));
Each thread is creating AND running the inference.
We also need to test the case where you create a single session and have multiple threads run inference on that same session.
The single-session, multi-threaded inference case has been added.
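Roughly, such a test shares one session across several threads that only call Run. A sketch using the public C++ API (the actual test uses the internal onnxruntime::InferenceSession; the model path, input/output names, and shape below are placeholders):

```cpp
#include <thread>
#include <vector>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_mt_test");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model.onnx", opts);  // single shared session

  auto run_once = [&session]() {
    std::vector<float> input(1 * 3 * 2, 1.0f);
    std::vector<int64_t> shape{1, 3, 2};
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(mem, input.data(), input.size(),
                                                        shape.data(), shape.size());
    const char* in_names[] = {"X"};
    const char* out_names[] = {"Y"};
    // Session::Run is expected to be thread-safe, so each thread only infers.
    session.Run(Ort::RunOptions{nullptr}, in_names, &tensor, 1, out_names, 1);
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < 5; ++i) threads.emplace_back(run_once);  // concurrent Run() calls
  for (auto& t : threads) t.join();
}
```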
@@ -87,6 +88,97 @@ void CreateBaseModel(std::string model_name, std::string graph_name, std::vector
   status = onnxruntime::Model::Save(model, model_name);
 }

+void RunInference(std::string model_name, std::string sess_log_id) {
I think we need to clarify with the naming that each thread is creating its own session, and only one thread is running inference per session.
We need a multiple-run, single-session test as well.
Renamed the function.
Otherwise, the constructed unique_lock is not associated with any mutex therefore no locking/unlocking will happen.
Get a unique_lock object to control the concurrency behavior.
Every api call not in the thread-safe operations (https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#threading)
shoud be protected by a lock for multi-threading situation.
Typo: shoud.
Also rewrite 'multi-threading situation' as 'when invoked by multiple threads concurrently'.
Modified.
 */
-std::unique_lock<OrtMutex> GetEngineBuildLock() const;
+std::unique_lock<OrtMutex> GetLock() const;
GetLock is a bit too generic a name? Maybe GetApiLock() or something like that, for enforcing serial ordering of TensorRT operations.
Modified.
@@ -87,6 +88,190 @@ void CreateBaseModel(std::string model_name, std::string graph_name, std::vector
   status = onnxruntime::Model::Save(model, model_name);
 }

+void RunSession(InferenceSession& session_object,
We might consider migrating these to inference_session_test in the future?
These tests are basically generic and could be applied to all providers.
Probably don't do it in this PR, though, until we've tested whether all EPs can pass these tests.
@@ -250,6 +250,11 @@ TensorrtLogger& GetTensorrtLogger() {
   return trt_logger;
 }

+std::unique_lock<OrtMutex> TensorrtExecutionProvider::GetApiLock() const {
We need to be careful if we expand usage of this beyond the current few locking calls, since it's a singleton and shared across all sessions.
If we keep the lock on all the time, we should remove the TRT option force_sequential_engine_build, which previously provided a way to turn the lock off.
Be careful when removing options from structs and APIs, though.
For the legacy OrtTensorRTProviderOptions struct: I don't think we can remove this field from the struct and build ORT directly. A user application built against the old struct may fail to access the force_sequential_engine_build field. We can disable this field instead?
For the new OrtTensorRTProviderOptionsV2: I think we can remove this field since it's an opaque struct. We probably need to emit a warning when the user sets force_sequential_engine_build through CreateTensorRTProviderOptions().
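A hypothetical sketch of that approach (not the actual ORT code; the struct is abbreviated, and the fprintf call is a stand-in for ORT's logging macros):

```cpp
#include <cstdio>

struct OrtTensorRTProviderOptions {
  // ... other fields kept unchanged for ABI compatibility ...
  int trt_force_sequential_engine_build;  // now a no-op; removal would break old binaries
};

// Called when options are applied; warns instead of changing behavior.
void ApplyForceSequentialEngineBuild(int value) {
  if (value != 0) {
    std::fprintf(stderr,
                 "Warning: trt_force_sequential_engine_build is a no-op; "
                 "engine builds are always serialized.\n");
  }
}
```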
The option has become a no-op, so it's not as critical, I think.
We should remove it from our documentation, though.
@stevenlix do you have any comments for this PR?
I tested it by running a BERT model and didn't see a performance drop, so we can merge this PR into release 1.11.
@@ -1537,7 +1538,7 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector<Node*>& fuse

 // Build engine
 {
-  auto lock = GetEngineBuildLock();
+  auto lock = GetApiLock();
I was wondering, do we need a lock here, since the whole inference is already protected by a lock?
std::lock_guard<OrtMutex> lock(*(trt_state->tensorrt_mu_ptr));
@@ -1537,7 +1538,6 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector<Node*>& fuse

 // Build engine
 {
-  auto lock = GetEngineBuildLock();
Is the lock supposed to be removed here?
Thanks for the reminder and the explanation. Yes, we shouldn't remove this lock here.
The inference lock std::lock_guard lock(*(trt_state->tensorrt_mu_ptr)) at line 1290 is per provider instance, while the API lock here is a global singleton.
Consider the situation where multiple threads each create a session and run inference in the same process: engine builds from different threads in different sessions could then overlap. So we still need a global singleton lock to prevent overlapping engine builds.
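To make the distinction concrete, a sketch of the two locks (illustrative only, assuming std::mutex in place of OrtMutex; TrtState and Compute are placeholder names):

```cpp
#include <mutex>

struct TrtState {
  std::mutex tensorrt_mu;  // per provider instance: one per session
};

// Function-local static: a single mutex for the whole process,
// shared by every session.
static std::unique_lock<std::mutex> GetApiLock() {
  static std::mutex singleton;
  return std::unique_lock<std::mutex>(singleton);
}

void Compute(TrtState& trt_state) {
  // Blocks other threads using *this* session only.
  std::lock_guard<std::mutex> inference_lock(trt_state.tensorrt_mu);
  {
    // Blocks engine builds in *all* sessions in the process.
    auto build_lock = GetApiLock();
    // ... build the TensorRT engine here ...
  }
}
```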
This reverts commit 9c2317b.
…e() for TRT EP (#10714)

* Add multithread unit test and put lock on library call
* update code
* remove debug code
* add comment
* add one session multi-threads inference
* Put lock for build engine all the time
* Update naming and comment
* remove unnecessary lock
* Revert "remove unnecessary lock"

This reverts commit 9c2317b.
Added a multithreading test and put a lock around the nvinfer1::createInferRuntime() call.
Without the lock, the test has a chance of hitting a segfault or failing due to race conditions.
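Concretely, the guarded call looks roughly like this (reusing the GetApiLock() sketch from the conversation above; CreateRuntimeLocked is an illustrative name, not ORT's actual function):

```cpp
#include "NvInfer.h"

// createInferRuntime() is not on TensorRT's list of thread-safe operations,
// so only one thread at a time may call it.
nvinfer1::IRuntime* CreateRuntimeLocked(nvinfer1::ILogger& logger) {
  auto lock = GetApiLock();  // process-wide lock, as sketched earlier
  return nvinfer1::createInferRuntime(logger);
}
```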