[Performance]: High overhead latency for ov::InferRequest::infer() #23476
Comments
When I compile and build using my own source-built OpenVINO with THREADING=SEQ, inference takes about 24k cycles. I wonder if OpenVINO makes liberal use of thread synchronization primitives like locks and queues even for single-threaded synchronous use cases. In contrast, a similar setup with onnxruntime using the same model takes 2.8k cycles per inference. That is still high, but much better.
Hi @jchia, when the app uses 1 stream, 1 thread, and the sync inference API, both […]. For the difference between […].
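For reference, here is a minimal sketch (not from the original thread) of how a 1-stream, 1-thread synchronous setup is typically expressed with the OpenVINO 2.0 C++ API; the model path is a placeholder, not a file named in this issue:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // One stream, one inference thread, latency-oriented hint.
    ov::CompiledModel compiled = core.compile_model(
        "/tmp/reducesum.xml", "CPU",  // hypothetical model path
        ov::num_streams(1),
        ov::inference_num_threads(1),
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

    ov::InferRequest req = compiled.create_infer_request();
    req.infer();  // synchronous: runs on the calling thread
}
```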
@wangleis Do you also think that 24k cycles for the THREADING=SEQ case is excessive, so there is room for improvement? I don't have convenient access to a machine with an Intel processor and OpenVINO installed, but I expect the numbers there to be similar.
@wenjiew I ran the experiment on an i9-13900H on Debian 12, and the results were similar: 24k cycles for SEQ and 27k cycles for TBB. I don't think the high overhead is sensitive to the target processor, at least for fairly recent processors.
@jchia May I know if you are requesting performance optimization for the THREADING=SEQ case?
I'm suggesting reducing the fixed latency of the THREADING=SEQ case if it's not too complicated. This would help cases where the computation graph has relatively few operations, so the fixed overhead is a large fraction of the total time.
@jchia Thanks for your suggestion. We have created internal ticket CVS-141653 to track this input. Since the improvement may not be completed soon, we will close this ticket if there are no other topics. Thanks.
OpenVINO Version: 2024.0.0
Operating System: Ubuntu 20.04 (LTS)
Device used for inference: CPU
OpenVINO installation: Build from source
Programming Language: C++
Hardware Architecture: x86 (64 bits)
Model used: ReduceSum
Model quantization: No
Target Platform:
Performance issue description
Clarification: I did not install from source. I installed with apt from https://apt.repos.intel.com/openvino/2024, but this is not a dropdown option in "OpenVINO installation".

I tested inference speed for a 2-element ReduceSum in C++. The inference time was 28k cycles, which is roughly 8µs on my machine. The ReduceSum computation itself should take only a single-digit number of cycles. This level of latency overhead is not suitable for low-latency applications. Is this intended? Am I missing something? If not, please improve the latency overhead.

Step-by-step reproduction
1. Run make-reducesum.py with the necessary pip packages installed, including onnx and openvino. (The output files get written to /tmp.)
2. Compile: g++ -O3 -o openvino-slow openvino-slow.cpp -lopenvino
3. Run: taskset -c 3 ./openvino-slow. The taskset -c 3 pins the thread to Core 3 to avoid useless kernel-initiated core switching that can hurt performance. This assumes that Core 3 is relatively idle on the system.
is to pin the thread to Core 3 and avoid useless kernel-initiated core-switching that can hurt performance. This assumes that on the system, Core 3 is relatively idle.make-reducesum.py
openvino-slow.cpp
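This attachment also did not survive extraction; below is a hedged reconstruction of what such a benchmark could look like, assuming the hypothetical model path /tmp/reducesum.xml and using the TSC to count cycles (the iteration count and file names are assumptions, not the original code):

```cpp
// Hypothetical reconstruction of the benchmark: measures average TSC
// cycles per infer() call. Model path and iteration count are assumptions.
#include <openvino/openvino.hpp>
#include <x86intrin.h>
#include <cstdio>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("/tmp/reducesum.xml", "CPU");
    ov::InferRequest req = compiled.create_infer_request();

    // 2-element float input for the ReduceSum model.
    ov::Tensor input(ov::element::f32, ov::Shape{2});
    input.data<float>()[0] = 1.0f;
    input.data<float>()[1] = 2.0f;
    req.set_input_tensor(input);

    req.infer();  // warm-up

    constexpr int kIters = 100000;
    unsigned long long start = __rdtsc();
    for (int i = 0; i < kIters; ++i)
        req.infer();
    unsigned long long cycles = __rdtsc() - start;
    std::printf("%.1f cycles per infer()\n", double(cycles) / kIters);
}
```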
Issue submission checklist