
[Performance]: How to assign model inference to specific CPUs? #27083

Open · 3 tasks done
LinGeLin opened this issue Oct 16, 2024 · 4 comments
Labels: performance (Performance related topics), support_request

LinGeLin commented Oct 16, 2024

OpenVINO Version

2024.4.0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

CPU

OpenVINO installation

Build from source

Programming Language

C++

Hardware Architecture

x86 (64 bits)

Model used

ps model

Model quantization

No

Target Platform

No response

Performance issue description

I am developing a gRPC project using C++ and integrating OpenVINO (ov) into it. The project involves multiple thread pools for preprocessing. I have observed that the inference performance is significantly lower than the data measured by benchmark_app. I suspect that this is due to thread competition between ov and the preprocessing threads in the project. I conducted the following tests:

  • When infer_thread=24, the utilization of all 24 CPUs fluctuates around 50%.
  • When infer_thread=16, the utilization of the first 16 CPUs is around 80%, while the utilization of the last 8 CPUs is 0%.

Since my project runs with two models loaded simultaneously, I want to dedicate CPUs 0-11 to Model A, CPUs 12-19 to Model B, and CPUs 20-23 for other operations in the project. However, I haven't found an interface in ov to bind CPUs when loading models. Are there any other suggestions? Thank you.
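
For reference, a rough C++ sketch of the closest approximation I have found through public runtime properties: capping each compiled model's thread budget with `ov::inference_num_threads` so the two models plus the preprocessing pools stop oversubscribing the 24 cores. The model paths and thread counts below are hypothetical; this only limits how many threads each model may use, it does not reserve specific core IDs (0-11 / 12-19) per model, which is exactly the interface that seems to be missing.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Hypothetical model paths, for illustration only.
    auto model_a = core.read_model("model_a.xml");
    auto model_b = core.read_model("model_b.xml");

    // Give each compiled model its own thread budget (12 + 8), leaving
    // roughly four cores' worth of headroom for preprocessing and gRPC work.
    // Note: this caps the number of inference threads per model; it does
    // NOT bind them to fixed core IDs, and the two models are not
    // guaranteed to land on disjoint cores.
    auto compiled_a = core.compile_model(model_a, "CPU",
                                         ov::inference_num_threads(12),
                                         ov::hint::enable_cpu_pinning(true));
    auto compiled_b = core.compile_model(model_b, "CPU",
                                         ov::inference_num_threads(8),
                                         ov::hint::enable_cpu_pinning(true));

    ov::InferRequest req_a = compiled_a.create_infer_request();
    ov::InferRequest req_b = compiled_b.create_infer_request();
    // ... set input tensors, then req_a.infer() / req_b.infer() ...
    return 0;
}
```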

Step-by-step reproduction

No response

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
@wangleis
Contributor

Hi @LinGeLin, do you run two models in one application process?

@LinGeLin
Author

> Hi @LinGeLin, do you run two models in one application process?

Yes.

@LinGeLin
Author

@wangleis Any suggestions? Or is it simply that running inference on multiple models in one process will inherently cause them to interfere with each other?

@wangleis
Contributor

@LinGeLin Reserving specific CPU resources for a specific model during CPU inference is planned but not yet enabled. Ticket CVS-154222 has been created to track this issue. We will update you when the feature is enabled on the master branch.
