[AUTO] Filter device when compile_model with file path #27019
base: master
@wangleis this will degrade model compile latency.
Do you see any better solution? Not sure if it is possible to get the stateful info from the cache or the model file?
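One way to read the stateful info from the model file (a minimal sketch, not this PR's implementation; the helper name is hypothetical) is to check whether the loaded ov::Model declares any variables:

```cpp
#include <memory>
#include <openvino/openvino.hpp>

// Hypothetical helper: a model is stateful if it declares variables,
// i.e. it contains ReadValue/Assign pairs bound to an ov::op::util::Variable.
bool is_stateful(const std::shared_ptr<ov::Model>& model) {
    return !model->get_variables().empty();
}

// Usage sketch:
//   ov::Core core;
//   auto model = core.read_model("test.xml");
//   bool stateful = is_stateful(model);
```

This still requires calling read_model() first, which is exactly the latency concern raised above; getting statefulness out of the compiled-blob cache would need a different mechanism.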
@wangleis all of the OV HW plugins, including CPU, GPU and NPU, expose only one compile_model API that accepts a model object, not a model path, as the input parameter. The compile latency may therefore not change when AUTO calls read_model() before trying to compile the model on a HW plugin. However, Core implements a virtual compile_model API that returns a model object created by calling read_model(), which means a HW plugin can override this API:

- openvino/src/plugins/intel_cpu/src/plugin.h, line 18 in 421eaec
- openvino/src/plugins/intel_gpu/include/intel_gpu/plugin/plugin.hpp, line 51 in 421eaec
- openvino/src/plugins/intel_npu/src/plugin/include/plugin.hpp, line 36 in 421eaec
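A rough illustration of that pattern (names simplified; this is not verbatim OpenVINO dev-API code): the base interface provides a virtual path-based overload whose default body reads the model and forwards it to the object-based overload, so a plugin that can consume the file directly may override it.

```cpp
#include <memory>
#include <string>
#include <openvino/openvino.hpp>

// Illustrative base-plugin interface (simplified from the real dev API).
struct PluginBase {
    virtual ~PluginBase() = default;

    // Every HW plugin must implement the object-based compile.
    virtual void compile_model(const std::shared_ptr<ov::Model>& model) = 0;

    // Default path-based overload: read the model, then delegate.
    // A plugin able to consume the file directly can override this.
    virtual void compile_model(const std::string& model_path) {
        ov::Core core;  // stand-in for the core handle a real plugin holds
        compile_model(core.read_model(model_path));
    }
};
```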
After detailed performance checks during the model compile phase, memory utilization and latency show no obvious change when AUTO passes the loaded model (from read_model()), instead of the model path, to the HW plugin via Core. @wangleis @songbell
I don't think so. If core.compile_model("test.xml", "GPU") with cache enabled can work, why can't AUTO benefit from it?
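For reference, the cache-enabled, path-based compile referred to here looks roughly like this (the cache directory name is arbitrary):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Enable model caching; compiled blobs are stored under this directory.
    core.set_property(ov::cache_dir("model_cache"));

    // The first call compiles and populates the cache; later calls with the
    // same model and properties load the compiled blob instead of recompiling.
    auto compiled = core.compile_model("test.xml", "GPU");
    return 0;
}
```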
Currently we have only tested compile_model with cache disabled, and no degradation happened for AUTO in that situation. Will check the performance change for compiling from a model path with cache enabled soon. @songbell @wangleis
No performance gap was observed on 12 models of different scales when compiling with cache enabled.