[AUTO] Filter device when compile_model with file path #27019
base: master
Conversation
support_devices = filter_device_by_model(support_devices_by_property, cloned_model, load_config);
} else {
    auto_s_context->m_model_path = model_path;
}
@wangleis this will degrade model compile latency.
Do you see any better solutions? I am not sure whether it is possible to get the stateful info from the cache or the model file instead.
@wangleis All of the OV HW plugins, including CPU, GPU and NPU, have only one compile model API, and it accepts a model object rather than a model path as its input parameter. The compile latency may therefore not change when AUTO calls read_model()
before trying to compile the model on the HW plugins. However, Core provides a virtual compile model API whose default path creates the model object by calling read_model(),
which means a HW plugin can override this API (see the sketch after the list below).
- compile model API in CPU plugin:
  std::shared_ptr<ov::ICompiledModel> compile_model(const std::shared_ptr<const ov::Model>& model,
- compile model API in GPU plugin:
  std::shared_ptr<ov::ICompiledModel> compile_model(const std::shared_ptr<const ov::Model>& model,
- compile model API in NPU plugin:
  std::shared_ptr<ov::ICompiledModel> compile_model(const std::shared_ptr<const ov::Model>& model,
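To make the point concrete, here is a minimal sketch of that pattern. PluginBase, get_core() and the exact read_model() signature used below are illustrative assumptions, not the literal OpenVINO declarations:

// Sketch only: illustrates the virtual path-based overload described above.
// PluginBase and the helper signatures are assumptions for this example.
std::shared_ptr<ov::ICompiledModel> PluginBase::compile_model(const std::string& model_path,
                                                              const ov::AnyMap& properties) const {
    // Default behaviour: materialize the model from disk, then forward it to the
    // model-object overload that CPU, GPU and NPU actually implement.
    std::shared_ptr<const ov::Model> model = get_core()->read_model(model_path, /*bin_path=*/{});
    return compile_model(model, properties);
}

A HW plugin that can consume the file directly (for example to hit its own cache) is free to override this path-based overload instead of relying on the default.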
I don't think so. If core.compile_model("test.xml", "GPU") with cache enabled can work, why can't AUTO benefit in the same way?
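For reference, a minimal usage sketch of the comparison being made here; the file name test.xml comes from the comment above, and the cache directory name is an arbitrary placeholder:

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Enable the model cache; "model_cache" is an arbitrary directory name.
    core.set_property(ov::cache_dir("model_cache"));

    // Compiling from a file path with the cache enabled works for a single device...
    auto on_gpu = core.compile_model("test.xml", "GPU");

    // ...so the question is why AUTO, given the same file path, should not
    // benefit from the cache in the same way.
    auto on_auto = core.compile_model("test.xml", "AUTO");
    return 0;
}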
No performance gap was observed across 12 models of different scales when compiling with the cache enabled.
We still need to check whether disabling the compile_model-with-model_path path in AUTO is acceptable.
…cache_dir [PR#24726]. 2. Enable the model type filter logic with cache enabled for AUTO. 3. Add a test case for when the cache is enabled.
This PR will be closed in a week because of 2 weeks of no activity.
Details:
Tickets: