[AUTO] Filter device when compile_model with file path #27019

Status: Open. Wants to merge 24 commits into master.
Changes from 1 commit (24 commits total).
bc2f794
enable AUTO to read model if passing model path into plugin.
WeldonWangwang Oct 11, 2024
3e2166a
enable test case.
WeldonWangwang Oct 12, 2024
c435080
update test case.
WeldonWangwang Oct 12, 2024
63b49bc
Update.
WeldonWangwang Oct 12, 2024
6734760
update.
WeldonWangwang Oct 12, 2024
3a76f96
update.
WeldonWangwang Oct 12, 2024
3660bbd
update.
yangwang201911 Oct 15, 2024
e23f124
Update.
WeldonWangwang Oct 16, 2024
877360d
fix the issue of calculating the first infer time.
yangwang201911 Oct 17, 2024
6208642
fix the issue when cache enabled.
yangwang201911 Oct 18, 2024
4ba6a02
the default setting for runtime fallback is to be disabled.
yangwang201911 Oct 21, 2024
b2f8c72
1. update the test case to disable CPU model cache when user app set …
yangwang201911 Oct 22, 2024
8e9a6b3
update.
yangwang201911 Oct 23, 2024
56246ce
update.
yangwang201911 Oct 23, 2024
8010546
update.
yangwang201911 Oct 23, 2024
73da15b
update.
yangwang201911 Oct 24, 2024
a917a76
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
yangwang201911 Oct 25, 2024
1d4f0aa
Update the description of runtime fallback.
WeldonWangwang Oct 28, 2024
d8c5144
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
yangwang201911 Nov 4, 2024
726b3a4
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
yangwang201911 Nov 11, 2024
54badff
update.
yangwang201911 Nov 12, 2024
42e655e
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
yangwang201911 Nov 12, 2024
eea0fd3
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
yangwang201911 Nov 18, 2024
2e97ef4
Merge branch 'master' into ywang2/fix_query_statue_not_implemented_fo…
peterchen-intel Nov 26, 2024
fix the issue when cache enabled.
yangwang201911 authored and WeldonWangwang committed Oct 18, 2024
commit 62086422284f0ddbab8c874587aa64ae98657234
2 changes: 1 addition & 1 deletion src/inference/tests/functional/caching_test.cpp
@@ -2136,7 +2136,7 @@ TEST_P(CachingTest, LoadAUTO_OneDeviceNoImportExport) {
EXPECT_CALL(*mockPlugin, compile_model(_, _, _)).Times(m_remoteContext ? 2 : 0);
EXPECT_CALL(*mockPlugin, compile_model(A<const std::shared_ptr<const ov::Model>&>(), _))
.Times(!m_remoteContext ? 2 : 0);
EXPECT_CALL(*mockPlugin, OnCompileModelFromFile()).Times(0);
EXPECT_CALL(*mockPlugin, OnCompileModelFromFile()).Times(m_type == TestLoadType::EModelName ? 2 : 0);
EXPECT_CALL(*mockPlugin, import_model(_, _, _)).Times(0);
EXPECT_CALL(*mockPlugin, import_model(_, _)).Times(0);
testLoad([&](ov::Core& core) {
67 changes: 53 additions & 14 deletions src/plugins/auto/src/auto_schedule.cpp
@@ -133,12 +133,7 @@ void AutoSchedule::init() {
auto customize_helper_context_from_cache_setting = [this](bool is_actual_cpu,
AutoCompileContext m_compile_context[],
ScheduleContext::Ptr& m_context) {
const auto cpu_iter = deviceChecker().check_and_return_if_device_in_list("CPU", m_context->m_device_priorities);
if (cpu_iter == m_context->m_device_priorities.end()) {
m_compile_context[CPU].m_is_enabled = false;
return;
}
m_compile_context[CPU].m_is_enabled = true;
bool is_stateful_model = false;
if (!is_actual_cpu) {
const auto& device = m_compile_context[ACTUALDEVICE].m_device_info.device_name;
auto& device_config = m_compile_context[ACTUALDEVICE].m_device_info.config;
@@ -157,13 +152,58 @@ void AutoSchedule::init() {
else
blobId = ov::ModelCache::compute_hash(m_context->m_model_path, properties);
std::string cached_model_path = ov::util::make_path(cache_dir, blobId + ".blob");
m_compile_context[CPU].m_is_enabled = !ov::util::file_exists(cached_model_path);
LOG_DEBUG_TAG("device: %s %s cached blob: %s ",
device.c_str(),
m_compile_context[CPU].m_is_enabled ? "not found" : "found",
cached_model_path.c_str());
if (!ov::util::file_exists(cached_model_path)) {
LOG_DEBUG_TAG("device: %s not found cached blob: %s ", device.c_str(), cached_model_path.c_str());
// not found blob file
if (!m_context->m_model) {
// passed model path
std::cout << "=== blob not found and will read model here ===\n";
auto m_model = m_context->m_ov_core->read_model(m_context->m_model_path, std::string{});
for (auto& op : m_model->get_ops()) {
if (std::dynamic_pointer_cast<ov::op::util::AssignBase>(op) ||
std::dynamic_pointer_cast<ov::op::util::ReadValueBase>(op)) {
is_stateful_model = true;
break;
}
}
if (is_stateful_model) {
std::cout
<< "=== stateful model. will disable CPU as accelerator and runtime fallback ===\n";
m_compile_context[CPU].m_is_enabled = false;
m_context->m_runtime_fallback = false;
m_context->m_startup_fallback = false;
}
}
} else {
// found blob file
std::cout << "=== found blob and will passing model path to acutal device ===\n";
LOG_DEBUG_TAG("device: %s found cached blob: %s ", device.c_str(), cached_model_path.c_str());
m_compile_context[CPU].m_is_enabled = false;
m_context->m_startup_fallback = false;
if (m_context->m_model) {
m_context->m_runtime_fallback = false;
} else {
auto m_model = m_context->m_ov_core->read_model(m_context->m_model_path, std::string{});
for (auto& op : m_model->get_ops()) {
if (std::dynamic_pointer_cast<ov::op::util::AssignBase>(op) ||
std::dynamic_pointer_cast<ov::op::util::ReadValueBase>(op)) {
is_stateful_model = true;
break;
}
}
if (is_stateful_model) {
std::cout << "=== stateful model. will disable runtime fallback ===\n";
m_context->m_runtime_fallback = false;
}
}
}
}
}
const auto cpu_iter = deviceChecker().check_and_return_if_device_in_list("CPU", m_context->m_device_priorities);
if (cpu_iter == m_context->m_device_priorities.end()) {
m_compile_context[CPU].m_is_enabled = false;
return;
}
if (m_compile_context[CPU].m_is_enabled) {
m_compile_context[CPU].m_device_info = *cpu_iter;
m_compile_context[CPU].m_device_info.config[ov::hint::performance_mode.name()] =
@@ -184,9 +224,8 @@
// m_compile_context[ACTUALDEVICE]
if (is_actual_cpu || !m_context->m_startup_fallback) {
m_compile_context[CPU].m_is_enabled = false;
} else {
customize_helper_context_from_cache_setting(is_actual_cpu, m_compile_context, m_context);
}
customize_helper_context_from_cache_setting(is_actual_cpu, m_compile_context, m_context);
// initialize the rest members of load context
for (int i = 0; i < CONTEXTNUM; i++) {
if (m_compile_context[i].m_is_enabled) {
@@ -336,7 +375,7 @@ void AutoSchedule::try_to_compile_model(AutoCompileContext& context, const std::
if ((m_context->m_model)) {
context.m_compiled_model = m_context->m_ov_core->compile_model(model, device, device_config);
} else {
OPENVINO_THROW("OpenVino Model is empty!");
context.m_compiled_model = m_context->m_ov_core->compile_model(m_context->m_model_path, device, device_config);
}
context.m_is_load_success = true;
auto compile_end_time = std::chrono::high_resolution_clock::now();
22 changes: 20 additions & 2 deletions src/plugins/auto/src/plugin.cpp
@@ -405,10 +405,28 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model_impl(const std::string
cloned_model = model->clone();
} else {
LOG_INFO_TAG("compile model with model path");
if (work_mode_auto) {
cloned_model = get_core()->read_model(model_path, std::string{});
auto iter_plugin_cache_dir = properties.find(ov::cache_dir.name());
std::string cache_dir =
iter_plugin_cache_dir != properties.end() ? iter_plugin_cache_dir->second.as<std::string>() : "";
if (cache_dir.empty()) {
try {
cache_dir = get_core()->get_property("", ov::cache_dir);
} catch (std::exception&) {
LOG_DEBUG_TAG("Failed to get property %s from core", ov::cache_dir.name());
}
}
if (work_mode_auto && cache_dir.empty()) {
// cache disable and will read model first here
LOG_DEBUG_TAG("Try to read model via core from model path: %s", model_path.c_str());
try {
cloned_model = get_core()->read_model(model_path, std::string{});
} catch (const ov::Exception&) {
OPENVINO_THROW("Failed to read model from model path:%s", model_path.c_str());
}
support_devices = filter_device_by_model(support_devices_by_property, cloned_model, load_config);
} else {
// cache enabled and will pass model path into schedule
LOG_DEBUG_TAG("Will pass model path into auto schedule: %s", model_path.c_str());
auto_s_context->m_model_path = model_path;
}
Contributor:
@wangleis This will degrade model compile latency. Do you see any better solutions? Not sure if it is possible to get the stateful info from the cache or the model file?
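For reference, a minimal standalone sketch (not part of this PR) of the model-file check the diff above performs when AUTO only has a model path; the helper name and include paths are illustrative assumptions:

#include <memory>
#include <string>

#include "openvino/op/util/assign_base.hpp"
#include "openvino/op/util/read_value_base.hpp"
#include "openvino/runtime/core.hpp"

// Hypothetical helper: read the IR from disk and report whether it contains
// state ops (ReadValue/Assign), i.e. whether the model is stateful.
bool is_stateful_model_file(ov::Core& core, const std::string& model_path) {
    const auto model = core.read_model(model_path, std::string{});
    for (const auto& op : model->get_ops()) {
        if (std::dynamic_pointer_cast<ov::op::util::AssignBase>(op) ||
            std::dynamic_pointer_cast<ov::op::util::ReadValueBase>(op)) {
            return true;
        }
    }
    return false;
}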

Contributor Author:
@wangleis All of the OV HW plugins, including CPU, GPU and NPU, have only one compile-model API, which accepts a model object rather than a model path as its input parameter. The compile latency may therefore not change when AUTO calls read_model() before trying to compile the model on a HW plugin. However, Core already implements a virtual compile-model API that creates the model object by calling read_model(), which means a HW plugin can override this API.
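For context, a rough sketch (the model path and device name are illustrative, not from this PR) of the two application-level entry points this discussion contrasts — handing Core a file path versus reading the model first and passing the ov::Model object:

#include <memory>
#include "openvino/runtime/core.hpp"

int main() {
    ov::Core core;

    // Path-based overload: Core (or a plugin overriding the virtual
    // compile-from-path API) reads the model internally.
    auto compiled_from_path = core.compile_model("model.xml", "AUTO");

    // Object-based overload: the application (or the AUTO plugin) reads the
    // model first, so it can be inspected (e.g. for stateful ops) before
    // compiling on the selected hardware device.
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    auto compiled_from_model = core.compile_model(model, "AUTO");
    return 0;
}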

Contributor Author:
After detailed performance checks of the model compile phase, memory utilization and latency show no obvious change when AUTO passes the loaded model (from read_model()), instead of the model path, to the HW plugin via Core. @wangleis @songbell

Contributor:
I don't think so. If core.compile_model("test.xml", GPU) with cache enabled can work, why can't AUTO benefit?
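A small sketch of the scenario referenced here — compiling from a file path with the model cache enabled, so repeated compiles can be served from the cached blob (the cache directory, model path and device are illustrative):

#include "openvino/runtime/core.hpp"

int main() {
    ov::Core core;

    // Enable the model cache; compiled blobs are stored in this directory.
    core.set_property(ov::cache_dir("model_cache"));

    // First call compiles and caches; later calls with the same model,
    // device and properties can import the cached blob instead of recompiling.
    auto compiled = core.compile_model("test.xml", "GPU");
    return 0;
}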

Contributor Author (@yangwang201911, Oct 17, 2024):
Currently, only compiling the model with cache disabled has been tested, and no degradation was observed for AUTO in that situation. Will check the performance change for compiling from a model path with cache enabled soon. @songbell @wangleis

Contributor Author:
No performance gap was observed on 12 models of different scales when compiling with cache enabled.

}