feat: Enable EpContext OVIR Encapsulation #704
Conversation
Pull Request Overview
This PR adds support for the EPContext OVIR Encapsulation feature by updating the model import, compilation, and inference logic across the OpenVINO provider. Key changes include:
- Adding a new parameter (`enable_causallm`) to the `OVCore::ImportModel` API.
- Updating the `OVCore::ImportModel` implementation to branch on XML model stream detection and enable stateful compilation.
- Introducing a helper function to detect XML model streams and enforcing OpenVINO SDK version compatibility in the `onnx_ctx_model_helper` (see the sketch after this list).
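For reference, a minimal sketch of what such a detection helper could look like, assuming it only needs to peek for an XML declaration (`<?xml`) at the head of the stream; the actual `IsModelStreamXML` in backend_utils.cc may differ in detail:

```cpp
#include <istream>
#include <string>

// Hedged sketch: check whether the model stream starts with an XML
// declaration, restoring the stream position afterwards so callers can
// re-read the model. Not the actual OVEP implementation.
static bool IsModelStreamXML(std::istream& model_stream) {
  const std::streampos start = model_stream.tellg();
  char header[5] = {};
  model_stream.read(header, sizeof(header));
  const bool looks_like_xml =
      model_stream.gcount() == static_cast<std::streamsize>(sizeof(header)) &&
      std::string(header, sizeof(header)) == "<?xml";
  model_stream.clear();       // reset EOF/fail bits if the stream was shorter
  model_stream.seekg(start);  // rewind to where we started
  return looks_like_xml;
}
```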
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/core/providers/openvino/ov_interface.h | Added the bool `enable_causallm` parameter to `ImportModel`. |
| onnxruntime/core/providers/openvino/ov_interface.cc | Refactored `ImportModel` logic and repositioned log messages based on the model format. |
| onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc | Added XML stream check and enforced SDK version compatibility with an error message. |
| onnxruntime/core/providers/openvino/backends/basic_backend.cc | Updated the call to `ImportModel` to supply the new parameter and model path name. |
| onnxruntime/core/providers/openvino/backend_utils.h | Declared the new `IsModelStreamXML` helper. |
| onnxruntime/core/providers/openvino/backend_utils.cc | Implemented `IsModelStreamXML` to detect XML headers in the model stream. |
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/openvino/ov_interface.h:82
- [nitpick] Consider renaming `enable_causallm` to `enableCausalLM` to follow typical C++ camelCase naming conventions and improve readability.

```cpp
bool enable_causallm,
```
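For context, the updated `ImportModel` declaration appears to be roughly the following, reconstructed from the mangled signature visible in the exception logs quoted later in this thread; parameter names other than `enable_causallm` are illustrative:

```cpp
#include <istream>
#include <map>
#include <string>
#include <openvino/openvino.hpp>  // ov::Any

class OVExeNetwork;  // OVEP's compiled-model wrapper (defined in ov_interface.h)

// Sketch only: the logs below show (std::istream&, std::string,
// const std::map<std::string, ov::Any>&, bool, std::string) -> OVExeNetwork.
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const std::map<std::string, ov::Any>& device_config,
                         bool enable_causallm,
                         std::string model_file_name);
```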
LGTM. I tested this on my LNL machine with some EPCtx models that are used for AI Toolkit. They seem to work fine.

The only issue I see, which is somewhat outside the scope of this specific PR: compared to the `msb_release_v2` branch, Python applications are unable to recover from "KV-Cache is full" exceptions. With this branch (and `ovep-develop`), when these exceptions are thrown, the infer request is deleted from the infer request queue (with a `delete Request0` message printed), and when the application tries to start another generation sequence, the application crashes. With `msb_release_v2`, the application is able to catch these exceptions and then proceed / try again with the next generation sequence (after rewinding KV-Cache state, etc.).
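For illustration, the recovery pattern `msb_release_v2` permits looks roughly like this generic C++ sketch (not the actual ORT GenAI API; the point is only that the exception should be catchable and the next sequence should still run):

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <vector>

// Generic sketch of the desired behavior: a "KV-Cache is full" exception
// from one generation sequence is caught, the caller rewinds state, and the
// next sequence proceeds instead of crashing.
void RunSequences(const std::vector<std::function<void()>>& sequences) {
  for (const auto& generate : sequences) {
    try {
      generate();  // one generation sequence; may throw when KV-Cache is full
    } catch (const std::exception& e) {
      std::cerr << "generation failed (" << e.what() << "); continuing\n";
    }
  }
}
```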
```cpp
// where weights from bin file is directly consumed
std::string xml_file_name = name;
if (name.size() >= 5 && name.substr(name.size() - 5) == ".onnx") {
  xml_file_name.replace(name.size() - 5, 5, ".xml");
}
```
Have we validated this with CreateSessionFromArray, where the model is passed in memory? Is there a way to decouple from the location on disk, since the onnx model and its contents should be portable?
This is just file name handling code, where your input model name string, i.e. model.onnx, is now represented as an input xml file name string, i.e. model.xml. This won't impact the memory usage.
This is not regarding memory usage but whether a model passed in memory would work.
This can be verified by:

```
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx
```
> This is not regarding memory usage but whether a model passed in memory would work. This can be verified by: `onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx`

Tested on latest commit 143f4c1 and it is functional.
```diff
@@ -73,7 +73,8 @@ BasicBackend::BasicBackend(std::unique_ptr<ONNX_NAMESPACE::ModelProto>& model_pr
   exe_network_ = OVCore::Get()->ImportModel(*model_stream,
                                             hw_target,
                                             device_config,
-                                            subgraph_context_.subgraph_name);
+                                            enable_causallm,
+                                            session_context_.onnx_model_path_name.string());
```
@javier-intel, @preetha-intel .. Does this change start supporting OV IR wrapped in ONNX but impact pre-compiled and partitioned ONNX models, since we no longer pass any reference to subgraph_context into ImportModel?
The field `subgraph_context_.subgraph_name` was a redundant entity in the current implementation; it was only used in exception handling to report the graph name. We are therefore passing the original model name here, to facilitate parsing the model xml contents while loading the model.
With subgraph_context_.subgraph_name we get better error handling inside ImportModel, so let's try to retain the argument.
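The role of that argument is essentially error context. A sketch of the pattern (hypothetical helper; the real code in ov_interface.cc differs), matching the "[OpenVINO-EP] Exception while Loading Network for graph: ..." messages seen later in this thread:

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: run a compile callable and, on failure, rethrow with
// the graph/model name attached so load-time errors are identifiable.
template <typename CompileFn>
auto CompileWithGraphName(CompileFn&& compile, const std::string& name)
    -> decltype(compile()) {
  try {
    return std::forward<CompileFn>(compile)();
  } catch (const std::exception& e) {
    throw std::runtime_error(
        "[OpenVINO-EP] Exception while Loading Network for graph: " + name +
        ": " + e.what());
  }
}
```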
fixed
Force-pushed from c1aa179 to 1748e06.
I see failures after testing with the latest force-push. It seems that when I use EPCtx-wrapped models, the control flow is attempting to use the `ImportModel` path:
```
>python chat_sequence_test.py --ortgenai_model_path C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model
Creating Model...
2025-06-16 07:39:25.1757742 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2288 onnxruntime::InferenceSession::Initialize::<lambda_e5577c429b426a2a020919e219e36787>::operator ()] Exception during initialization: C:\Users\LocalAdmin\ort_build\BuildArtifacts-20250616_065035\onnxruntime\onnxruntime\core\providers\openvino\ov_interface.cc:232 class onnxruntime::openvino_ep::OVExeNetwork __cdecl onnxruntime::openvino_ep::OVCore::ImportModel(class std::basic_istream<char,struct std::char_traits<char> > &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,const class std::map<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class ov::Any,struct std::less<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class ov::Any> > > &,bool,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >) [OpenVINO-EP] Exception while Loading Network for graph: C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model\openvino_model.onnxException from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\npuw\compiled_model.cpp:460:
Failed to compile Model0_FCEW000__0 for all devices in [NPU]

Traceback (most recent call last):
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 277, in <module>
    main(args)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 115, in main
    chat = OrtGenaiChat(args, search_options)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\common\ortgenai_chat.py", line 22, in __init__
    self.model = og.Model(config)
RuntimeError: Exception during initialization: C:\Users\LocalAdmin\ort_build\BuildArtifacts-20250616_065035\onnxruntime\onnxruntime\core\providers\openvino\ov_interface.cc:232 class onnxruntime::openvino_ep::OVExeNetwork __cdecl onnxruntime::openvino_ep::OVCore::ImportModel(class std::basic_istream<char,struct std::char_traits<char> > &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,const class std::map<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class ov::Any,struct std::less<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class ov::Any> > > &,bool,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >) [OpenVINO-EP] Exception while Loading Network for graph: C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model\openvino_model.onnxException from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\npuw\compiled_model.cpp:460:
Failed to compile Model0_FCEW000__0 for all devices in [NPU]
```
Force-pushed from 1748e06 to 143f4c1.
With this latest branch, I need to add the following to my
This is unacceptable, as requiring this new option to be passed as an absolute path will become very problematic from a user experience / distribution perspective. Can we still have it support the 'old way' (the method we used in
fixed
Force-pushed from fdd5b0d to 0908804.
Force-pushed from 0908804 to 56a0ce5.
Using this latest branch, I seem to get some failures for NPU when using ORT GenAI. It seems to fail within the call to
Specifically, it fails for the
Here is the exception:
The precompiled blob export is functional for the CPU/GPU plugins, while the NPU/NPUW plugins run into serialization issues as above, which I believe the OV toolkit team can fix.
Yes, I agree that it's a feature that can be requested to be fixed by the NPU/NPUW team. But I am running with OpenVINO 2025.2, which was just released today. So I think OV EP needs to be conscious of this limitation and skip the export for problematic cases (presumably when device==NPU and enable_causallm=True). By the way, I'm a little bit confused why this function
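Something like the following guard would capture that suggestion (a sketch; the function and variable names are illustrative, not the actual OVEP code):

```cpp
#include <string>

// Illustrative sketch of the suggested guard: skip exporting the precompiled
// blob when the known NPU/NPUW serialization issue would be hit.
bool ShouldExportCompiledBlob(const std::string& device_type,
                              bool enable_causallm) {
  // Per the discussion above, NPU/NPUW blob serialization currently fails
  // for stateful (causal LM) compilations, so skip the export there.
  return !(device_type.find("NPU") != std::string::npos && enable_causallm);
}
```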
Force-pushed from 56a0ce5 to 9a77fcd.
Missed an edge case; fixed it. The export won't get triggered this time in the latest fix.
Okay, with this version I am able to run ORT GenAI with NPU with the models that we were testing.
@ankitm3k
Force-pushed from e74d3d6 to 01a26b7.
It's fixed now; we are merge-ready.
I don't think this is quite ready for merge yet. Running with the latest branch here, I hit some exceptions when using ORT GenAI:
@ankitm3k @sfatimar
Approving this one, as it seems the ORT GenAI failures were caused by ovep-develop changes that were pulled into this branch during a rebase. @ankitm3k will raise a ticket to track that.
This reverts commit 6d04a2e.
* feat: Enable EpContext OVIR Encapsulation
* fix: refactor EpCtx OVIR parsing logic to use ep.context_file_path
* fix: Fix logic for parsing model_file_path
* fix: enable EPCtx OVIR encapsulation compiled blob caching
* fix: fix merge conflicts
* fix: fix bugs
Description
This PR enables the EPContext OVIR Encapsulation model import, compilation & inference feature.
To use this feature, one must set the session option `ep.context_file_path` with an absolute path (required when explicitly using the CreateSessionFromArray() API), e.g.:

```
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|model.onnx" model.onnx
```
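For the in-memory path discussed above, a hedged C++ sketch of the same setup (the `AppendExecutionProvider_OpenVINO_V2` call assumes a recent ORT C++ API; the model path and device type are placeholders):

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

// Sketch: create the session from a memory buffer (the CreateSessionFromArray
// path) while pointing OVEP at the on-disk EPContext model via the
// ep.context_file_path session option, mirroring the command line above.
Ort::Session CreateSessionFromBuffer(Ort::Env& env,
                                     const std::vector<char>& model_bytes) {
  Ort::SessionOptions so;
  // Placeholder absolute path to the EPContext ONNX on disk.
  so.AddConfigEntry("ep.context_file_path", "C:\\models\\model.onnx");
  so.AppendExecutionProvider_OpenVINO_V2({{"device_type", "NPU"}});
  return Ort::Session(env, model_bytes.data(), model_bytes.size(), so);
}
```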
https://jira.devtools.intel.com/browse/CVS-169087