feat: Enable EpContext OVIR Encapsulation #704


Merged: 7 commits from ankit/epctx_encaps_feature into ovep-develop on Jun 24, 2025

Conversation


@ankitm3k ankitm3k commented Jun 11, 2025

Description

This PR enables the EPContext OVIR Encapsulation model import, compilation & inference feature.

To use this feature, enable the session option ep.context_file_path (an absolute path is required when explicitly using the CreateSessionFromArray() API):
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|model.onnx" model.onnx

https://jira.devtools.intel.com/browse/CVS-169087
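
For reference, a minimal C++ sketch of the same setup through the ORT API (illustrative only, not code from this PR; paths and option values mirror the command above, and the AppendExecutionProvider_OpenVINO_V2 wrapper is assumed to be available in your ORT build):

#include <onnxruntime_cxx_api.h>
#include <fstream>
#include <vector>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "epctx"};
  Ort::SessionOptions so;

  // Session option from this PR; an absolute path is needed when the
  // session is created from an in-memory buffer (CreateSessionFromArray).
  so.AddConfigEntry("ep.context_file_path", "C:\\models\\model.onnx");

  // OpenVINO EP options, matching "device_type|NPU" in the command above.
  so.AppendExecutionProvider_OpenVINO_V2({{"device_type", "NPU"}});

  // Read the EPContext wrapper model into memory and create the session
  // from the buffer rather than from a file path.
  std::ifstream f("C:\\models\\model.onnx", std::ios::binary);
  std::vector<char> bytes{std::istreambuf_iterator<char>(f), {}};
  Ort::Session session(env, bytes.data(), bytes.size(), so);
  return 0;
}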

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for the EPContext OVIR Encapsulation feature by updating the model import, compilation, and inference logic across the OpenVINO provider. Key changes include:

  • Adding a new parameter (enable_causallm) to the OVCore::ImportModel API.
  • Updating the OVCore::ImportModel implementation to branch on XML model stream detection and enable stateful compilation.
  • Introducing a helper function to detect XML model streams and enforcing OpenVINO SDK version compatibility in the onnx_ctx_model_helper.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file:
  • onnxruntime/core/providers/openvino/ov_interface.h: Added the bool enable_causallm parameter to ImportModel.
  • onnxruntime/core/providers/openvino/ov_interface.cc: Refactored ImportModel logic and repositioned log messages based on the model format.
  • onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc: Added the XML stream check and enforced SDK version compatibility with an error message.
  • onnxruntime/core/providers/openvino/backends/basic_backend.cc: Updated the call to ImportModel to supply the new parameter and model path name.
  • onnxruntime/core/providers/openvino/backend_utils.h: Declared the new IsModelStreamXML helper.
  • onnxruntime/core/providers/openvino/backend_utils.cc: Implemented IsModelStreamXML to detect XML headers in the model stream.
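
The IsModelStreamXML helper presumably peeks at the stream head for an XML declaration. A hypothetical sketch (an assumption, not the PR's actual code):

#include <istream>
#include <string>

// OpenVINO IR is XML, so check whether the stream begins with "<?xml",
// then restore the read position so the caller can re-read the stream.
static bool IsModelStreamXML(std::istream& model_stream) {
  const std::streampos start = model_stream.tellg();
  std::string head(5, '\0');
  model_stream.read(&head[0], static_cast<std::streamsize>(head.size()));
  const bool is_xml = (model_stream.gcount() == 5) && (head == "<?xml");
  model_stream.clear();    // clear eof/fail bits from a short read
  model_stream.seekg(start);
  return is_xml;
}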
Comments suppressed due to low confidence (1)

onnxruntime/core/providers/openvino/ov_interface.h:82

  • [nitpick] Consider renaming 'enable_causallm' to 'enableCausalLM' to follow typical C++ camelCase naming conventions and improve readability.
bool enable_causallm,
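
For context, the full ImportModel signature can be reconstructed from the exception text quoted later in this thread (parameter names other than enable_causallm are illustrative):

// ov_interface.h (reconstructed)
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const std::map<std::string, ov::Any>& device_config,
                         bool enable_causallm,
                         std::string model_file_path);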

@RyanMetcalfeInt8 RyanMetcalfeInt8 left a comment

LGTM. I tested this on my LNL machine with some EPCtx models that are used for AI Toolkit. They seem to work fine.

The only issue I see, which is somewhat outside the scope of this specific PR: compared to the msb_release_v2 branch, Python applications are unable to recover from "KV-Cache is full." exceptions.

With this branch (and ovep-develop), when these exceptions are thrown, the infer request is deleted from the infer request queue (with a "delete Request0" message printed), and when the application tries to start another generation sequence, the application crashes.

With msb_release_v2, the application is able to catch these exceptions and then proceed / try again with the next generation sequence (after rewinding KV-Cache state, etc.).

// where weights from bin file is directly consumed
std::string xml_file_name = name;
if (name.size() >= 5 && name.substr(name.size() - 5) == ".onnx") {
  xml_file_name.replace(name.size() - 5, 5, ".xml");
}
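
For comparison, the same extension swap via std::filesystem (a sketch, not the PR's code):

#include <filesystem>
#include <string>

// 'name' is the input model path, e.g. "model.onnx" becomes "model.xml".
std::filesystem::path xml_path{name};
if (xml_path.extension() == ".onnx") {
  xml_path.replace_extension(".xml");
}
std::string xml_file_name = xml_path.string();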


Have we validated this with CreateSessionFromArray, where the model is passed in memory? Is there a way to decouple from the location on disk, since the onnx model and its contents should be portable?

Author
This is just file-name handling code: your input model name string, i.e. model.onnx, is now represented as an input xml file name string, i.e. model.xml. This won't impact the memory usage.

This is not regarding memory usage but whether a model passed in memory would work.

This can be verified by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx

Author

This is not regarding memory usage but whether a model passed in memory would work.

This can be verified by: onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx

Tested on the latest commit 143f4c1 and it is functional.

@@ -73,7 +73,8 @@ BasicBackend::BasicBackend(std::unique_ptr<ONNX_NAMESPACE::ModelProto>& model_pr
 exe_network_ = OVCore::Get()->ImportModel(*model_stream,
                                           hw_target,
                                           device_config,
-                                          subgraph_context_.subgraph_name);
+                                          enable_causallm,
+                                          session_context_.onnx_model_path_name.string());

@javier-intel, @preetha-intel: Does this change, while adding support for OV IR wrapped in ONNX, impact pre-compiled and partitioned ONNX models, since we no longer pass any reference to subgraph_context into ImportModel?

Author

The field subgraph_context_.subgraph_name was redundant in the current implementation; it was only used in exception handling to report the graph name. We are using the original model name here instead so the model's XML contents can be parsed while loading the model.

With subgraph_context_.subgraph_name we get better error handling inside ImportModel, so let's try to retain the argument.
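
i.e., keeping the graph name lets ImportModel report it on failure, along these lines (an illustrative sketch mirroring the exception text quoted elsewhere in this thread, not the actual code):

try {
  // ... read/compile the network via ov::Core ...
} catch (const std::exception& e) {
  // Surface the subgraph name so failures identify which graph broke.
  throw std::runtime_error("[OpenVINO-EP] Exception while Loading Network for graph: " +
                           subgraph_name + ": " + std::string(e.what()));
}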

Author
Fixed.

@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch 2 times, most recently from c1aa179 to 1748e06 Compare June 16, 2025 07:51
@RyanMetcalfeInt8 RyanMetcalfeInt8 left a comment


I see failures after testing with the latest force-push. It seems like when I use EPCtx-wrapped models, the control flow is attempting to use the ImportModel path:

>python chat_sequence_test.py --ortgenai_model_path C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model
Creating Model...
2025-06-16 07:39:25.1757742 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2288 onnxruntime::InferenceSession::Initialize::<lambda_e5577c429b426a2a020919e219e36787>::operator ()] Exception during initialization: C:\Users\LocalAdmin\ort_build\BuildArtifacts-20250616_065035\onnxruntime\onnxruntime\core\providers\openvino\ov_interface.cc:232 class onnxruntime::openvino_ep::OVExeNetwork __cdecl onnxruntime::openvino_ep::OVCore::ImportModel(class std::basic_istream<char,struct std::char_traits<char> > &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,const class std::map<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class ov::Any,struct std::less<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class ov::Any> > > &,bool,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >) [OpenVINO-EP]  Exception while Loading Network for graph: C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model\openvino_model.onnxException from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\npuw\compiled_model.cpp:460:
Failed to compile Model0_FCEW000__0 for all devices in [NPU]

Traceback (most recent call last):
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 277, in <module>
    main(args)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 115, in main
    chat = OrtGenaiChat(args, search_options)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\common\ortgenai_chat.py", line 22, in __init__
    self.model = og.Model(config)
RuntimeError: Exception during initialization: C:\Users\LocalAdmin\ort_build\BuildArtifacts-20250616_065035\onnxruntime\onnxruntime\core\providers\openvino\ov_interface.cc:232 class onnxruntime::openvino_ep::OVExeNetwork __cdecl onnxruntime::openvino_ep::OVCore::ImportModel(class std::basic_istream<char,struct std::char_traits<char> > &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,const class std::map<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class ov::Any,struct std::less<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class ov::Any> > > &,bool,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >) [OpenVINO-EP]  Exception while Loading Network for graph: C:\Users\LocalAdmin\ort_build\EPCtxModels\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\Qwen2.5-1.5B-Instruct_context_ov_dynamic_sym_bkp_int8_sym_r1\model\openvino_model.onnxException from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\npuw\compiled_model.cpp:460:
Failed to compile Model0_FCEW000__0 for all devices in [NPU]

@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch from 1748e06 to 143f4c1 Compare June 17, 2025 14:50
@RyanMetcalfeInt8

With this latest branch, I need to add the following to my genai_config.json:

"ep_context_file_path" : "C:/Users/LocalAdmin/ort_build/EPCtxModels/Phi-3.5-mini-instruct_context_ov_dynamic_sym_gs128_bkp_int8_sym/model/openvino_model.onnx",

This is unacceptable, as requiring this new option to be passed as an absolute path will become very problematic from a user experience / distribution perspective.

Can we still have it support the 'old way' (the method we used in msb_release_v2 branch) if ep_context_file_path session option is not specified?

@ankitm3k
Author

With this latest branch, I need to add the following to my genai_config.json:

"ep_context_file_path" : "C:/Users/LocalAdmin/ort_build/EPCtxModels/Phi-3.5-mini-instruct_context_ov_dynamic_sym_gs128_bkp_int8_sym/model/openvino_model.onnx",

This is unacceptable, as requiring this new option to be passed as an absolute path will become very problematic from a user experience / distribution perspective.

Can we still have it support the 'old way' (the method we used in msb_release_v2 branch) if ep_context_file_path session option is not specified?

Fixed.
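
Presumably the fix is along these lines (a hypothetical sketch, not the actual diff; the member name ep_context_file_path is an assumption):

// Prefer the ep.context_file_path session option when set; otherwise fall
// back to deriving the OVIR location from the loaded model's own path
// (the 'old way' used in msb_release_v2).
std::filesystem::path ctx_path = session_context_.ep_context_file_path;
if (ctx_path.empty()) {
  ctx_path = session_context_.onnx_model_path_name;
}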

@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch from fdd5b0d to 0908804 Compare June 18, 2025 10:41
@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch from 0908804 to 56a0ce5 Compare June 18, 2025 11:14
@RyanMetcalfeInt8

Using this latest branch, I seem to get some failures for NPU when using ORT GenAI.

It seems to fail within the call to ExportCompiledBlobAsEPCtxNode:

auto status = onnxruntime::openvino_ep::BackendManager::ExportCompiledBlobAsEPCtxNode(subgraph);

Specifically, it fails at:

compiled_model.export_model(blob_file);

Here is the exception:

2025-06-18 09:13:32.0906159 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2288 onnxruntime::InferenceSession::Initialize::<lambda_4aa6d47481a0927abf40a87f6c9c773d>::operator ()] Exception during initialization: Exception from src\inference\src\cpp\compiled_model.cpp:132:
Exception from src\plugins\intel_npu\src\plugin\npuw\serialization.cpp:164:
NPUW: Assertion false && "Unsupported type" failed


Traceback (most recent call last):
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 277, in <module>
    main(args)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 115, in main
    chat = OrtGenaiChat(args, search_options)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\common\ortgenai_chat.py", line 22, in __init__
    self.model = og.Model(config)
RuntimeError: Exception during initialization: Exception from src\inference\src\cpp\compiled_model.cpp:132:
Exception from src\plugins\intel_npu\src\plugin\npuw\serialization.cpp:164:
NPUW: Assertion false && "Unsupported type" failed

@ankitm3k
Author

Using this latest branch, I seem to get some failures for NPU when using ORT GenAI.

It seems to fail within the call to ExportCompiledBlobAsEPCtxNode:

auto status = onnxruntime::openvino_ep::BackendManager::ExportCompiledBlobAsEPCtxNode(subgraph);

Specifically, it fails at:

compiled_model.export_model(blob_file);

Here is the exception:

2025-06-18 09:13:32.0906159 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2288 onnxruntime::InferenceSession::Initialize::<lambda_4aa6d47481a0927abf40a87f6c9c773d>::operator ()] Exception during initialization: Exception from src\inference\src\cpp\compiled_model.cpp:132:
Exception from src\plugins\intel_npu\src\plugin\npuw\serialization.cpp:164:
NPUW: Assertion false && "Unsupported type" failed


Traceback (most recent call last):
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 277, in <module>
    main(args)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\chat_sequence\chat_sequence_test.py", line 115, in main
    chat = OrtGenaiChat(args, search_options)
  File "C:\Users\LocalAdmin\Workspace\onnxruntime-genai-dev-tools\tests\common\ortgenai_chat.py", line 22, in __init__
    self.model = og.Model(config)
RuntimeError: Exception during initialization: Exception from src\inference\src\cpp\compiled_model.cpp:132:
Exception from src\plugins\intel_npu\src\plugin\npuw\serialization.cpp:164:
NPUW: Assertion false && "Unsupported type" failed

The precompiled blob export is functional for the CPU/GPU plugins, while the NPU/NPUW plugins run into the serialization issue above, which I believe the OV toolkit team can fix.

@RyanMetcalfeInt8

The precompiled blob export is functional for the CPU/GPU plugins, while the NPU/NPUW plugins run into the serialization issue above, which I believe the OV toolkit team can fix.

Yes, I agree that it's a fix that can be requested from the NPU/NPUW team. But I am running with OpenVINO 2025.2, which was just released today. So I think OV EP needs to be conscious of this limitation and skip the export for problematic cases (presumably when device==NPU and enable_causallm=True).

By the way, I'm a little bit confused why this function ExportCompiledBlobAsEPCtxNode is being called in my case, as ORT GenAI doesn't enable or set any kind of export feature. Is this expected?
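
The suggested guard might look like this (an illustrative sketch based on the condition above; variable names are assumptions, not the actual code):

// Skip the compiled-blob export in the known-problematic configuration.
const bool skip_blob_export = (device_type == "NPU") && enable_causallm;
if (!skip_blob_export) {
  auto status = onnxruntime::openvino_ep::BackendManager::ExportCompiledBlobAsEPCtxNode(subgraph);
}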

@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch from 56a0ce5 to 9a77fcd Compare June 18, 2025 19:35
@ankitm3k
Author

The precompiled blob export is functional for the CPU/GPU plugins, while the NPU/NPUW plugins run into the serialization issue above, which I believe the OV toolkit team can fix.

Yes, I agree that it's a fix that can be requested from the NPU/NPUW team. But I am running with OpenVINO 2025.2, which was just released today. So I think OV EP needs to be conscious of this limitation and skip the export for problematic cases (presumably when device==NPU and enable_causallm=True).

By the way, I'm a little bit confused why this function ExportCompiledBlobAsEPCtxNode is being called in my case, as ORT GenAI doesn't enable or set any kind of export feature. Is this expected?

Missed an edge case; fixed it. The export won't get triggered in the latest fix.

@RyanMetcalfeInt8

Missed an edge case; fixed it. The export won't get triggered in the latest fix.

Okay, with this version I am able to run ORT GenAI on NPU with the models that we were testing with msb_release_v2 last month.

@vthaniel

@ankitm3k
Can you please check the following issue
https://jira.devtools.intel.com/browse/CVS-169356

@ankitm3k ankitm3k force-pushed the ankit/epctx_encaps_feature branch from e74d3d6 to 01a26b7 Compare June 23, 2025 13:42
@ankitm3k
Author

@ankitm3k Can you please check the following issue https://jira.devtools.intel.com/browse/CVS-169356

It's fixed now; we are merge-ready.

@RyanMetcalfeInt8

I don't think this is quite ready for merge yet. Running with the latest branch here, I hit some exceptions when using ORT GenAI:

>python chat_sequence_test.py  --ortgenai_model_path C:\Users\LocalAdmin\ort_build\EPCtxModels\Phi-3.5-mini-instruct_context_ov_dynamic_sym_gs128_bkp_int8_asym_r1_noAWQ_wqe_noSE\model -r
Creating Model...
Creating Tokenizer...
Prompt (Use quit() to exit): hello
2025-06-23 09:08:11.9734765 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running OpenVINO-EP-subgraph_1 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0' Status Message: the ort_value must contain a constructed tensor or sparse tensor
An error occurred: Non-zero status code returned while running OpenVINO-EP-subgraph_1 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0' Status Message: the ort_value must contain a constructed tensor or sparse tensor

@vthaniel

vthaniel commented Jun 24, 2025

@ankitm3k Can you please check the following issue https://jira.devtools.intel.com/browse/CVS-169356

@ankitm3k @sfatimar
The issue is fixed now.
Unit tests and feature tests pass.

@RyanMetcalfeInt8 RyanMetcalfeInt8 left a comment

Approving this one, as it seems like the ORT GenAI failures were caused by ovep-develop changes that were pulled into this branch during a rebase. @ankitm3k will raise a ticket to track that.

@sfatimar sfatimar merged commit 6d04a2e into ovep-develop Jun 24, 2025
6 of 8 checks passed
gblong1 added a commit to gblong1/onnxruntime that referenced this pull request Jun 24, 2025
RyanMetcalfeInt8 added a commit that referenced this pull request Jun 24, 2025
ankitm3k added a commit that referenced this pull request Jun 24, 2025
* feat: Enable EpContext OVIR Encapsulation

* fix: refactor EpCtx OVIR parsing logic to use ep.context_file_path

* fix: Fix logic for parsing model_file_path

* fix: enable EPCtx OVIR encapsulation compiled blob caching

* fix: fix merge conflicts

* fix: fix bugs
javier-intel pushed a commit that referenced this pull request Jun 24, 2025
javier-intel pushed a commit that referenced this pull request Jun 24, 2025
javier-intel pushed a commit that referenced this pull request Jun 25, 2025