feat: ORT GenAI Stateful Compilation changes #676


Merged
ankitm3k merged 6 commits into ovep-develop on Jun 5, 2025

Conversation

ankitm3k

@ankitm3k ankitm3k commented Apr 25, 2025

Description

This PR adds the features required to run ORT GenAI with OVEP using stateful compilation of ov::Model, inspired by the OpenVINO GenAI pipeline flow.

It introduces a new provider option, enable_causallm, which can be set to True in the custom config file genai_config.json to enable the ORT GenAI pipeline with causal LLM models that are fully supported on OVEP.

Sample genai_config.json (comments are illustrative only; strict JSON does not allow comments):

"provider_options": [
    {
        // The "OpenVINO" key is case-sensitive and must be spelled exactly as defined by Microsoft
        "OpenVINO": {
            "device_type": "NPU",
            // Mandatory provider option; must always be set when using ORT GenAI
            "enable_causallm": "True",
            // Optional (NPU only): compile with custom MAX_PROMPT_LEN & MIN_RESPONSE_LEN
            "load_config": "{\"NPU\":{\"MAX_PROMPT_LEN\":\"2048\",\"MIN_RESPONSE_LEN\":\"512\"}}"
        }
    }
]

Note that GenAI models in ONNX format are usually stateless in nature and require dynamic-shape compilation.
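The perftest changes in this PR add boolean value checking and error messaging for the new option. A minimal, self-contained sketch of that kind of validation (helper name hypothetical, not the PR's exact code):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: validate a boolean-valued provider option such as
// "enable_causallm". Accepts only "True"/"true" or "False"/"false" and
// reports a descriptive error otherwise, mirroring the boolean value
// checking described for onnxruntime/test/perftest/ort_test_session.cc.
bool ParseBooleanProviderOption(const std::string& key, const std::string& value) {
  if (value == "True" || value == "true") return true;
  if (value == "False" || value == "false") return false;
  throw std::invalid_argument("Provider option '" + key +
                              "' should be 'True' or 'False', got: " + value);
}
```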

@ankitm3k ankitm3k requested review from sfatimar and vthaniel April 25, 2025 07:53
@ankitm3k ankitm3k self-assigned this Apr 25, 2025
@ankitm3k ankitm3k requested a review from jatinwadhwa921 April 25, 2025 09:06
@ankitm3k ankitm3k force-pushed the ort_genai_features branch 3 times, most recently from 46802e5 to f137407 Compare May 19, 2025 08:11
@ankitm3k ankitm3k force-pushed the ort_genai_features branch from f137407 to 8a9fca1 Compare May 20, 2025 07:04
@ankitm3k ankitm3k force-pushed the ort_genai_features branch 2 times, most recently from b672820 to 775c27e Compare June 2, 2025 06:19
@@ -106,7 +106,8 @@ BackendManager::BackendManager(SessionContext& session_context,
   subgraph_context_.has_dynamic_input_shape = true;
   LOGS_DEFAULT(INFO) << "[OpenVINO-EP] Model has symbolic input dims";
   if ((session_context_.device_type.find("CPU") != std::string::npos ||
-       session_context_.device_type.find("GPU") != std::string::npos) &&
+       session_context_.device_type.find("GPU") != std::string::npos ||
+       (session_context_.device_type.find("NPU") != std::string::npos && session_context_.enable_causallm)) &&


Can this condition be simplified? This is not valid only for a dynamic model on NPU.

Author

fixed
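The condition being discussed can be factored into a small predicate; a self-contained sketch of the same logic (function name hypothetical, extracted from the diff above):

```cpp
#include <string>

// Sketch of the dynamic-shape eligibility check from BackendManager:
// CPU and GPU always qualify; NPU qualifies only when the
// enable_causallm provider option is set.
bool SupportsDynamicShapes(const std::string& device_type, bool enable_causallm) {
  const bool is_cpu_or_gpu = device_type.find("CPU") != std::string::npos ||
                             device_type.find("GPU") != std::string::npos;
  const bool is_npu = device_type.find("NPU") != std::string::npos;
  return is_cpu_or_gpu || (is_npu && enable_causallm);
}
```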

@MayureshV1 MayureshV1 requested a review from Copilot June 3, 2025 23:58
Copilot

This comment was marked as outdated.

@ankitm3k ankitm3k requested a review from Copilot June 4, 2025 12:20
Copilot

This comment was marked as outdated.

@ankitm3k ankitm3k force-pushed the ort_genai_features branch 2 times, most recently from 845f903 to 833cff9 Compare June 4, 2025 13:02
ovInfReq.set_tensor(tensor_name, tensor);
}

void StatefulOVInferRequest::CacheTensor(const std::string& tensor_name, std::vector<int64_t>& cache) {


Note that we may eventually need to support caching position_ids / logits which have more complicated shapes than just [1, <num_input_tokens>].

Author

will handle that logic in the next PR
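The CacheTensor helper above accumulates per-request tensor values into a running cache of shape [1, &lt;num_tokens_so_far&gt;]. A simplified, self-contained sketch of that idea (plain vectors stand in for ov::Tensor; not the PR's exact implementation):

```cpp
#include <cstdint>
#include <cassert>
#include <vector>

// Simplified sketch of StatefulOVInferRequest::CacheTensor: append the
// current request's int64 data (e.g. input token ids) onto a running
// cache vector. Shapes more complex than [1, <num_input_tokens>], such
// as position_ids or logits, would need extra handling (noted above as
// future work).
void CacheTensorData(const std::vector<int64_t>& tensor_data,
                     std::vector<int64_t>& cache) {
  cache.insert(cache.end(), tensor_data.begin(), tensor_data.end());
}
```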


@preetha-intel preetha-intel left a comment


LGTM

@ankitm3k ankitm3k force-pushed the ort_genai_features branch from 97775fd to a5ac79d Compare June 5, 2025 10:00
@ankitm3k ankitm3k force-pushed the ort_genai_features branch from a5ac79d to a9b1f9d Compare June 5, 2025 10:47
@ankitm3k ankitm3k requested a review from Copilot June 5, 2025 12:01

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces stateful compilation support for ORT GenAI using OpenVINO by integrating a new provider option, enable_causallm, along with several supporting changes in compilation, inference request handling, and backend communication. Key changes include adding stateful model transformation utilities, updating the OpenVINO interface to support causal LM functionality, and modifying test cases and backend management to incorporate KV cache rewind and dynamic shapes handling.
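One of the operations mentioned here, KV cache rewind, can be sketched in a simplified, self-contained form (function name and data representation hypothetical; the real cache holds key/value tensors inside the stateful model, not a flat vector of token ids):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a KV-cache rewind: discard cached entries beyond
// a checkpoint length so generation can resume from an earlier token
// position, e.g. after a speculative or rejected continuation.
void RewindCache(std::vector<int64_t>& cache, std::size_t checkpoint_len) {
  if (cache.size() > checkpoint_len) {
    cache.resize(checkpoint_len);
  }
}
```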

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

File | Description
  • onnxruntime/test/perftest/ort_test_session.cc | Extended option parsing for enable_causallm with boolean value checking and error messaging
  • onnxruntime/core/providers/openvino/ov_interface.{h,cc} | Added StatefulCompileModel mechanism and updated OVExeNetwork to carry extra stateful attributes
  • onnxruntime/core/providers/openvino/openvino_provider_factory.cc | Integrated parsing logic for enable_causallm and adjusted dynamic shapes flags for NPU devices
  • onnxruntime/* (multiple backend and execution provider files) | Updated backend and infer request handling to support KV cache operations, stateful inference, and additional configuration for ORT GenAI
Comments suppressed due to low confidence (2)

onnxruntime/core/providers/openvino/backends/basic_backend.h:54

  • It would be helpful to add a comment explaining why tensor names such as "beam_idx", "past_key_values", and "present" are being skipped when session_context.enable_causallm is true. This aids future maintainers in understanding the rationale behind bypassing KV cache tensor mapping in stateful model scenarios.
if ((onnx_name.empty() || onnx_name == "beam_idx" ||

onnxruntime/core/providers/openvino/ov_interface.h:90

  • [nitpick] Consider renaming the member variable 'compiled_model_obj' to simply 'compiled_model' for improved clarity and consistency with standard naming conventions.
ov::CompiledModel compiled_model_obj;

@ankitm3k ankitm3k merged commit 660adfc into ovep-develop Jun 5, 2025
6 of 8 checks passed
4 participants