Add LLMResponse object and RequestId to LLMRequest #799

shmuelk · 2025-05-08T12:37:07Z

This PR adds a LLMResponse struct, similar in idea to the LLMRequest struct for use in response handling. It will be used in a future PR that adds PostResponse plugin support.

The LLMResponse struct contains the request Id, the received response headers, the body received, a flag indicating whether or not the model is streaming results, and a flag indicating whether or not all of the response body has been received. These latter three fields will be used in the future when there is PostResponse processing of the response body.

In a future PR:

The current response header processing in StreamingServer will be refactored to collect the headers and invoke a new Dispatcher API HandleResponse
The new Dispatcher HandleResponse API will invoke the new Scheduler OnResponse API, which ultimately invokes the scheduler PostResponse plugins.

The new Scheduler OnResponse API receives a LLMResponse struct in a way similar to the way scheduler Schedule gets a LLMRequest object.

This PR is laying foundations for the future PR that adds the PostResponse support.

In addition this PR adds a RequestId to the LLMRequest struct to allow plugins to cache internally information specific to a particular request.

This is the first PR in a series to add support for PostResponse

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

netlify · 2025-05-08T12:37:13Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`3e1aafc`
🔍 Latest deploy log	https://app.netlify.com/sites/gateway-api-inference-extension/deploys/681ca575e9e4750008a56bd9
😎 Deploy Preview	https://deploy-preview-799--gateway-api-inference-extension.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

k8s-ci-robot · 2025-05-08T12:37:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shmuelk
Once this PR has been reviewed and has the lgtm label, please assign jeffwan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-05-08T12:37:17Z

Hi @shmuelk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

nirrozenbaum · 2025-05-08T16:44:01Z

/ok-to-test

k8s-ci-robot · 2025-05-09T07:31:21Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kfswain · 2025-05-09T15:56:03Z

pkg/epp/scheduling/types/types.go

 }

-func NewSchedulingContext(ctx context.Context, req *LLMRequest, pods []Pod) *SchedulingContext {
+func NewSchedulingContext(ctx context.Context, req *LLMRequest, resp *LLMResponse, pods []Pod) *SchedulingContext {


This param is nil in all the places it's called. It's also a exported field, perhaps the New func should just initialize the LLMResponse struct? And any consumer can populate or replace the field as needed.

(Completely out of scope of this PR, but I would like to move away from LLM naming, a lot of this code can be generalized to Inference, LLMs are just the current focus)

kfswain · 2025-05-09T15:59:50Z

pkg/epp/scheduling/types/types.go

+// LLMResponse contains information from the response received to be passed to plugins
+type LLMResponse struct {
+	// RequestId is the Envoy generated Id for the request being processed
+	RequestId string


I'm on the fence if there should be a dedicated field for this, we could put this in the Headers map, or check to see if it's already there, and then populate.

kfswain · 2025-05-10T13:45:02Z

#800 is a super set of this PR. I absolutely appreciate cutting smaller PRs, but just the way this specific instance worked out. I was able to review #800 just fine. Thank you for making small, review friendly PRs! I will close this one in favor of #800 however

shmuelk added 4 commits May 8, 2025 15:32

Added the LLMResponse struct and RequestId to LLMRequest

33fb88e

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Updates due to NewSchedulerContext API change

d67c236

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Populate the RequestId field of LLMRequest

0a3a6ad

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Updates to tests

3e1aafc

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025

k8s-ci-robot requested review from nirrozenbaum and robscott May 8, 2025 12:37

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 8, 2025

shmuelk mentioned this pull request May 8, 2025

feat: Add support to invoke PostResponse plugins #800

Merged

kfswain reviewed May 9, 2025

View reviewed changes

kfswain closed this May 10, 2025

shmuelk deleted the llm-response-main branch May 15, 2025 13:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add LLMResponse object and RequestId to LLMRequest #799

Add LLMResponse object and RequestId to LLMRequest #799

Uh oh!

shmuelk commented May 8, 2025 •

edited

Loading

Uh oh!

netlify bot commented May 8, 2025 •

edited

Loading

Uh oh!

k8s-ci-robot commented May 8, 2025

Uh oh!

k8s-ci-robot commented May 8, 2025

Uh oh!

nirrozenbaum commented May 8, 2025

Uh oh!

k8s-ci-robot commented May 9, 2025

Uh oh!

kfswain May 9, 2025

Uh oh!

kfswain May 9, 2025

Uh oh!

kfswain commented May 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add LLMResponse object and RequestId to LLMRequest #799

Add LLMResponse object and RequestId to LLMRequest #799

Uh oh!

Conversation

shmuelk commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify bot commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for gateway-api-inference-extension ready!

Uh oh!

k8s-ci-robot commented May 8, 2025

Uh oh!

k8s-ci-robot commented May 8, 2025

Uh oh!

nirrozenbaum commented May 8, 2025

Uh oh!

k8s-ci-robot commented May 9, 2025

Uh oh!

kfswain May 9, 2025

Choose a reason for hiding this comment

Uh oh!

kfswain May 9, 2025

Choose a reason for hiding this comment

Uh oh!

kfswain commented May 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

shmuelk commented May 8, 2025 •

edited

Loading

netlify bot commented May 8, 2025 •

edited

Loading