feat: Add support to invoke PostResponse plugins #800

shmuelk · 2025-05-08T15:16:50Z

This PR adds support for invoking the PostResponse scheduler plugins while the response headers are being processed.

Invoking the PostResponse plugins at response-header processing time enables developers to add headers to the response sent to the client, for example a session ID or session token, which is useful for session-aware routing.
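To illustrate the session-token use case, here is a minimal sketch of a plugin that writes a header into the response. It assumes heavily simplified types: `LLMResponse`, `PostResponsePlugin`, `sessionHeaderPlugin` and the header name `x-session-token` are hypothetical stand-ins, not the project's actual API or signatures.

```go
package main

import "fmt"

// LLMResponse is a simplified, hypothetical stand-in for the response object
// passed to PostResponse plugins; it holds only the fields this sketch needs.
type LLMResponse struct {
	RequestID string
	Headers   map[string]string
}

// PostResponsePlugin is a hypothetical, simplified form of the PostResponse
// extension point: it sees the response and the pod that served the request.
type PostResponsePlugin interface {
	Name() string
	PostResponse(resp *LLMResponse, targetPod string)
}

// sessionHeaderPlugin (hypothetical) writes the serving pod's name into a
// response header so the client can echo it back on later requests.
type sessionHeaderPlugin struct {
	headerName string
}

func (p *sessionHeaderPlugin) Name() string { return "session-header" }

func (p *sessionHeaderPlugin) PostResponse(resp *LLMResponse, targetPod string) {
	if resp.Headers == nil {
		resp.Headers = map[string]string{}
	}
	// A production plugin would likely encode or sign this value; the raw pod
	// name keeps the sketch short.
	resp.Headers[p.headerName] = targetPod
}

func main() {
	resp := &LLMResponse{RequestID: "req-1"}
	var plugin PostResponsePlugin = &sessionHeaderPlugin{headerName: "x-session-token"}
	plugin.PostResponse(resp, "vllm-pod-3")
	fmt.Println(plugin.Name(), resp.Headers["x-session-token"])
}
```

A session-aware routing scheme could then read this header from the client's next request and prefer the same pod; that routing side is outside the scope of this sketch.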

The changes in this PR are as follows.

It extends:
1. The scheduler, with an OnResponse API that in turn invokes a helper function to run the PostResponse plugins. This indirection makes it easier to add other response-handling plugins in the future.
2. The dispatcher, with a HandleResponse API patterned after its existing HandleRequest API. This function creates an LLMResponse object and invokes the scheduler's OnResponse API.

The StreamingServer's response header handling has been refactored to:
1. Collect all of the response headers.
2. Invoke dispatcher.HandleResponse.
3. Invoke helper functions to build the Envoy gRPC ResponseHeaders response message.

A simple unit test has been added.
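The call flow described above can be sketched end to end as follows. This is a simplified model under stated assumptions: all types are hypothetical stand-ins, plugins receive the target pod as a plain string, and `headerValue`/`buildResponseHeaders` only approximate the helpers that build the real Envoy ResponseHeaders message.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified stand-ins for the types named in this PR.
type LLMResponse struct {
	RequestID string
	Headers   map[string]string
}

type PostResponsePlugin interface {
	PostResponse(resp *LLMResponse, targetPod string)
}

// Scheduler holds the configured PostResponse plugins.
type Scheduler struct {
	postResponsePlugins []PostResponsePlugin
}

// OnResponse is the response-handling entry point; it delegates to a helper
// so other response-time plugin types could be added later.
func (s *Scheduler) OnResponse(resp *LLMResponse, targetPod string) {
	s.runPostResponsePlugins(resp, targetPod)
}

func (s *Scheduler) runPostResponsePlugins(resp *LLMResponse, targetPod string) {
	for _, p := range s.postResponsePlugins {
		p.PostResponse(resp, targetPod)
	}
}

// Dispatcher mirrors a HandleRequest-style API on the response path.
type Dispatcher struct {
	scheduler *Scheduler
}

// HandleResponse wraps the collected headers in an LLMResponse, runs the
// scheduler, and returns the (possibly modified) headers.
func (d *Dispatcher) HandleResponse(requestID, targetPod string, headers map[string]string) map[string]string {
	if headers == nil {
		headers = map[string]string{}
	}
	resp := &LLMResponse{RequestID: requestID, Headers: headers}
	d.scheduler.OnResponse(resp, targetPod)
	return resp.Headers
}

// headerValue stands in for one header entry of the outgoing message.
type headerValue struct {
	Key, Value string
}

// buildResponseHeaders converts the header map into a deterministically
// ordered list, approximating the message-building helpers.
func buildResponseHeaders(headers map[string]string) []headerValue {
	keys := make([]string, 0, len(headers))
	for k := range headers {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]headerValue, 0, len(keys))
	for _, k := range keys {
		out = append(out, headerValue{Key: k, Value: headers[k]})
	}
	return out
}

// addHeader is a trivial plugin used to exercise the flow.
type addHeader struct{ key, value string }

func (a addHeader) PostResponse(resp *LLMResponse, _ string) {
	resp.Headers[a.key] = a.value
}

func main() {
	d := &Dispatcher{scheduler: &Scheduler{
		postResponsePlugins: []PostResponsePlugin{addHeader{"x-session-token", "abc123"}},
	}}
	// 1. Headers collected from the model server's response.
	collected := map[string]string{"content-type": "application/json"}
	// 2. Run them through the dispatcher and scheduler.
	final := d.HandleResponse("req-1", "vllm-pod-3", collected)
	// 3. Build the header list for the outgoing message.
	for _, h := range buildResponseHeaders(final) {
		fmt.Printf("%s: %s\n", h.Key, h.Value)
	}
}
```

Collecting all headers before calling the plugins, as in step 1, is what lets a plugin see the complete upstream header set and add to it before anything is sent to Envoy.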

A more complete test, one that includes sending the gRPC messages, will be added in a future PR.

Note: This PR depends on PR #799.

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>


netlify · 2025-05-08T15:16:55Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`fb774fe`
🔍 Latest deploy log	https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6821e2df2350fa0008754eea
😎 Deploy Preview	https://deploy-preview-800--gateway-api-inference-extension.netlify.app


k8s-ci-robot · 2025-05-08T15:16:59Z

Hi @shmuelk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

nirrozenbaum · 2025-05-08T16:47:41Z

pkg/epp/scheduling/types/types.go

+func NewSchedulingContext(ctx context.Context, req *LLMRequest, resp *LLMResponse, pods []Pod) *SchedulingContext {
+	var logger logr.Logger
+	if req != nil {
+		logger = log.FromContext(ctx).WithValues("request", req)
+	} else {
+		logger = log.FromContext(ctx).WithValues("response", resp)
+	}


Does it make sense to separate SchedulingRequestContext and SchedulingResponseContext?
For example, when using the context for a response, do you use PodsSnapshot or only the selected pod?
Maybe we could put TargetPod in the ResponseContext instead of computing PodsSnapshot, which could be expensive in large-scale scenarios.
It would be good if each context got only the data it uses, rather than a super-object that contains everything.

kfswain · 2025-05-10T13:42:23Z

Apologies for the delay!

I think there is some refactoring we could do here to make this implementation a little cleaner, but those were decisions made before this PR. This does move the needle forward and actually wires up the PostResponse interface so it can be used. Nir made a point about not looping over the pods, since we already know the pod from the request side; I definitely agree with that, but I won't make it block this PR.

/lgtm
/approve
/hold

k8s-ci-robot · 2025-05-10T13:42:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, shmuelk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

nirrozenbaum · 2025-05-11T13:16:10Z

/ok-to-test


* generalize scheduling cycle state concept
* typo
* make linter happy
* make prefix state struct internal to package instead of public

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest
* rebase handling

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>


nirrozenbaum · 2025-05-12T12:32:53Z

/lgtm
/unhold

* Added the LLMResponse struct and RequestId to LLMRequest
* Updates due to NewSchedulerContext API change
* Populate the RequestId field of LLMRequest
* Updates to tests
* Added PostResponse plugins to scheduler config
* Added scheduler.OnResponse to handle responses
* Added dispatcher.HandleResponse to handle responses
* Refactored server response header handling to invoke PostResponse plugins
* Added simple test for PostResponse plugins
* Setup the logger in the SchedulerContext appropriately for reponses
* Updates due to rebase issues
* merge functions in env utils (kubernetes-sigs#819)
* generalize scheduling cycle state concept (kubernetes-sigs#818)
  * generalize scheduling cycle state concept
  * typo
  * make linter happy
  * make prefix state struct internal to package instead of public
* remove Model field from LLMRequest (kubernetes-sigs#782)
  * remove Model field from LLMRequest
  * rebase handling
* Added the LLMResponse struct and RequestId to LLMRequest
* Insure that wanted response header messages have all of the response headers in them

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>

shmuelk added 10 commits May 8, 2025 15:32

Added the LLMResponse struct and RequestId to LLMRequest

33fb88e

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Updates due to NewSchedulerContext API change

d67c236

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Populate the RequestId field of LLMRequest

0a3a6ad

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Updates to tests

3e1aafc

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Added PostResponse plugins to scheduler config

b4d5836

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Added scheduler.OnResponse to handle responses

afc9b1b

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Added dispatcher.HandleResponse to handle responses

07ea43e

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Refactored server response header handling to invoke PostResponse plugins

e47b09f

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Added simple test for PostResponse plugins

ac8349d

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

Setup the logger in the SchedulerContext appropriately for reponses

a8bba31

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025

k8s-ci-robot requested review from Jeffwan and robscott May 8, 2025 15:16

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 8, 2025

nirrozenbaum reviewed May 8, 2025


k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2025

k8s-ci-robot assigned kfswain May 10, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2025

kfswain mentioned this pull request May 10, 2025

Add LLMResponse object and RequestId to LLMRequest #799

Closed

Merge branch 'main' into post-response-header-main

db78c67

k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 11, 2025

shmuelk and others added 5 commits May 12, 2025 12:31

Updates due to rebase issues

0ea51b2

merge functions in env utils (#819)

c6e07de

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

remove Model field from LLMRequest (#782)

50296f5


Added the LLMResponse struct and RequestId to LLMRequest

3ce6ccf

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 12, 2025

Merge branch 'main' into post-response-header-main

25f0f7e

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 12, 2025

Insure that wanted response header messages have all of the response headers in them

fb774fe

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2025

k8s-ci-robot assigned nirrozenbaum May 12, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2025

k8s-ci-robot merged commit 80ce385 into kubernetes-sigs:main May 12, 2025
7 of 8 checks passed

konflux-internal-p02 bot mentioned this pull request Jul 16, 2025

chore(deps): update dependency kubernetes-sigs/gateway-api-inference-extension to v0.4.0 red-hat-data-services/llm-d-inference-scheduler#30

Open


shmuelk mentioned this pull request Sep 25, 2025

REQUEST: New membership for shmuelk kubernetes/org#5864

Closed

