Skip to content

Conversation

shmuelk
Copy link
Contributor

@shmuelk shmuelk commented May 8, 2025

This PR adds the support to invoke the PostResponse scheduler plugins during the processing of response headers.

This PR with the PostResponse plugins invoked at response header processing time, enables developers to add headers to those that are sent to the client. This might include a session id or session token, usefull in session aware routing.

The changes included in this PR include:

  1. It extends:
    1. The scheduler with an OnResponse API, which in turn invokes a helper function to run the PostResponse plugins. This is done this way to make it easier to add future response handling plugins.
    2. The dispatcher with a HandleResponse API, patterned after its existing HandleRequest API. This function creates a LLMResponse object and invokes the scheduler's OnResponse API.
  2. The StreamingServer's response header handling has been refactored to:
    1. Collect all of the response headers
    2. Invoke dispather.HandleResponse
    3. Invoke helper functions to build the Envoy gRPC ResponseHeaders response message.
  3. A simple unit test has been added.

A more complex test that includes sending the gRPC messages will be added in a future PR that we plan to submit.

Note: This PR is dependent on PR #799

shmuelk added 10 commits May 8, 2025 15:32
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…gins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Copy link

netlify bot commented May 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit fb774fe
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6821e2df2350fa0008754eea
😎 Deploy Preview https://deploy-preview-800--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from Jeffwan and robscott May 8, 2025 15:16
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @shmuelk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 8, 2025
Comment on lines 104 to 110
func NewSchedulingContext(ctx context.Context, req *LLMRequest, resp *LLMResponse, pods []Pod) *SchedulingContext {
var logger logr.Logger
if req != nil {
logger = log.FromContext(ctx).WithValues("request", req)
} else {
logger = log.FromContext(ctx).WithValues("response", resp)
}
Copy link
Contributor

@nirrozenbaum nirrozenbaum May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it makes sense to separate SchedulingRequestContext and SchedulingResponseContext?
for example, when using context for response, do you use PodSnapshot or only the selected pod?
maybe we could put TargetPod in ResponseContext instead of calculating PodsSnapshot which could be expensive in large scale scenario.
would be good if each gets only the data it uses and not super-object that contains all.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025
@kfswain
Copy link
Collaborator

kfswain commented May 10, 2025

Apologies for the delay!

I think there is some refactoring we could do here to make this implementation a little cleaner, but these were decisions made before this PR. This does move the needle forward, and actually wires up the PostResponse interface so it can be used. Nir made a point about not looping over the pods, since we already know the pod on the Request side, which I definitely agree with but won't make block this PR.

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, shmuelk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2025
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 11, 2025
@nirrozenbaum
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2025
shmuelk and others added 5 commits May 12, 2025 12:31
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 12, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 12, 2025
…headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
@nirrozenbaum
Copy link
Contributor

/lgtm
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2025
@k8s-ci-robot k8s-ci-robot merged commit 80ce385 into kubernetes-sigs:main May 12, 2025
7 of 8 checks passed
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
irar2 pushed a commit to irar2/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants