Skip to content

Conversation

shmuelk
Copy link
Contributor

@shmuelk shmuelk commented May 8, 2025

This PR adds a LLMResponse struct, similar in idea to the LLMRequest struct for use in response handling. It will be used in a future PR that adds PostResponse plugin support.

The LLMResponse struct contains the request Id, the received response headers, the body received, a flag indicating whether or not the model is streaming results, and a flag indicating whether or not all of the response body has been received. These latter three fields will be used in the future when there is PostResponse processing of the response body.

In a future PR:

  1. The current response header processing in StreamingServer will be refactored to collect the headers and invoke a new Dispatcher API HandleResponse
  2. The new Dispatcher HandleResponse API will invoke the new Scheduler OnResponse API, which ultimately invokes the scheduler PostResponse plugins.

The new Scheduler OnResponse API receives a LLMResponse struct in a way similar to the way scheduler Schedule gets a LLMRequest object.

This PR is laying foundations for the future PR that adds the PostResponse support.

In addition this PR adds a RequestId to the LLMRequest struct to allow plugins to cache internally information specific to a particular request.

This is the first PR in a series to add support for PostResponse

shmuelk added 4 commits May 8, 2025 15:32
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Copy link

netlify bot commented May 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 3e1aafc
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/681ca575e9e4750008a56bd9
😎 Deploy Preview https://deploy-preview-799--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shmuelk
Once this PR has been reviewed and has the lgtm label, please assign jeffwan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @shmuelk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 8, 2025
@nirrozenbaum
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

}

func NewSchedulingContext(ctx context.Context, req *LLMRequest, pods []Pod) *SchedulingContext {
func NewSchedulingContext(ctx context.Context, req *LLMRequest, resp *LLMResponse, pods []Pod) *SchedulingContext {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This param is nil in all the places it's called. It's also a exported field, perhaps the New func should just initialize the LLMResponse struct? And any consumer can populate or replace the field as needed.

(Completely out of scope of this PR, but I would like to move away from LLM naming, a lot of this code can be generalized to Inference, LLMs are just the current focus)

// LLMResponse contains information from the response received to be passed to plugins
type LLMResponse struct {
// RequestId is the Envoy generated Id for the request being processed
RequestId string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence if there should be a dedicated field for this, we could put this in the Headers map, or check to see if it's already there, and then populate.

@kfswain
Copy link
Collaborator

kfswain commented May 10, 2025

#800 is a super set of this PR. I absolutely appreciate cutting smaller PRs, but just the way this specific instance worked out. I was able to review #800 just fine. Thank you for making small, review friendly PRs! I will close this one in favor of #800 however

@kfswain kfswain closed this May 10, 2025
@shmuelk shmuelk deleted the llm-response-main branch May 15, 2025 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants