Skip to content

Conversation

nirrozenbaum
Copy link
Contributor

@nirrozenbaum nirrozenbaum commented May 5, 2025

This PR removes the Model field from Scheduling.LLMRequest struct.
from the scheduler point of view, it doesn't care about the original requested model name, only about the resolved model after traffic splitting that was done in a higher level.
we can see that Model field was used in unit-tests only and that it's always set to be identical to ResolvedTargetModel. Scheduler doesn't use this field.

In addition to removing Model field from LLMRequest struct, this PR renames ResolvedTargetModel to TargetModel in LLMRequest from the same reasons. scheduler plugins don't care about traffic splitting and "resolved" model, only about what is target model.

unit-tests and other usage places were updated accordingly.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 5, 2025
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and robscott May 5, 2025 06:06
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 5, 2025
@nirrozenbaum
Copy link
Contributor Author

cc @kfswain

Copy link

netlify bot commented May 5, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 0d00997
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6820d800237f460008e56177
😎 Deploy Preview https://deploy-preview-782--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@LukeAVanDrie
Copy link
Contributor

LukeAVanDrie commented May 5, 2025

I am working on the Flow Controller POC. We currently enforce the dispatch policy (for fairness) by model, not resolved target model. I will either need to use a different type when enqueuing a request than LLMRequest or will need model to remain as a field on this struct.

The Flow Controller is not tightly coupled to the Scheduler, so either approach is fine.

@nirrozenbaum
Copy link
Contributor Author

I am working on the Flow Controller POC. We currently enforce the dispatch policy (for fairness) by model, not resolved target model. I will either need to use a different type when enqueuing a request than LLMRequest or will need model to remain as a field on this struct.

The Flow Controller is not tightly coupled to the Scheduler, so either approach is fine.

@LukeAVanDrie thanks for bringing this up.
can you elaborate on what other fields you have in Flow Controller? (do you have a link to your git branch?)
I mean, do you have a subset of the fields that are used in both? do you have additional fields that are not used in Scheduler and you need in Flow Control?

Ideally, each layer should get the data it needs for its mission, and only that.
if there is a complete match (with the exception of Model field) then sure, let's leave it as is.
but I suspect you might need only parts of the fields or additional fields

@kfswain
Copy link
Collaborator

kfswain commented May 5, 2025

but I suspect you might need only parts of the fields or additional fields

To my understanding, @LukeAVanDrie needs the Model name, as that is the unique identifier we currently use to separate one 'use case' from another.

Ideally, each layer should get the data it needs for its mission, and only that.
Completely agreed here. That is the endstate

@LukeAVanDrie
Copy link
Contributor

@LukeAVanDrie thanks for bringing this up. can you elaborate on what other fields you have in Flow Controller? (do you have a link to your git branch?) I mean, do you have a subset of the fields that are used in both? do you have additional fields that are not used in Scheduler and you need in Flow Control?

A subset (model, criticality) and some new fields: a reference to the request prompt size in bytes and a reference to the request context.

Ideally, each layer should get the data it needs for its mission, and only that. if there is a complete match (with the exception of Model field) then sure, let's leave it as is. but I suspect you might need only parts of the fields or additional fields

This is a good point. I have removed my dependency on scheduling.LLMRequest and am defining my own type for the FlowController input. My POC's fate is no longer tied to this PR.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 6, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2025
@nirrozenbaum
Copy link
Contributor Author

ready for review after #781 was merged
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 8, 2025
if reqCtx.ResolvedTargetModel == "" {
return reqCtx, errutil.Error{Code: errutil.BadConfiguration, Msg: fmt.Sprintf("error getting target model name for model %v", modelObj.Name)}
}
reqCtx.Request.Body["model"] = reqCtx.ResolvedTargetModel // Update target model in the body.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was moved from PostDispatch to here.
in this point the resolution of the target model is done, it seems to belong here more naturally.

@nirrozenbaum nirrozenbaum requested a review from kfswain May 8, 2025 07:47
@danehans
Copy link
Contributor

danehans commented May 8, 2025

@nirrozenbaum needs a rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2025
Copy link
Contributor

@danehans danehans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 8, 2025
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 9, 2025
@nirrozenbaum nirrozenbaum requested a review from danehans May 9, 2025 07:21
@ahg-g
Copy link
Contributor

ahg-g commented May 11, 2025

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2025
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2025
@ahg-g
Copy link
Contributor

ahg-g commented May 11, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2025
@k8s-ci-robot k8s-ci-robot merged commit 8df511a into kubernetes-sigs:main May 11, 2025
8 checks passed
@nirrozenbaum nirrozenbaum deleted the rm-model branch May 11, 2025 17:34
k8s-ci-robot pushed a commit that referenced this pull request May 12, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
irar2 pushed a commit to irar2/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates to tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Setup the logger in the SchedulerContext appropriately for reponses

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* typo

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make linter happy

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rebase handling

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Insure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants