-
Notifications
You must be signed in to change notification settings - Fork 180
remove Model field from LLMRequest #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @kfswain |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
I am working on the Flow Controller POC. We currently enforce the dispatch policy (for fairness) by model, not resolved target model. I will either need to use a different type when enqueuing a request than LLMRequest or will need model to remain as a field on this struct. The Flow Controller is not tightly coupled to the Scheduler, so either approach is fine. |
@LukeAVanDrie thanks for bringing this up. Ideally, each layer should get the data it needs for its mission, and only that. |
To my understanding, @LukeAVanDrie needs the
|
A subset (model, criticality) and some new fields: a reference to the request prompt size in bytes and a reference to the request context.
This is a good point. I have removed my dependency on scheduling.LLMRequest and am defining my own type for the FlowController input. My POC's fate is no longer tied to this PR. |
ready for review after #781 was merged |
if reqCtx.ResolvedTargetModel == "" { | ||
return reqCtx, errutil.Error{Code: errutil.BadConfiguration, Msg: fmt.Sprintf("error getting target model name for model %v", modelObj.Name)} | ||
} | ||
reqCtx.Request.Body["model"] = reqCtx.ResolvedTargetModel // Update target model in the body. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was moved from PostDispatch to here.
in this point the resolution of the target model is done, it seems to belong here more naturally.
@nirrozenbaum needs a rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, nirrozenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
/lgtm |
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
This PR removes the
Model
field fromScheduling.LLMRequest
struct.from the scheduler point of view, it doesn't care about the original requested model name, only about the resolved model after traffic splitting that was done in a higher level.
we can see that
Model
field was used in unit-tests only and that it's always set to be identical to ResolvedTargetModel. Scheduler doesn't use this field.In addition to removing
Model
field fromLLMRequest
struct, this PR renamesResolvedTargetModel
toTargetModel
inLLMRequest
from the same reasons. scheduler plugins don't care about traffic splitting and "resolved" model, only about what is target model.unit-tests and other usage places were updated accordingly.