Making inferenceModel optional #1024
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Force-pushed from 50d0abb to bbd6b8b
pkg/epp/requestcontrol/director.go (Outdated)
logger.Info("No associated inferenceModel found, using default", "Requested Model", reqCtx.Model) | ||
sheddable := v1alpha2.Sheddable | ||
modelObj = &v1alpha2.InferenceModel{ | ||
Spec: v1alpha2.InferenceModelSpec{ | ||
ModelName: reqCtx.Model, | ||
Criticality: &sheddable, | ||
}, |
I was just writing the question "do we need TargetModels?" and you removed it :).
So just to verify I got it right: if the requested modelName is not found in the datastore, we just pass through and leave the modelName field as is, with the lowest criticality.
Since it's not found in the datastore, we already know it's going to fail, right? No pod has this model.
Also, in some situations the request might be dropped because it's a sheddable request.
I understand this is done for conformance, but what exactly are we trying to test here?
Convenience: the model might be available on the model server, but a user doesn't want to set up an InferenceModel for it.
So the request might not necessarily fail.
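To make the pass-through concrete, here is a rough sketch of how the synthesized default behaves (illustrative only; the resolveTargetModel helper and its shape are assumptions, not the actual director code): with no TargetModels on the default object, the requested model name is forwarded unchanged, so the model server may still be able to serve it.

// Illustrative sketch, not the real director code: resolveTargetModel is a
// hypothetical helper showing that a synthesized default (no TargetModels)
// simply passes the requested model name through unchanged.
func resolveTargetModel(model *v1alpha2.InferenceModel, requestedModel string) string {
	if model == nil || len(model.Spec.TargetModels) == 0 {
		// Pass-through: no InferenceModel CR (or no TargetModels), so route
		// with the name the client asked for; the model server may serve it.
		return requestedModel
	}
	// A real implementation would do weighted selection over TargetModels;
	// elided here, we just take the first entry.
	return model.Spec.TargetModels[0].Name
}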
got it 👍
/lgtm
left one nit.
/hold if you want to address the comment.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: kfswain, nirrozenbaum. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
Force-pushed from bbd6b8b to 021aec3
/lgtm
/unhold
	if modelObj == nil {
-		return reqCtx, errutil.Error{Code: errutil.BadConfiguration, Msg: fmt.Sprintf("error finding a model object in InferenceModel for input %v", reqCtx.Model)}
+		logger.Info("No associated inferenceModel found, using default", "model", reqCtx.Model)
+		sheddable := v1alpha2.Sheddable
Default should probably be standard, not sheddable.
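If that change were made, it would amount to something like the following sketch (assuming v1alpha2.Standard is the corresponding Criticality constant; this is illustrative, not a committed change):

// Sketch of the nit above: default to Standard criticality rather than
// Sheddable when no InferenceModel matches the requested model.
standard := v1alpha2.Standard // assumed constant for the Standard criticality
modelObj = &v1alpha2.InferenceModel{
	Spec: v1alpha2.InferenceModelSpec{
		ModelName:   reqCtx.Model,
		Criticality: &standard,
	},
}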
Fixes: #1001
This PR makes the InferenceModel an optional CRD. If no match is found, the request falls back to a default model object with the lowest criticality.
Currently this default is hard-coded, just to unblock #1001.
Future PRs will allow the default to be configured (as part of implementing InferenceSchedulingObjective): #1007
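Putting the diff fragments above together, the fallback roughly looks like the sketch below (the datastore lookup name d.datastore.ModelGet is an assumption for illustration, not necessarily the actual director code):

modelObj := d.datastore.ModelGet(reqCtx.Model) // lookup name assumed
if modelObj == nil {
	// No InferenceModel matches the requested model: rather than rejecting the
	// request, synthesize a default with the lowest criticality so the request
	// is still forwarded to the pool.
	logger.Info("No associated inferenceModel found, using default", "model", reqCtx.Model)
	sheddable := v1alpha2.Sheddable
	modelObj = &v1alpha2.InferenceModel{
		Spec: v1alpha2.InferenceModelSpec{
			ModelName:   reqCtx.Model,
			Criticality: &sheddable,
		},
	}
}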