
Conversation

@kfswain (Collaborator) commented Jun 19, 2025

Fixes: #1001

This PR makes the InferenceModel an optional CRD. If no matching InferenceModel is found for a requested model, a default model object is used with the lowest criticality.

Currently this is hard-coded just to unblock: #1001

Future PRs will allow the default to be configured (as part of the InferenceSchedulingObjective implementation: #1007).
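For reference, the fallback boils down to synthesizing an in-memory InferenceModel whenever the lookup for the requested model comes back empty. Below is a self-contained sketch using simplified stand-in types; the real definitions live in the project's api/v1alpha2 package, and the actual change is shown in the diff excerpts later in this thread:

```go
package main

import "fmt"

// Simplified stand-ins for the v1alpha2 API types touched by this PR;
// the real definitions live in the project's api/v1alpha2 package.
type Criticality string

const (
	Critical  Criticality = "Critical"
	Standard  Criticality = "Standard"
	Sheddable Criticality = "Sheddable"
)

type InferenceModelSpec struct {
	ModelName   string
	Criticality *Criticality
}

type InferenceModel struct {
	Spec InferenceModelSpec
}

// defaultInferenceModel mirrors the fallback this PR introduces: when no
// InferenceModel matches the requested model name, synthesize one at the
// lowest criticality instead of rejecting the request.
func defaultInferenceModel(requestedModel string) *InferenceModel {
	sheddable := Sheddable
	return &InferenceModel{
		Spec: InferenceModelSpec{
			ModelName:   requestedModel,
			Criticality: &sheddable,
		},
	}
}

func main() {
	// Model name is illustrative only.
	m := defaultInferenceModel("my-model")
	fmt.Println(m.Spec.ModelName, *m.Spec.Criticality) // my-model Sheddable
}
```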

netlify bot commented Jun 19, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 021aec3
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68546ec45838190008eb1456
😎 Deploy Preview: https://deploy-preview-1024--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 19, 2025
@k8s-ci-robot k8s-ci-robot requested review from Jeffwan and liu-cong June 19, 2025 19:12
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 19, 2025
@kfswain kfswain force-pushed the inf-model-optional branch from 50d0abb to bbd6b8b Compare June 19, 2025 19:34
Comment on lines 96 to 102
logger.Info("No associated inferenceModel found, using default", "Requested Model", reqCtx.Model)
sheddable := v1alpha2.Sheddable
modelObj = &v1alpha2.InferenceModel{
Spec: v1alpha2.InferenceModelSpec{
ModelName: reqCtx.Model,
Criticality: &sheddable,
},
@nirrozenbaum (Contributor) commented Jun 19, 2025

I was just writing the question "do we need TargetModels" and you removed it :).

So, just to verify I got it right: if the requested modelName is not found in the datastore, we simply pass the request through, leaving the modelName field as is, with the lowest criticality. Since it's not found in the datastore, we already know it's going to fail, right? No pod serves this model. Also, in some situations the request might be dropped because it's a sheddable request.

I understand this is done for conformance, but what exactly are we trying to test here?

@kfswain (Collaborator, Author)

Convenience: the model might be available on the model server, but a user doesn't want to set up an InferenceModel for it.

So the request might not necessarily fail.

@nirrozenbaum (Contributor)
got it 👍

@nirrozenbaum (Contributor) left a comment

/lgtm

left one nit.
/hold if you want to address the comment.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nirrozenbaum (Contributor)

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2025
@kfswain kfswain force-pushed the inf-model-optional branch from bbd6b8b to 021aec3 Compare June 19, 2025 20:10
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2025
@nirrozenbaum (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2025
@nirrozenbaum (Contributor)

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit 4433ad5 into kubernetes-sigs:main Jun 19, 2025
9 checks passed
```diff
 if modelObj == nil {
-	return reqCtx, errutil.Error{Code: errutil.BadConfiguration, Msg: fmt.Sprintf("error finding a model object in InferenceModel for input %v", reqCtx.Model)}
+	logger.Info("No associated inferenceModel found, using default", "model", reqCtx.Model)
+	sheddable := v1alpha2.Sheddable
```
Contributor

Default should probably be standard, not sheddable.
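As an aside on the configurable default deferred to #1007: a flag-based version could look roughly like the sketch below. This is hypothetical; the flag name and wiring are invented for illustration and are not the project's actual API.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Hypothetical flag for the default criticality applied when no
// InferenceModel matches a request. This PR hard-codes Sheddable;
// making it configurable is deferred to the work tracked in #1007.
var defaultCriticality = flag.String("default-criticality", "Sheddable",
	"criticality for requests with no matching InferenceModel (Critical, Standard, or Sheddable)")

func main() {
	flag.Parse()
	switch *defaultCriticality {
	case "Critical", "Standard", "Sheddable":
		fmt.Println("default criticality:", *defaultCriticality)
	default:
		fmt.Fprintf(os.Stderr, "unknown criticality %q\n", *defaultCriticality)
		os.Exit(1)
	}
}
```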

rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 26, 2025
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 26, 2025
EyalPazz pushed a commit to EyalPazz/gateway-api-inference-extension that referenced this pull request Jul 9, 2025
@kfswain kfswain deleted the inf-model-optional branch July 31, 2025 21:25
Development

Successfully merging this pull request may close these issues.

Refactor EPP to decouple core logic from InferenceModel resource
4 participants