Proposing the successor to the InferenceModel API #1199

kfswain · 2025-07-21T21:33:51Z

Please consider : https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0#heading=h.towq7jyczzgo & https://docs.google.com/document/d/1G-CQ17CM4j1vNE3T6u9uP2q-m6jK14ANPCwTfJ2qLS4/edit?tab=t.0 required prior reading.

netlify · 2025-07-21T21:33:56Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`c9573fc`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6883d76579dadd0008df01c1
😎 Deploy Preview	https://deploy-preview-1199--gateway-api-inference-extension.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

k8s-ci-robot · 2025-07-21T21:33:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

docs/proposals/1199-multitenant-api-proposal/README.md

ahg-g · 2025-07-21T22:01:38Z

docs/proposals/1199-multitenant-api-proposal/README.md

+The only field associated with identification is the `ID` field. A unique ID field was chosen (rather than using the metadata name), because:
+- We do not want to put the same restrictions on the string that is enfored on a kube resource name
+- The ID name may be duplicated across different pools
+- Use of a kube-generated name would force an upstream Auth mechanism to be aware of the `InferenceObjectives` API


This is not a kube-generated name, it is a name chosen by the admin when creating the object, so I am not sure this is an issue, in fact using the object name has the clear advantage of enforcing uniqueness.

That might not be an advantage, if someone wants to use the same auth 'identity' string in multiple pools in the same namespace, they wouldn't be able to do so with the current implementation. Just depends on usage, but we may not want to inject this opinion on the users behalf

docs/proposals/1199-multitenant-api-proposal/README.md

elevran

overall looks good and an improvement over InferenceModel.
Left some comments and requests for clarification where needed.

elevran · 2025-07-23T18:27:47Z

docs/proposals/1199-multitenant-api-proposal/README.md

+- Anonymous identity/defaults are graceful (fault-tolerant) & unsurprising
+- Scalable, simple, and reusable config
+- Retain the functionality of InferenceModel
+  - Traffic splitting models & modelName rewrite


Based on discussion in community meeting, my understanding was the traffic splitting is not in scope of the inference scheduler and should be delegated to GW API, but perhaps I misunderstood. L115 seems to be in support of pushing this out of GIE, but perhaps the above is meant to indicate that traffic splitting is still in scope though not necessarily via the new CRD?

Retain functionality in the sense that: it will still be possible we arent just dropping the feature entireley, but as you mention that functionality will be achieved through alternate means.

Open to different wording here.

Here is the concrete proposal of how this will be done, @kfswain perhaps you can add a link to it or copy from it, as you see fit: https://docs.google.com/document/d/1s4U4T_cjQkk4UeIDyAJl2Ox6FZoBigXBXn9Ai0qV7As/edit?tab=t.0

Open to different wording here.

- Continue to support traffic splitting across models, although not necessarily via GIE CRDs directly (e.g., delegated to GW API).

?

elevran · 2025-07-23T18:31:03Z

docs/proposals/1199-multitenant-api-proposal/README.md

+```
+
+### Other changes
+- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to


Q: is the intent for this be added to the config file or command line flag?
Also, does the sentence cut short? ... to assign InferenceObjectives to seems to be missing a targer.

elevran · 2025-07-23T18:34:51Z

docs/proposals/1199-multitenant-api-proposal/README.md

+	//
+	// +optional
+	// +kubebuilder:default="inference.networking.x-k8s.io"
+	Group Group `json:"group,omitempty"`


Q: are Group and Kind needed?
Aren't they fixed since always referring to an InferencePool?

My thinking is that including everything in the spec is a good idea, because the defaults allow G&K to be treated as optional, but if someone were to ever extend this and want a different G&K it would hopefully be slightly easier to use. But in practice, yeah I agree. It just seems to be standard to include the whole thing, from what I've seen.

elevran · 2025-07-23T18:37:32Z

docs/proposals/1199-multitenant-api-proposal/README.md

+## Phase 2
+
+### CRD spec
+```golang


indentation across structs seems inconsistent

elevran · 2025-07-23T18:39:47Z

docs/proposals/1199-multitenant-api-proposal/README.md

+
+type InferenceObjectivesSpec struct {
+	// Scope: Identifier
+    ID string


Clarification (also based on L201): this is the ID value set in request header which indicates the request is part of the category (tenant/flow/identity) to which the objective applies?

elevran · 2025-07-23T18:42:45Z

docs/proposals/1199-multitenant-api-proposal/README.md

+
+### Pool Binding
+
+Tackling the simplest first; the single Pool Binding field `PoolRef` is simply to associate a given InferenceObjectives object with a pool. Meaning that InferenceObjectives with duplicate ID's across different pools are considered valid.


the objectives would be duplicated with the exception of the pool, correct?
Should we consider making the pool reference an array to avoid the repetition and ensure that policies which are meant to be identical across multiple pools in the namespace do not diverge?

elevran · 2025-07-23T18:47:10Z

docs/proposals/1199-multitenant-api-proposal/README.md

+
+This ID can be any string; in the case of a MaaS platform, it may identify a customer via a one-way hashed JWT. Or for sharing an InferencePool between teams, it may be a team-id. 
+
+***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used.


Unclear on first read. What is the "default for all policies" that will be in use?
For requests having an ID which is not matched, where/how is the default policy defined?

Also, do we differentiate between no identity (e.g., missing header, perhaps implying "anonymous"?) and a request with identity that does not have a corresponding matching ID in any policy (e.g., apply default pool policies to it)?

elevran · 2025-07-23T18:48:50Z

docs/proposals/1199-multitenant-api-proposal/README.md

+***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used.
+
+#### Alternative consideration(s)
+- Expanding the PoolRef field to be plural was considered, however that was not selected to maintain simplicity. It is a decision that can be revisited in the future, however.


See earlier comment. Consider a use case for policies that span multiple pools and should remain consistent.
What the downside of plural PoolRef?

WRT multi-pool reference, I remember there being issues with it, but I don't remember the exact reasoning.

I'm happy to hash it out in more detail in the community meeting tomorrow, I think the issues may have been Gateway related.

kfswain · 2025-07-24T16:31:43Z

Note: made some substantial changes to this proposal in the latest push, it may not read well, will do another proof read myself in a bit.

ahg-g · 2025-07-25T17:19:55Z

docs/proposals/1199-inferencemodel-api-evolution/README.md

+
+As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes:
+
+- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to


can we define the header key for both selecting the inferenceObjective and the flow id?

and also the header for the model name re-write

ahg-g · 2025-07-25T18:45:38Z

docs/proposals/1199-inferencemodel-api-evolution/README.md

+As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes:
+
+- The EPP will expose a flag to define the header key (default: `x-gateway-inference-objectives`) that will be used to assign InferenceObjectives to requests
+- The EPP will expose a flag to define the header key (default: `x-gateway-inference-usage-meter`) that will be used in tracking Request Usage (which will act as the identifier for simple fairness implementation)


usage is a loaded term, I recommend x-gateway-inference-fairness-id?

was thinking exactly the same. also for the CRD name which can be something with InferenceFairness

Sure! Done.

The CRD name is in phase 2 and marked as a placeholder, so ill just leave it for the time being i think. We can have a Ultimate Naming Championship post-GA/in Phase 2.

ahg-g · 2025-07-25T21:13:08Z

/lgtm

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 21, 2025

k8s-ci-robot requested a review from ahg-g July 21, 2025 21:33

k8s-ci-robot requested a review from robscott July 21, 2025 21:33

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 21, 2025

kfswain force-pushed the multi-tenant-api-proposal branch from 42b20be to 2f50c00 Compare July 21, 2025 21:38

ahg-g reviewed Jul 21, 2025

View reviewed changes

kfswain force-pushed the multi-tenant-api-proposal branch from 2f50c00 to 9c8935e Compare July 22, 2025 21:15

elevran reviewed Jul 23, 2025

View reviewed changes

elevran mentioned this pull request Jul 24, 2025

REQUEST: New membership for elevran kubernetes/org#5729

Closed

11 tasks

kfswain force-pushed the multi-tenant-api-proposal branch 2 times, most recently from 0ff5988 to 5f5decd Compare July 24, 2025 16:30

kfswain force-pushed the multi-tenant-api-proposal branch 2 times, most recently from d236230 to ac5f61f Compare July 25, 2025 16:50

ahg-g reviewed Jul 25, 2025

View reviewed changes

kfswain force-pushed the multi-tenant-api-proposal branch from ac5f61f to 896c729 Compare July 25, 2025 18:42

ahg-g reviewed Jul 25, 2025

View reviewed changes

Initial commit

c9573fc

kfswain force-pushed the multi-tenant-api-proposal branch from 896c729 to c9573fc Compare July 25, 2025 19:13

k8s-ci-robot assigned ahg-g Jul 25, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2025

k8s-ci-robot merged commit 63d6b67 into kubernetes-sigs:main Jul 25, 2025
8 checks passed

robertgshaw2-redhat mentioned this pull request Jul 31, 2025

[Roadmap] llm-d 0.3 Release Plan llm-d/llm-d#146

Open

32 tasks

kfswain added a commit to kfswain/llm-instance-gateway that referenced this pull request Jul 31, 2025

Initial commit (kubernetes-sigs#1199)

252492d


		### Pool Binding

		Tackling the simplest first; the single Pool Binding field `PoolRef` is simply to associate a given InferenceObjectives object with a pool. Meaning that InferenceObjectives with duplicate ID's across different pools are considered valid.


		This ID can be any string; in the case of a MaaS platform, it may identify a customer via a one-way hashed JWT. Or for sharing an InferencePool between teams, it may be a team-id.

		*Important*: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used.


		As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the Request specific objectives. Specifically it will maintain the inclusion of criticality. Other phase 1 changes:

		- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to

Proposing the successor to the InferenceModel API #1199

Proposing the successor to the InferenceModel API #1199

Uh oh!

Conversation

kfswain commented Jul 21, 2025

Uh oh!

netlify bot commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for gateway-api-inference-extension ready!

Uh oh!

k8s-ci-robot commented Jul 21, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

elevran left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kfswain Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kfswain commented Jul 24, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahg-g commented Jul 25, 2025

Uh oh!

Uh oh!

Uh oh!

netlify bot commented Jul 21, 2025 •

edited

Loading

kfswain Jul 23, 2025 •

edited

Loading