Skip to content

Conversation

kfswain
Copy link
Collaborator

@kfswain kfswain commented Jul 21, 2025

Copy link

netlify bot commented Jul 21, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c9573fc
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6883d76579dadd0008df01c1
😎 Deploy Preview https://deploy-preview-1199--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 21, 2025
@k8s-ci-robot k8s-ci-robot requested a review from ahg-g July 21, 2025 21:33
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from robscott July 21, 2025 21:33
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 21, 2025
@kfswain kfswain force-pushed the multi-tenant-api-proposal branch from 42b20be to 2f50c00 Compare July 21, 2025 21:38
The only field associated with identification is the `ID` field. A unique ID field was chosen (rather than using the metadata name), because:
- We do not want to put the same restrictions on the string that is enfored on a kube resource name
- The ID name may be duplicated across different pools
- Use of a kube-generated name would force an upstream Auth mechanism to be aware of the `InferenceObjectives` API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a kube-generated name, it is a name chosen by the admin when creating the object, so I am not sure this is an issue, in fact using the object name has the clear advantage of enforcing uniqueness.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might not be an advantage, if someone wants to use the same auth 'identity' string in multiple pools in the same namespace, they wouldn't be able to do so with the current implementation. Just depends on usage, but we may not want to inject this opinion on the users behalf

@kfswain kfswain force-pushed the multi-tenant-api-proposal branch from 2f50c00 to 9c8935e Compare July 22, 2025 21:15
Copy link
Contributor

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good and an improvement over InferenceModel.
Left some comments and requests for clarification where needed.

- Anonymous identity/defaults are graceful (fault-tolerant) & unsurprising
- Scalable, simple, and reusable config
- Retain the functionality of InferenceModel
- Traffic splitting models & modelName rewrite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on discussion in community meeting, my understanding was the traffic splitting is not in scope of the inference scheduler and should be delegated to GW API, but perhaps I misunderstood. L115 seems to be in support of pushing this out of GIE, but perhaps the above is meant to indicate that traffic splitting is still in scope though not necessarily via the new CRD?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retain functionality in the sense that: it will still be possible we arent just dropping the feature entireley, but as you mention that functionality will be achieved through alternate means.

Open to different wording here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the concrete proposal of how this will be done, @kfswain perhaps you can add a link to it or copy from it, as you see fit: https://docs.google.com/document/d/1s4U4T_cjQkk4UeIDyAJl2Ox6FZoBigXBXn9Ai0qV7As/edit?tab=t.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to different wording here.

- Continue to support traffic splitting across models, although not necessarily via GIE CRDs directly (e.g., delegated to GW API).

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

```

### Other changes
- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: is the intent for this be added to the config file or command line flag?
Also, does the sentence cut short? ... to assign InferenceObjectives to seems to be missing a targer.

//
// +optional
// +kubebuilder:default="inference.networking.x-k8s.io"
Group Group `json:"group,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: are Group and Kind needed?
Aren't they fixed since always referring to an InferencePool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that including everything in the spec is a good idea, because the defaults allow G&K to be treated as optional, but if someone were to ever extend this and want a different G&K it would hopefully be slightly easier to use. But in practice, yeah I agree. It just seems to be standard to include the whole thing, from what I've seen.

## Phase 2

### CRD spec
```golang
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation across structs seems inconsistent


type InferenceObjectivesSpec struct {
// Scope: Identifier
ID string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification (also based on L201): this is the ID value set in request header which indicates the request is part of the category (tenant/flow/identity) to which the objective applies?


### Pool Binding

Tackling the simplest first; the single Pool Binding field `PoolRef` is simply to associate a given InferenceObjectives object with a pool. Meaning that InferenceObjectives with duplicate ID's across different pools are considered valid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the objectives would be duplicated with the exception of the pool, correct?
Should we consider making the pool reference an array to avoid the repetition and ensure that policies which are meant to be identical across multiple pools in the namespace do not diverge?


This ID can be any string; in the case of a MaaS platform, it may identify a customer via a one-way hashed JWT. Or for sharing an InferencePool between teams, it may be a team-id.

***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear on first read. What is the "default for all policies" that will be in use?
For requests having an ID which is not matched, where/how is the default policy defined?

Also, do we differentiate between no identity (e.g., missing header, perhaps implying "anonymous"?) and a request with identity that does not have a corresponding matching ID in any policy (e.g., apply default pool policies to it)?

***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used.

#### Alternative consideration(s)
- Expanding the PoolRef field to be plural was considered, however that was not selected to maintain simplicity. It is a decision that can be revisited in the future, however.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment. Consider a use case for policies that span multiple pools and should remain consistent.
What the downside of plural PoolRef?

Copy link
Collaborator Author

@kfswain kfswain Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT multi-pool reference, I remember there being issues with it, but I don't remember the exact reasoning.

I'm happy to hash it out in more detail in the community meeting tomorrow, I think the issues may have been Gateway related.

@kfswain kfswain force-pushed the multi-tenant-api-proposal branch 2 times, most recently from 0ff5988 to 5f5decd Compare July 24, 2025 16:30
@kfswain
Copy link
Collaborator Author

kfswain commented Jul 24, 2025

Note: made some substantial changes to this proposal in the latest push, it may not read well, will do another proof read myself in a bit.

@kfswain kfswain force-pushed the multi-tenant-api-proposal branch 2 times, most recently from d236230 to ac5f61f Compare July 25, 2025 16:50

As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes:

- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we define the header key for both selecting the inferenceObjective and the flow id?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also the header for the model name re-write

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, done.

@kfswain kfswain force-pushed the multi-tenant-api-proposal branch from ac5f61f to 896c729 Compare July 25, 2025 18:42
As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes:

- The EPP will expose a flag to define the header key (default: `x-gateway-inference-objectives`) that will be used to assign InferenceObjectives to requests
- The EPP will expose a flag to define the header key (default: `x-gateway-inference-usage-meter`) that will be used in tracking Request Usage (which will act as the identifier for simple fairness implementation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage is a loaded term, I recommend x-gateway-inference-fairness-id?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was thinking exactly the same. also for the CRD name which can be something with InferenceFairness

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CRD name is in phase 2 and marked as a placeholder, so ill just leave it for the time being i think. We can have a Ultimate Naming Championship post-GA/in Phase 2.

@kfswain kfswain force-pushed the multi-tenant-api-proposal branch from 896c729 to c9573fc Compare July 25, 2025 19:13
@ahg-g
Copy link
Contributor

ahg-g commented Jul 25, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2025
@k8s-ci-robot k8s-ci-robot merged commit 63d6b67 into kubernetes-sigs:main Jul 25, 2025
8 checks passed
kfswain added a commit to kfswain/llm-instance-gateway that referenced this pull request Jul 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants