-
Notifications
You must be signed in to change notification settings - Fork 179
Proposing the successor to the InferenceModel API #1199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposing the successor to the InferenceModel API #1199
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
42b20be
to
2f50c00
Compare
The only field associated with identification is the `ID` field. A unique ID field was chosen (rather than using the metadata name), because: | ||
- We do not want to put the same restrictions on the string that is enfored on a kube resource name | ||
- The ID name may be duplicated across different pools | ||
- Use of a kube-generated name would force an upstream Auth mechanism to be aware of the `InferenceObjectives` API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a kube-generated name, it is a name chosen by the admin when creating the object, so I am not sure this is an issue, in fact using the object name has the clear advantage of enforcing uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might not be an advantage, if someone wants to use the same auth 'identity' string in multiple pools in the same namespace, they wouldn't be able to do so with the current implementation. Just depends on usage, but we may not want to inject this opinion on the users behalf
2f50c00
to
9c8935e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good and an improvement over InferenceModel.
Left some comments and requests for clarification where needed.
- Anonymous identity/defaults are graceful (fault-tolerant) & unsurprising | ||
- Scalable, simple, and reusable config | ||
- Retain the functionality of InferenceModel | ||
- Traffic splitting models & modelName rewrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on discussion in community meeting, my understanding was the traffic splitting is not in scope of the inference scheduler and should be delegated to GW API, but perhaps I misunderstood. L115 seems to be in support of pushing this out of GIE, but perhaps the above is meant to indicate that traffic splitting is still in scope though not necessarily via the new CRD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retain functionality in the sense that: it will still be possible we arent just dropping the feature entireley, but as you mention that functionality will be achieved through alternate means.
Open to different wording here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the concrete proposal of how this will be done, @kfswain perhaps you can add a link to it or copy from it, as you see fit: https://docs.google.com/document/d/1s4U4T_cjQkk4UeIDyAJl2Ox6FZoBigXBXn9Ai0qV7As/edit?tab=t.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to different wording here.
- Continue to support traffic splitting across models, although not necessarily via GIE CRDs directly (e.g., delegated to GW API).
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
``` | ||
|
||
### Other changes | ||
- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: is the intent for this be added to the config file or command line flag?
Also, does the sentence cut short? ... to assign InferenceObjectives to
seems to be missing a targer.
// | ||
// +optional | ||
// +kubebuilder:default="inference.networking.x-k8s.io" | ||
Group Group `json:"group,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: are Group and Kind needed?
Aren't they fixed since always referring to an InferencePool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that including everything in the spec is a good idea, because the defaults allow G&K to be treated as optional, but if someone were to ever extend this and want a different G&K it would hopefully be slightly easier to use. But in practice, yeah I agree. It just seems to be standard to include the whole thing, from what I've seen.
## Phase 2 | ||
|
||
### CRD spec | ||
```golang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation across structs seems inconsistent
|
||
type InferenceObjectivesSpec struct { | ||
// Scope: Identifier | ||
ID string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification (also based on L201): this is the ID value set in request header which indicates the request is part of the category (tenant/flow/identity) to which the objective applies?
|
||
### Pool Binding | ||
|
||
Tackling the simplest first; the single Pool Binding field `PoolRef` is simply to associate a given InferenceObjectives object with a pool. Meaning that InferenceObjectives with duplicate ID's across different pools are considered valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the objectives would be duplicated with the exception of the pool, correct?
Should we consider making the pool reference an array to avoid the repetition and ensure that policies which are meant to be identical across multiple pools in the namespace do not diverge?
|
||
This ID can be any string; in the case of a MaaS platform, it may identify a customer via a one-way hashed JWT. Or for sharing an InferencePool between teams, it may be a team-id. | ||
|
||
***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear on first read. What is the "default for all policies" that will be in use?
For requests having an ID which is not matched, where/how is the default policy defined?
Also, do we differentiate between no identity (e.g., missing header, perhaps implying "anonymous"?) and a request with identity that does not have a corresponding matching ID in any policy (e.g., apply default pool policies to it)?
***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead the default for all policies will be used. | ||
|
||
#### Alternative consideration(s) | ||
- Expanding the PoolRef field to be plural was considered, however that was not selected to maintain simplicity. It is a decision that can be revisited in the future, however. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See earlier comment. Consider a use case for policies that span multiple pools and should remain consistent.
What the downside of plural PoolRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WRT multi-pool reference, I remember there being issues with it, but I don't remember the exact reasoning.
I'm happy to hash it out in more detail in the community meeting tomorrow, I think the issues may have been Gateway related.
0ff5988
to
5f5decd
Compare
Note: made some substantial changes to this proposal in the latest push, it may not read well, will do another proof read myself in a bit. |
d236230
to
ac5f61f
Compare
|
||
As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes: | ||
|
||
- The EPP will expose a flag to define the header key that will be used to assign InferenceObjectives to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we define the header key for both selecting the inferenceObjective and the flow id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also the header for the model name re-write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, done.
ac5f61f
to
896c729
Compare
As such, the InferenceModel API will be split into separate CRDs to reflect the difference in these scopes. Phase 1 will focus on the **Request specific objectives**. Specifically it will maintain the inclusion of criticality. Other phase 1 changes: | ||
|
||
- The EPP will expose a flag to define the header key (default: `x-gateway-inference-objectives`) that will be used to assign InferenceObjectives to requests | ||
- The EPP will expose a flag to define the header key (default: `x-gateway-inference-usage-meter`) that will be used in tracking Request Usage (which will act as the identifier for simple fairness implementation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usage is a loaded term, I recommend x-gateway-inference-fairness-id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was thinking exactly the same. also for the CRD name which can be something with InferenceFairness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CRD name is in phase 2 and marked as a placeholder, so ill just leave it for the time being i think. We can have a Ultimate Naming Championship post-GA/in Phase 2.
896c729
to
c9573fc
Compare
/lgtm |
Please consider : https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0#heading=h.towq7jyczzgo & https://docs.google.com/document/d/1G-CQ17CM4j1vNE3T6u9uP2q-m6jK14ANPCwTfJ2qLS4/edit?tab=t.0 required prior reading.