Commit 9c8935e

Initial commit
1 parent 0e1e964 commit 9c8935e

1 file changed

+250
-0
lines changed
  • docs/proposals/1199-multitenant-api-proposal

# Scheduling Subsystem Architecture

Author(s): @kfswain, @ahg-g, @lukeavandrie
## Proposal Status
***Draft***

## Summary
Multiple docs have discussed restructuring the InferenceModel API. This [doc](https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0#heading=h.towq7jyczzgo) proposes an InferenceSchedulingObjective CRD, and this [doc](https://docs.google.com/document/d/1G-CQ17CM4j1vNE3T6u9uP2q-m6jK14ANPCwTfJ2qLS4/edit?tab=t.0) builds on it to solidify the requirement that the new iteration of the InferenceModel API continue to solve the identity problem. Both documents were useful for gathering feedback & iterating toward a proper solution.

This proposal is intended to act as the plan of record for the solution that will be implemented.

## Implementation Phases

### Phase 1 - Rename & Modify InferenceModel
Due to these facts:
- The Criticality field of InferenceModel is in use & provides functionality
- InferenceModel is an Alpha API
- InferenceModel is not depended upon by upstream or downstream components

Phase 1 will retain the Criticality functionality, but will rename the InferenceModel API and slim down its spec.

### Phase 2 - Introduce new Policy fields
Phase 2 will happen over a longer period of time & slowly introduce new policies to the API. Much of what is discussed in this proposal keeps Phase 2 in mind.
Primarily, Phase 2 will introduce these policies:
- Fairness
- SLO

Because of the behavior these policies add & require in order to function correctly, the naming of the API must account for this second phase.

## Design Principles

### Goals
- Reliable and predictable fairness allocation
- Disconnect identity from policy-like objects where possible
- Anonymous identity/defaults are graceful (fault-tolerant) & unsurprising
- Scalable, simple, and reusable config
- Retain the functionality of InferenceModel
  - Traffic splitting models & modelName rewrite
  - Criticality

### Non-Goals
- Addressing security concerns with the API. This is currently expected to either be:
  - Entirely contained within a trusted system
  - Or auth handled upstream
- IGW implementing a custom auth mechanism


## Definitions

- **Tenant** (synonymous with: *Flow* or *Identity*) - In the context of Inference Gateway these names are synonymous. Kubernetes chooses the term ***tenant*** as described [here](https://kubernetes.io/docs/concepts/security/multi-tenancy/#tenants).

# Proposal

Discussion of the problem(s) can be seen in the linked documents. Here we will describe the new API surface.

## Phase 1

### Naming
This API addresses 3 general pillars of the problem, which can also be categorized into 2 areas:

- This API describes Resource Sharing (Criticality/Fairness)
- This API describes Identification (used in Fairness)
- This API describes Specific Request Policy (SLO)


As such, the name of the API should convey these concepts well. The `Inference-` prefix will remain, as it is a succinct & accepted term related to generative AI serving.

The accompanying name should also convey some or all of the other pillars. Some of the names that have been considered:

- `Tenant`
  - Pros:
    - Resource Sharing is implicit due to evocation of the 'multitenancy' concept
    - Identification is well understood from a term like Tenant
  - Cons:
    - Due to the prevalence of the 'multitenancy' concept, the term 'tenant' may clash with a user-defined notion of 'tenant', causing a confusing interaction
    - Does not convey Specific Request Policy well

- `Flow`
  - Pros:
    - A common networking term helpful in describing request traffic rate control
  - Cons:
    - Does not describe Identification or Resource Sharing well

- `Objectives` (preferred)
  - Pros:
    - Specific Request Policy is easily understood
    - Resource Sharing is a natural fit
  - Cons:
    - Identity is not well conveyed


### CRD spec

This CRD definition is a slimmed-down version of InferenceModel with a name change. Example here:

```golang
type InferenceObjectives struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Spec InferenceObjectivesSpec
}

type InferenceObjectivesSpec struct {
	PoolRef InferenceObjectReference

	Criticality *int
}
```

### Other changes
- The EPP will expose a flag to define the header key whose value is used to assign InferenceObjectives to a request
- The modelName rewrite functionality will be included in the EPP as a core feature (also handled by header)
- Traffic splitting on model name will be pushed to HTTPRoute, using the BBR tool to extract the modelName into a header

## Phase 2
118+
119+
### CRD spec
120+
```golang
121+
122+
type InferenceObjectives struct {
123+
metav1.TypeMeta
124+
metav1.ObjectMeta
125+
126+
Spec InferenceObjectivesSpec
127+
}
128+
129+
type InferenceObjectivesSpec struct {
130+
// Scope: Identifier
131+
ID string
132+
133+
// Scope: Pool binding
134+
PoolRef InferenceObjectReference
135+
136+
137+
// Scope: Policy Handles (the following are different options on how policy binding may look)
138+
139+
// Include core policies as a field of the object
140+
Criticality *int
141+
FairnessPolicy NotYetDefinedFairnessPolicy
142+
SLOPolicy InferenceSLOPolicy
143+
144+
// or
145+
146+
// Create an array fields that can bind arbitrary policies (including the 'core' policies)
147+
Policies []InferenceObjectReference
148+
149+
// or
150+
151+
// one-of: (Where Policy class would contain all 'core' policies that can be configured.)
152+
PolicyProfileRef InferenceObjectReference
153+
PolicyProfile PolicyProfile
154+
}
155+
156+
// PoolObjectReference identifies an API object within the namespace of the
157+
// referrer.
158+
type InferenceObjectReference struct {
159+
// Group is the group of the referent.
160+
//
161+
// +optional
162+
// +kubebuilder:default="inference.networking.x-k8s.io"
163+
Group Group `json:"group,omitempty"`
164+
165+
// Kind is kind of the referent. For example "InferencePool".
166+
//
167+
// +optional
168+
// +kubebuilder:default="InferencePool"
169+
Kind Kind `json:"kind,omitempty"`
170+
171+
// Name is the name of the referent.
172+
//
173+
// +kubebuilder:validation:Required
174+
Name ObjectName `json:"name"`
175+
}
176+
177+
178+
type PolicyProfile struct {
179+
metav1.TypeMeta
180+
metav1.ObjectMeta
181+
182+
Spec PolicyProfileSpec
183+
}
184+
185+
type PolicyProfileSpec struct {
186+
// this is a departure from InferenceModel that used string for criticality.
187+
// We got quite a bit of feedback around allowing for custom criticality bands, so an int/enum is more flexible & carries inherent stack rank value.
188+
Criticality *int
189+
FairnessPolicy NotYetDefinedFairnessPolicy
190+
SLOPolicy InferenceSLOPolicy
191+
}
192+
193+
type InferenceCriticalityPolicy struct {
194+
Criticality int32
195+
}
196+
```
197+
198+
### Intent
199+
200+
The purpose(s) of the `InferenceObjectives` is:
201+
- Create a strong concept of a tenant within the inference pool, used to associate groups of requests together for the purpose of Flow Clontrol - which can enforce:
202+
- Fair resource sharing
203+
- Inter-tenant prioritization
204+
- SLO attainment
205+
- Create a handle with which to attach scheduling policies, allowing for heterogenous scheduling behavior across tenants
206+
- Such as the [InferenceSLOPolicy](https://docs.google.com/document/d/1j2KRAT68_FYxq1iVzG0xVL-DHQhGVUZBqiM22Hd_0hc/edit?resourcekey=0-5cSovS8QcRQNYXj0_kRMiw&tab=t.0#heading=h.emkaixupvf39).
207+
- Detach identification from the modelName field
208+
209+
## Usage

The InferenceObjectives API surface has 3 general scopes:
- Identification
- Pool Binding
- Policy Handle

### Pool Binding

Tackling the simplest first: the single Pool Binding field, `PoolRef`, simply associates a given InferenceObjectives object with a pool. This means that InferenceObjectives with duplicate IDs across different pools are considered valid.

### Identification
**Note**: This ID field is proposed for Phase 2; Phase 1 will use the kube name as the identifier in the short term. To make a smooth transition, the ID field would default to the kube name.

The only field associated with identification is the `ID` field. A separate ID field was chosen (rather than using the metadata name) because:
- We do not want to put the same restrictions on the string that are enforced on a kube resource name
- The ID may be duplicated across different pools
- Use of a kube-generated name would force an upstream Auth mechanism to be aware of the `InferenceObjectives` API

This ID can be any string. In the case of a MaaS platform, it may identify a customer via a one-way hashed JWT; for sharing an InferencePool between teams, it may be a team-id.

***Important***: In order to support a high volume of tenants, by default IGW will _allow_ unique IDs that do not have an explicit InferenceObjectives object defined. Instead, the defaults for all policies will be used.

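The two identification rules above (the ID defaults to the kube name, and unknown IDs fall back to default policies) can be sketched as follows. The types are simplified stand-ins scoped to a single pool, and the default criticality of `0` is an assumption rather than something the proposal defines.

```go
package main

import "fmt"

// objectives is a simplified stand-in for an InferenceObjectives object.
type objectives struct {
	Name        string // kube metadata name
	ID          string // proposed Phase 2 field; empty means "not set"
	Criticality int
}

// tenantID returns the identifier for an object, defaulting to the kube
// name when ID is unset, as the note above describes.
func tenantID(o objectives) string {
	if o.ID != "" {
		return o.ID
	}
	return o.Name
}

// defaults stands in for "the defaults for all policies". The concrete
// values are assumed for this sketch.
var defaults = objectives{Criticality: 0}

// index maps tenant IDs to their explicitly defined objectives.
type index map[string]objectives

func build(objs []objectives) index {
	ix := index{}
	for _, o := range objs {
		ix[tenantID(o)] = o
	}
	return ix
}

// lookup returns the objectives for an ID. Unknown IDs are allowed and get
// the defaults; the second result reports whether an explicit object existed.
func (ix index) lookup(id string) (objectives, bool) {
	if o, ok := ix[id]; ok {
		return o, true
	}
	d := defaults
	d.ID = id
	return d, false
}

func main() {
	ix := build([]objectives{
		{Name: "team-a", Criticality: 5},                  // ID defaults to "team-a"
		{Name: "obj-1", ID: "hashed-jwt", Criticality: 2}, // explicit ID
	})
	for _, id := range []string{"team-a", "hashed-jwt", "never-seen"} {
		o, explicit := ix.lookup(id)
		fmt.Println(id, o.Criticality, explicit)
	}
}
```

Returning the `explicit` flag alongside the result lets a caller distinguish anonymous tenants for metrics or logging while still serving them.
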
#### Alternative consideration(s)
- Expanding the PoolRef field to be plural was considered, but was not selected in order to maintain simplicity. This decision can be revisited in the future.

### Policy Specification
There are a few options for how we handle policy binding. Some options that have been considered:

- An `array` field on the `InferenceObjectives` which would contain `ObjectReferences`, allowing for binding of arbitrary policies
  - Pros: highly flexible, allows user-defined policy
  - Cons: highly abstract & difficult to enforce meaningful defaults/limits on the policies allowed
- A one-of per policy field to either allow embedded configuration of a policy, or a reference to a policy
  - Pros: flexibility of selection, reuse of policies
  - Cons: reusability is only per policy; composite policy structure is still defined on the `InferenceObjectives`
- A `PolicyProfile` type object that can configure multiple policies together, and that multiple `InferenceObjectives` can reference
  - Pros: allows complex configuration to be reused
  - Cons: creates another CRD layer, potentially increasing complexity of the API surface

The preferred path forward is to use a `PolicyProfile` that can either be referenced or configured directly in the embedded field on the `InferenceObjectives`.

This allows for a tradeoff between clear configuration and reuse of complex config.
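
As a rough sketch of the preferred one-of, the function below resolves the effective profile from either a reference or an embedded value. The types are simplified (the reference is reduced to a name, and only `Criticality` is modeled), and both the mutual-exclusion error and the "neither set means defaults" behavior are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// policyProfileSpec is a reduced stand-in for PolicyProfileSpec.
type policyProfileSpec struct {
	Criticality *int
}

// objectivesSpec models only the one-of policy handle.
type objectivesSpec struct {
	PolicyProfileRef string             // name of a shared profile; empty means unset
	PolicyProfile    *policyProfileSpec // embedded profile; nil means unset
}

var errBothSet = errors.New("policyProfileRef and policyProfile are mutually exclusive")

// resolve returns the effective profile for a spec. shared holds the
// reusable profiles that a reference may point to.
func resolve(spec objectivesSpec, shared map[string]policyProfileSpec) (policyProfileSpec, error) {
	switch {
	case spec.PolicyProfileRef != "" && spec.PolicyProfile != nil:
		return policyProfileSpec{}, errBothSet
	case spec.PolicyProfile != nil:
		return *spec.PolicyProfile, nil
	case spec.PolicyProfileRef != "":
		p, ok := shared[spec.PolicyProfileRef]
		if !ok {
			return policyProfileSpec{}, fmt.Errorf("policy profile %q not found", spec.PolicyProfileRef)
		}
		return p, nil
	default:
		// Neither is set: assumed here to mean "use defaults".
		return policyProfileSpec{}, nil
	}
}

func main() {
	high := 10
	shared := map[string]policyProfileSpec{"gold": {Criticality: &high}}

	p, err := resolve(objectivesSpec{PolicyProfileRef: "gold"}, shared)
	fmt.Println(*p.Criticality, err)

	_, err = resolve(objectivesSpec{PolicyProfileRef: "missing"}, shared)
	fmt.Println(err)
}
```

The embedded form keeps a single object readable on its own, while the reference form lets many `InferenceObjectives` share one profile.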
