````diff
@@ -128,14 +128,15 @@ The following are non-goals for this KEP but will probably soon appear to be goa
 
 ## Proposal
 
-The `spec.workload` field will be added to the Pod resource. A sample pod with this new field looks like this:
+The `spec.workloadRef` field will be added to the Pod resource. A sample pod with this new field looks like this:
 
 ```yaml
 apiVersion: v1
 kind: Pod
 spec:
   ...
   workloadRef:
     name: job-1
+    podGroup: pg1
   ...
 ```
 
````
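A minimal Go sketch of how the sample manifest's `workloadRef` stanza could map onto Go types. It assumes the usual Kubernetes camelCase JSON tags (`name`, `podGroup`, `podGroupReplicaKey`), which the diff does not show, and `podSpec` and `decodeWorkloadRef` are hypothetical names used only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WorkloadReference mirrors the fields added by this PR. The JSON tags are an
// assumption based on the field names used in the sample manifests.
type WorkloadReference struct {
	Name               string `json:"name"`
	PodGroup           string `json:"podGroup"`
	PodGroupReplicaKey string `json:"podGroupReplicaKey,omitempty"`
}

// podSpec is a hypothetical, trimmed-down stand-in for the Pod spec that keeps
// only the new field.
type podSpec struct {
	WorkloadRef *WorkloadReference `json:"workloadRef,omitempty"`
}

// decodeWorkloadRef is a hypothetical helper that extracts spec.workloadRef
// from a JSON-encoded Pod spec. It returns nil when the field is absent.
func decodeWorkloadRef(raw []byte) (*WorkloadReference, error) {
	var spec podSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return nil, err
	}
	return spec.WorkloadRef, nil
}

func main() {
	// JSON equivalent of the spec.workloadRef stanza in the sample Pod.
	ref, err := decodeWorkloadRef([]byte(`{"workloadRef": {"name": "job-1", "podGroup": "pg1"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *ref) // {Name:job-1 PodGroup:pg1 PodGroupReplicaKey:}
}
```

Because `workloadRef` is a pointer, a Pod that omits the stanza decodes to `nil`, which lines up with the `+optional` marker on `WorkloadRef`.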
````diff
@@ -224,35 +225,64 @@ usecases. You can read more about it in the [extended proposal] document.
 
 * `Workload` is the resource Kind.
 * `scheduling.k8s.io` is the ApiGroup.
-* `spec.workload` is the name of the new field in pod.
+* `spec.workloadRef` is the name of the new field in pod.
 * Within a Workload there is a list of groups of pods. Each group represents a top-level division of pods within a Workload. Each group can be independently gang scheduled (or not use gang scheduling). This group is named `PodGroup`.
 * In a future , we expect that this group can optionally specify further subdivision into sub groups. Each sub-group can have an index. The indexes go from 0 to N, without repeats or gaps. These subgroups are called `PodSubGroup`.
 * In subsequent KEPs, we expect that a sub-group can optionally specify further subdivision into pod equivalence classes. All pods in a pod equivalence class have the same values for all fields that affect scheduling feasibility. These pod equivalence classes are called `PodSet`.
 
 ### Associating Pod into PodGroups
 
-When a `Workload` consists of a single group of pods needing Gang Scheduling, it is clear which pods belong to the group from the `spec.workload.name` field of the pod. However `Workload` supports listing multiple list items, and a list item can represent a single group, or a set of identical replica groups.
+When a `Workload` consists of a single group of pods needing Gang Scheduling, it is clear which pods belong to the group from the `spec.workloadRef.name` field of the pod. However `Workload` supports listing multiple list items, and a list item can represent a single group, or a set of identical replica groups.
 In these cases, there needs to be additional information to indicate which group a pod belongs to.
 
-We proposed to extend the newly introduced `pod.spec.workload` field with additional information
-to include that information. More specifically, the `pod.spec.workload` field is of type `WorkloadReference`
+We proposed to extend the newly introduced `pod.spec.workloadRef` field with additional information
+to include that information. More specifically, the `pod.spec.workloadRef` field is of type `WorkloadReference`
 and is defined as following:
 
 ```go
+type PodSpec struct {
+  ...
+  // WorkloadRef provides a reference to the Workload object that this Pod belongs to.
+  // This field is used by the scheduler to identify the PodGroup and apply the
+  // correct group scheduling policies. The Workload object referenced
+  // by this field may not exist at the time the Pod is created.
+  // This field is immutable, but a Workload object with the same name
+  // may be recreated with different policies. Doing this during pod scheduling
+  // may result in the placement not conforming to the expected policies.
+  //
+  // +featureGate=GenericWorkload
+  // +optional
+  WorkloadRef *WorkloadReference
+}
+
 // WorkloadReference identifies the Workload object and PodGroup membership
-// that a Pod belongs to. The scheduler uses this information to enforce
-// gang scheduling semantics.
+// that a Pod belongs to. The scheduler uses this information to apply
+// workload-aware scheduling semantics.
 type WorkloadReference struct {
-  // Name defines the name of the Workload object this pod belongs to.
-  Name string
-
-  // PodGroup defines the name of the PodGroup within a Workload this pod belongs to.
-  PodGroup string
-  // PodGroupReplicaIndex is the replica index of the PodGroup that this pod
-  // belong to when the workload is running ReplicatedGangMode. In this mode,
-  // a workload may create multiple identical PodGroups.
-  // For workload in a different mode, this field is unset.
-  PodGroupReplicaIndex string
+  // Name defines the name of the Workload object this Pod belongs to.
+  // Workload must be in the same namespace as the Pod.
+  // If it doesn't match any existing Workload, the Pod will remain unschedulable
+  // until a Workload object is created and observed by the kube-scheduler.
+  // It must be a DNS subdomain.
+  //
+  // +required
+  Name string
+
+  // PodGroup is the name of the PodGroup within the Workload that this Pod
+  // belongs to. If it doesn't match any existing PodGroup within the Workload,
+  // the Pod will remain unschedulable until the Workload object is recreated
+  // and observed by the kube-scheduler. It must be a DNS label.
+  //
+  // +required
+  PodGroup string
+
+  // PodGroupReplicaKey specifies the replica key of the PodGroup to which this
+  // Pod belongs. It is used to distinguish pods belonging to different replicas
+  // of the same pod group. The pod group policy is applied separately to each replica.
+  // When set, it must be a DNS label.
+  //
+  // +optional
+  PodGroupReplicaKey string
 }
 ```
 
````
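The field comments on `WorkloadReference` state its validation rules: `Name` must be a DNS subdomain, `PodGroup` must be a DNS label, and `PodGroupReplicaKey` must be a DNS label when set. A minimal sketch of those checks, assuming the RFC 1123 label and subdomain patterns Kubernetes applies to names; `validateWorkloadReference` and its helpers are hypothetical, not the actual API validation code:

```go
package main

import (
	"fmt"
	"regexp"
)

type WorkloadReference struct {
	Name               string
	PodGroup           string
	PodGroupReplicaKey string
}

// RFC 1123 patterns: a DNS label (max 63 chars) and a DNS subdomain, i.e.
// dot-separated labels (max 253 chars).
var (
	dnsLabelRe     = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)
	dnsSubdomainRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

func isDNSLabel(s string) bool { return len(s) <= 63 && dnsLabelRe.MatchString(s) }

func isDNSSubdomain(s string) bool { return len(s) <= 253 && dnsSubdomainRe.MatchString(s) }

// validateWorkloadReference is a hypothetical helper that checks the
// constraints stated in the field comments.
func validateWorkloadReference(ref WorkloadReference) []error {
	var errs []error
	if !isDNSSubdomain(ref.Name) {
		errs = append(errs, fmt.Errorf("name %q: must be a DNS subdomain", ref.Name))
	}
	if !isDNSLabel(ref.PodGroup) {
		errs = append(errs, fmt.Errorf("podGroup %q: must be a DNS label", ref.PodGroup))
	}
	// PodGroupReplicaKey is optional, so it is only checked when set.
	if ref.PodGroupReplicaKey != "" && !isDNSLabel(ref.PodGroupReplicaKey) {
		errs = append(errs, fmt.Errorf("podGroupReplicaKey %q: must be a DNS label", ref.PodGroupReplicaKey))
	}
	return errs
}

func main() {
	ok := WorkloadReference{Name: "job-1", PodGroup: "job-1", PodGroupReplicaKey: "key-2"}
	bad := WorkloadReference{Name: "Job_1", PodGroup: "pg.1"}
	fmt.Println(len(validateWorkloadReference(ok)))  // 0
	fmt.Println(len(validateWorkloadReference(bad))) // 2
}
```

Empty `Name` and `PodGroup` values fail the patterns, so the `+required` markers are covered without a separate emptiness check. Whether the referenced Workload and PodGroup exist is deliberately not checked: per the comments, a dangling reference leaves the Pod unschedulable rather than rejected.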
````diff
@@ -273,7 +303,6 @@ metadata:
 spec:
   podGroups:
   - name: "job-1"
-    replicas: 4
     policy:
       gang:
         minCount: 100
````
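With `replicas` removed from `PodGroup`, replication is carried by the pods instead: per the `PodGroupReplicaKey` comment, pods that share a Workload and PodGroup but carry different replica keys belong to different replicas, and the pod group policy applies to each replica separately. The sketch below groups pods that way and runs a simplified `minCount` check per replica. The `pod` type and helper names are hypothetical, and the readiness check only illustrates the grouping; it is not the scheduler's gang-scheduling algorithm:

```go
package main

import (
	"fmt"
	"sort"
)

type WorkloadReference struct {
	Name               string
	PodGroup           string
	PodGroupReplicaKey string
}

// pod is a hypothetical, minimal stand-in for a Pod.
type pod struct {
	Namespace string
	Name      string
	Ref       WorkloadReference
}

// gangKey identifies one replica of one PodGroup. Pods that differ only in
// PodGroupReplicaKey land in different gangs.
type gangKey struct {
	Namespace, Workload, PodGroup, ReplicaKey string
}

// groupPods buckets pods by gang key. The namespace is part of the key because
// a Workload must live in the same namespace as its pods.
func groupPods(pods []pod) map[gangKey][]pod {
	gangs := make(map[gangKey][]pod)
	for _, p := range pods {
		k := gangKey{p.Namespace, p.Ref.Name, p.Ref.PodGroup, p.Ref.PodGroupReplicaKey}
		gangs[k] = append(gangs[k], p)
	}
	return gangs
}

// readyGangs is a simplified check, not the scheduler's algorithm: it lists,
// as sorted "podGroup/replicaKey" strings, the gangs with at least minCount pods.
func readyGangs(gangs map[gangKey][]pod, minCount int) []string {
	var ready []string
	for k, members := range gangs {
		if len(members) >= minCount {
			ready = append(ready, k.PodGroup+"/"+k.ReplicaKey)
		}
	}
	sort.Strings(ready)
	return ready
}

func main() {
	var pods []pod
	// Two replicas of PodGroup "job-1": key-1 has three pods, key-2 only one.
	for i := 0; i < 3; i++ {
		pods = append(pods, pod{"ns-1", fmt.Sprintf("a-%d", i), WorkloadReference{"job-1", "job-1", "key-1"}})
	}
	pods = append(pods, pod{"ns-1", "b-0", WorkloadReference{"job-1", "job-1", "key-2"}})

	// The sample Workload uses minCount: 100; a small value keeps the demo short.
	fmt.Println(readyGangs(groupPods(pods), 3)) // [job-1/key-1]
}
```

Because the replica key is part of the grouping key, one replica reaching `minCount` says nothing about another: `key-2` stays below the threshold even though the PodGroup as a whole has four pods.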
````diff
@@ -291,7 +320,6 @@ spec:
     podGroup: job-1
     podGroupReplicaKey: key-2
   ...
-
 ```
 
 We decided for this option because it is more succinct and makes the role of a pod clear just
````
````diff
@@ -312,77 +340,114 @@ to identify pods belonging to it. However, with this pattern:
 
 The `Workload` type will be defined with the following structure:
 ```go
+// Workload allows for expressing scheduling constraints that should be used
+// when managing lifecycle of workloads from scheduling perspective,
+// including scheduling, preemption, eviction and other phases.
 type Workload struct {
   metav1.TypeMeta
+  // Standard object's metadata.
+  // Name must be a DNS subdomain.
+  //
+  // +optional
   metav1.ObjectMeta
+
+  // Spec defines the desired behavior of a Workload.
+  //
+  // +required
   Spec WorkloadSpec
-  Status WorkloadStatus
 }
 
-// WorkloadSpec describes a workload in a portable way that scheduler and related
-// tools can understand.
+// WorkloadMaxPodGroups is the maximum number of pod groups per Workload.
+const WorkloadMaxPodGroups = 8
+
+// WorkloadSpec defines the desired state of a Workload.
 type WorkloadSpec struct {
-  // ControllerRef points to the true workload, e.g. Deployment.
-  // It is optional to set and is intended to make this mapping easier for
-  // things like CLI tools.
-  // This field is immutable.
-  ControllerRef *v1.ObjectReference
-
-  // PodGroups is a list of groups of pods.
-  // Each group may request gang scheduling.
-  PodGroups []PodGroup
+  // ControllerRef is an optional reference to the controlling object, such as a
+  // Deployment or Job. This field is intended for use by tools like CLIs
+  // to provide a link back to the original workload definition.
+  // When set, it cannot be changed.
+  //
+  // +optional
+  ControllerRef *TypedLocalObjectReference
+
+  // PodGroups is the list of pod groups that make up the Workload.
+  // The maximum number of pod groups is 8. This field is immutable.
+  //
+  // +required
+  // +listType=map
+  // +listMapKey=name
+  PodGroups []PodGroup
 }
 
-// PodGroup is a group of pods that may contain multiple shapes (PodSets) and may contain
-// multiple dense indexes (PodSubGroups) and which can optionally be replicated in a variable
-// number of identical copies.
-type PodGroup struct {
-  Name *string
-
-  // Number of identical instances of PodGroup that are part of the Workload.
-  // Defaults to 1.
-  Replicas int
+// TypedLocalObjectReference allows to reference typed object inside the same namespace.
+type TypedLocalObjectReference struct {
+  // APIGroup is the group for the resource being referenced.
+  // If APIGroup is empty, the specified Kind must be in the core API group.
+  // For any other third-party types, setting APIGroup is required.
+  // It must be a DNS subdomain.
+  //
+  // +optional
+  APIGroup string
+  // Kind is the type of resource being referenced.
+  // It must be a path segment name.
+  //
+  // +required
+  Kind string
+  // Name is the name of resource being referenced.
+  // It must be a path segment name.
+  //
+  // +required
+  Name string
+}
 
-  // Policy defines the configuration of the PodGroup to enable different
-  // scheduling policies.
-  Policy PodGroupPolicy
+// PodGroup represents a set of pods with a common scheduling policy.
+type PodGroup struct {
+  // Name is a unique identifier for the PodGroup within the Workload.
+  // It must be a DNS label. This field is immutable.
+  //
+  // +required
+  Name string
+
+  // Policy defines the scheduling policy for this PodGroup.
+  //
+  // +required
+  Policy PodGroupPolicy
 }
 
-// PodGroupPolicy defines scheduling configuration of a PodGroup.
+// PodGroupPolicy defines the scheduling configuration for a PodGroup.
 type PodGroupPolicy struct {
-  // Kind indicates which of the other fields is non-empty.
````