Commit 2c9abe0

committed
add kep
1 parent 1886541 commit 2c9abe0

1 file changed

+24
-20
lines changed
  • keps/sig-apps/todo-mutable-job-pod-resource-updates

keps/sig-apps/todo-mutable-job-pod-resource-updates/README.md

Lines changed: 24 additions & 20 deletions
@@ -159,25 +159,25 @@ to allow suspending jobs to control when the Pods of a Job get created by contro
159159
This was proposed as a primitive to allow a higher-level queue controller to implement
160160
job queuing: the queue controller unsuspends the job when resources become available.
161161

162-
To complement the above capability, a queue controller may also want to control the
163-
resource requirements of a job based on current cluster capacity or resource availability.
162+
To complement the above capability, a queue controller may also want to control the
163+
resource requirements of a job based on current cluster capacity or resource availability.
164164
For example, it may want to adjust CPU, memory, and GPU requests/limits based on available
165-
node capacity, allocate specific extended resources like TPUs or FPGAs, optimize resource
166-
allocation for better cluster utilization, or modify resource requirements based on queue
165+
node capacity, allocate specific extended resources like TPUs or FPGAs, optimize resource
166+
allocation for better cluster utilization, or modify resource requirements based on queue
167167
priority and cluster load.
168168

169-
This is a proposal to relax update validation on suspended jobs to allow mutating
169+
This is a proposal to relax update validation on suspended jobs to allow mutating
170170
resource specifications in the job's pod template, specifically CPU, memory, GPU,
171-
and other extended resource requests and limits. This enables a higher-level queue
172-
controller to optimize resource allocation before un-suspending a job based on
171+
and other extended resource requests and limits. This enables a higher-level queue
172+
controller to optimize resource allocation before un-suspending a job based on
173173
current cluster conditions and resource availability.
174174

175175
## Motivation
176176

177177
Most Kubernetes batch workloads have dynamic resource requirements that may not be
178178
known at job creation time. The optimal resource allocation for a job often depends
179179
on current cluster conditions, available capacity, and queue priorities that change
180-
over time. This is especially true for GPU and other specialized hardware resources
180+
over time. This is especially true for GPU and other specialized hardware resources
181181
which are expensive and have limited availability.
182182

183183
We made the first step towards achieving better resource management by introducing the
@@ -194,7 +194,7 @@ appropriately for current capacity constraints.
194194

195195
### Goals
196196

197-
- Allow mutating CPU, memory, GPU, and extended resource requests and limits of the pod template of suspended jobs.
197+
- Allow mutating CPU, memory, GPU, and extended resource requests and limits of a container within the PodTemplate of a suspended job.
198198
- Enable queue controllers to optimize resource allocation based on cluster conditions.
199199
- Improve cluster resource utilization through dynamic resource sizing, especially for expensive GPU and specialized hardware.
200200

@@ -206,6 +206,7 @@ appropriately for current capacity constraints.
206206
- Allow mutating resource specifications of pods directly.
207207
- Allow mutating other job specifications beyond container resource requirements.
208208
- Support in-place pod resource updates (this is covered by separate KEPs).
209+
- Allow mutating Pod Resources.
209210

210211
## Proposal
211212

@@ -225,12 +226,11 @@ forces the jobs to be created in a suspended state. The controller analyzes curr
225226
cluster capacity and adjusts job resource requirements to optimize cluster utilization
226227
before unsuspending them.
227228

228-
At job creation time, users may specify conservative resource estimates or may not know
229-
the optimal resource allocation for current cluster conditions. The queue controller can
230-
analyze available capacity, other queued jobs, and cluster utilization patterns to
231-
determine optimal CPU, memory, and GPU allocations. For example, it might assign a job
232-
to use V100 GPUs when A100s are unavailable, or adjust the number of GPUs based on current
233-
availability. By updating the job's resource requirements before unsuspending it, the
229+
At job creation time, users may specify conservative resource estimates or may not know
230+
the optimal resource allocation for current cluster conditions. The queue controller can
231+
analyze available capacity, other queued jobs, and cluster utilization patterns to
232+
determine optimal CPU, memory, and GPU allocations. For example, it might adjust the number of GPUs based on current
233+
availability. By updating the job's resource requirements before unsuspending it, the
234234
controller ensures efficient resource utilization and better cluster throughput.
235235

236236
### Risks and Mitigations
@@ -249,14 +249,14 @@ controller ensures efficient resource utilization and better cluster throughput.
249249
## Design Details
250250

251251
The pod template validation logic in the API server needs to be updated to relax the validation
252-
of the Job's Template field. Currently the template is immutable, but we need to make
252+
of the Job's Template field. Currently the template is immutable, but we need to make
253253
container resource specifications (CPU, memory, GPU, and extended resource requests and limits) mutable for suspended jobs.
254254

255255
The condition we will check to verify that the job is suspended is `Job.Spec.Suspend=true`.
256256

257257
We will allow updates to the following fields in container specifications within the pod template:
258258
- `resources.requests.cpu`
259-
- `resources.requests.memory`
259+
- `resources.requests.memory`
260260
- `resources.requests.*` (for extended resources like `nvidia.com/gpu`, `amd.com/gpu`, `tpu-v4` etc.)
261261
- `resources.limits.cpu`
262262
- `resources.limits.memory`
@@ -276,7 +276,11 @@ We will allow updates to the following fields in container specifications within
276276

277277
#### Integration tests
278278

279-
Available under [Job integrations tests](https://github.com/kubernetes/kubernetes/blob/457341c3d408097025af5a9b6f5917439c0debdd/test/integration/job/job_test.go#L1397)
279+
We will add the following test scenarios to the Job integration tests in kubernetes/test/integration/job:
280+
281+
a. When a job is suspended and the feature gate is enabled, resources can be mutated.
282+
b. When a job is not suspended and the feature gate is enabled, resource mutations are rejected.
283+
c. When a job is suspended and the feature gate is disabled, resource mutations are rejected.
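
The three scenarios above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the API server's actual validation code: the `Container`, `Resources`, and `ResourceList` types are simplified stand-ins for the real API types, and `templateUpdateAllowed`, `isSuspended`, and `onlyResourcesDiffer` are hypothetical helper names. The sketch models `suspend` as a `*bool`, as `Job.Spec.Suspend` is in the `batch/v1` API, and treats `nil` as "not suspended". Whether the check should read the old or the new object is left open here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified, hypothetical stand-ins for the real Kubernetes API types.
type ResourceList map[string]string

type Resources struct {
	Requests ResourceList
	Limits   ResourceList
}

type Container struct {
	Name      string
	Image     string
	Resources Resources
}

// isSuspended treats a nil pointer as "not suspended".
func isSuspended(suspend *bool) bool {
	return suspend != nil && *suspend
}

// onlyResourcesDiffer reports whether the two container lists are equal
// once their Resources fields are blanked out.
func onlyResourcesDiffer(oldC, newC []Container) bool {
	if len(oldC) != len(newC) {
		return false
	}
	for i := range oldC {
		a, b := oldC[i], newC[i] // struct copies; the inputs are not modified
		a.Resources, b.Resources = Resources{}, Resources{}
		if !reflect.DeepEqual(a, b) {
			return false
		}
	}
	return true
}

// templateUpdateAllowed sketches the relaxed rule: an unchanged template is
// always fine; a changed template is accepted only when the feature gate is
// on, the job is suspended, and nothing but container resources changed.
func templateUpdateAllowed(gateEnabled bool, suspend *bool, oldC, newC []Container) bool {
	if reflect.DeepEqual(oldC, newC) {
		return true
	}
	return gateEnabled && isSuspended(suspend) && onlyResourcesDiffer(oldC, newC)
}

func main() {
	suspended := true
	oldC := []Container{{Name: "train", Image: "example/train:1",
		Resources: Resources{Requests: ResourceList{"cpu": "2", "nvidia.com/gpu": "4"}}}}
	newC := []Container{{Name: "train", Image: "example/train:1",
		Resources: Resources{Requests: ResourceList{"cpu": "2", "nvidia.com/gpu": "2"}}}}

	fmt.Println(templateUpdateAllowed(true, &suspended, oldC, newC))  // scenario a: true
	fmt.Println(templateUpdateAllowed(true, nil, oldC, newC))         // scenario b: false
	fmt.Println(templateUpdateAllowed(false, &suspended, oldC, newC)) // scenario c: false
}
```

Blanking the resources before comparing keeps every other container field immutable, so a change to the image or name is still rejected even for a suspended job.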
280284

281285
#### e2e tests
282286

@@ -383,7 +387,7 @@ N/A. This feature doesn't impact nodes.
383387
###### Does enabling the feature change any default behavior?
384388

385389
Yes, it relaxes validation of updates to jobs. Specifically, it will allow
386-
mutating the container resource specifications (CPU, memory, GPU, and extended resource
390+
mutating the container resource specifications (CPU, memory, GPU, and extended resource
387391
requests and limits) in the pod template of suspended jobs.
388392

389393
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -469,7 +473,7 @@ No.
469473

470474
The feature itself doesn't generate API calls. But it will allow the
471475
apiserver to accept update requests to mutate container resource specifications
472-
(CPU, memory, GPU, and extended resources) in job pod templates, which will
476+
(CPU, memory, GPU, and extended resources) in job pod templates, which will
473477
encourage implementing controllers that do this.
474478

475479
###### Will enabling / using this feature result in introducing new API types?
