
Commit f21031e

address feedback
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
1 parent aaab481 commit f21031e


1 file changed

+33
-11
lines changed
  • keps/sig-scheduling/5471-enable-sla-based-scheduling


keps/sig-scheduling/5471-enable-sla-based-scheduling/README.md

Lines changed: 33 additions & 11 deletions
@@ -20,6 +20,7 @@
2020
- [Scheduler Performance Regression](#scheduler-performance-regression)
2121
- [API Compatibility and Version Skew](#api-compatibility-and-version-skew)
2222
- [Edge Cases in Numeric Parsing](#edge-cases-in-numeric-parsing)
23+
- [Taint Misconfiguration Detection](#taint-misconfiguration-detection)
2324
- [Cross-SIG Impact](#cross-sig-impact)
2425
- [Design Details](#design-details)
2526
- [API Changes](#api-changes)
@@ -159,6 +160,22 @@ spec:
159160
value: "750"
160161
effect: NoSchedule
161162
---
163+
apiVersion: v1
164+
kind: Pod
165+
metadata:
166+
name: flexible-sla-workload
167+
spec:
168+
tolerations:
169+
# Accept nodes with SLA >= 900 (SLA = 900 OR SLA > 900)
170+
- key: node.kubernetes.io/sla
171+
operator: Equal
172+
value: "900"
173+
effect: NoSchedule
174+
- key: node.kubernetes.io/sla
175+
operator: Gt
176+
value: "900"
177+
effect: NoSchedule
178+
---
162179
# Critical workload will not be scheduled until a suitable high reliability node has capacity
163180
apiVersion: v1
164181
kind: Pod
@@ -396,6 +413,8 @@ spec:
396413

397414
- Invalid taints meant to be used with the new comparison operators (e.g., `node.kubernetes.io/sla=95.5` and `node.kubernetes.io/version=1`) are not detected at admission time.
398415

416+
- **Taint Misconfiguration Risk**: When nodes have taints with non-numeric values (e.g., `node.kubernetes.io/sla=high` instead of `node.kubernetes.io/sla=950`) that are intended for use with numeric operators, the misconfiguration is only detected during pod scheduling attempts, not at taint creation time. This can lead to scheduling failures that are difficult to diagnose.
417+
399418
### Risks and Mitigations
400419

401420
#### Scheduler Performance Regression
@@ -431,6 +450,16 @@ spec:
431450
- API validation rejects pods with unparseable values rather than silently failing
432451
- Clear error messages help users identify and fix configuration issues
433452

453+
#### Taint Misconfiguration Detection
454+
455+
**Risk**: Node taints intended for numeric comparison may contain non-numeric values (e.g., `node.kubernetes.io/sla=high` instead of `node.kubernetes.io/sla=950`), causing scheduling failures that are only detected during pod placement attempts rather than at taint creation time.
456+
457+
**Mitigation**:
458+
459+
- Clear documentation and examples showing proper numeric taint configuration
460+
- Enhanced error messages in scheduling events that clearly indicate parsing failures
461+
- Monitoring and alerting on scheduling failures due to taint parsing errors
462+
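The risk described above (a value such as `high` where `950` was intended) could be caught before scheduling with a small audit check. The following is a minimal sketch, not part of the KEP: `validateNumericTaintValue` is a hypothetical helper, and it assumes numeric taint values are parsed as base-10 64-bit integers, which this diff does not show directly.

```go
package main

import (
	"fmt"
	"strconv"
)

// validateNumericTaintValue is a hypothetical helper (not defined by the KEP).
// It returns an error when a taint value cannot be parsed as a base-10 int64.
// Assumption: numeric operators only accept integer values.
func validateNumericTaintValue(key, value string) error {
	if _, err := strconv.ParseInt(value, 10, 64); err != nil {
		return fmt.Errorf("taint %s=%s: value is not an integer: %w", key, value, err)
	}
	return nil
}

func main() {
	// Sample values taken from the examples in the KEP text.
	for _, v := range []string{"950", "high", "95.5"} {
		if err := validateNumericTaintValue("node.kubernetes.io/sla", v); err != nil {
			fmt.Println("misconfigured:", err)
			continue
		}
		fmt.Println("ok:", v)
	}
}
```

A check like this could run in a node-provisioning pipeline or a periodic audit, which would surface the misconfiguration earlier than the scheduling events listed in the mitigation.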
434463
#### Cross-SIG Impact
435464

436465
- SIG-Node
@@ -560,8 +589,6 @@ func compareValues(tolerationVal, taintVal string, op TolerationOperator) (bool,
560589
return tVal < nVal, nil
561590
case TolerationOpGt:
562591
return tVal > nVal, nil
563-
default:
564-
return false, errors.New("toleration and taints values are equal")
565592
}
566593
}
567594
```
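As far as this hunk shows, removing the `default` branch leaves the function ending on a `switch` without a `default` case. Go does not treat such a `switch` as a terminating statement, so a compilable version needs a `return` after it. The sketch below is one way to complete the excerpt under stated assumptions: the parsing code is not visible in the hunk, so `tVal` is assumed to be parsed from `tolerationVal` and `nVal` from `taintVal` with `strconv.ParseInt`, and the operator constants are redeclared locally for illustration. The comparisons are copied from the context lines above.

```go
package main

import (
	"fmt"
	"strconv"
)

// TolerationOperator and its constants are redeclared here for illustration only.
type TolerationOperator string

const (
	TolerationOpLt TolerationOperator = "Lt"
	TolerationOpGt TolerationOperator = "Gt"
)

// compareValues parses both values as base-10 int64 and applies op.
// Assumption: tVal comes from tolerationVal and nVal from taintVal;
// the hunk does not show how they are derived.
func compareValues(tolerationVal, taintVal string, op TolerationOperator) (bool, error) {
	tVal, err := strconv.ParseInt(tolerationVal, 10, 64)
	if err != nil {
		return false, fmt.Errorf("toleration value %q is not an integer: %w", tolerationVal, err)
	}
	nVal, err := strconv.ParseInt(taintVal, 10, 64)
	if err != nil {
		return false, fmt.Errorf("taint value %q is not an integer: %w", taintVal, err)
	}
	switch op {
	case TolerationOpLt:
		return tVal < nVal, nil
	case TolerationOpGt:
		return tVal > nVal, nil
	}
	// Reached only for operators other than Lt and Gt.
	return false, fmt.Errorf("unsupported operator %q", op)
}

func main() {
	match, err := compareValues("900", "high", TolerationOpGt)
	fmt.Println(match, err)
}
```

Returning a descriptive error for an unsupported operator, rather than the removed "values are equal" message, keeps the error text accurate for every path that reaches it.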
@@ -580,11 +607,10 @@ N/A
580607

581608
All core changes must be covered by unit tests across the Taint API, validation, and scheduler sides:
582609

583-
- **API Validation Tests:** (staging/src/k8s.io/api/core/v1/toleration_test.go)
584-
- **Scheduler Helper Tests:** (staging/src/k8s.io/component-helpers/scheduling/corev1/helpers_test.go)
585-
- **Validation Tests:** ( pkg/apis/core/validation/validation_test.go)
586-
- **ToleratesTaint plugin:** (pkg/scheduler/framework/plugins/tainttoleration/taint_toleration_test.go)
587-
- `<package>`: `<date>` - `<test coverage>`
610+
- `staging/src/k8s.io/api/core/v1/toleration_test.go`: Sep-16-2025 - 66.7%
611+
- `staging/src/k8s.io/component-helpers/scheduling/corev1/helpers_test.go`: Sep-16-2025 - 100%
612+
- `pkg/apis/core/validation/validation_test.go`: Sep-16-2025 - 85.1%
613+
- `pkg/scheduler/framework/plugins/tainttoleration/taint_toleration_test.go`: Sep-16-2025 - 86.9%
588614

589615
##### Performance tests
590616

@@ -599,17 +625,13 @@ The following scenarios need to be covered in integration tests:
599625
- Feature gate's enabling/disabling
600626
- **Scheduler Integration Tests:** will be extended to cover the new taint cases introduced in this KEP: (test/integration/scheduler)
601627

602-
- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)
603-
604628
##### e2e tests
605629

606630
The existing e2e tests will be extended to cover the new taints cases introduced in this KEP:
607631

608632
- **Node Taints e2e Tests:** (test/e2e/node/taints.go)
609633
- **Scheduler Taints e2e Tests:** (test/e2e/scheduling)
610634

611-
- [test name]()
612-
613635
### Graduation Criteria
614636

615637
#### Alpha
