Commit e120fb9

resolving KEP comment - flesh out what to do if SLO not being met section
1 parent 5ac0ade commit e120fb9

File tree

1 file changed

+28
-3
lines changed
  • keps/sig-api-machinery/5073-declarative-validation-with-validation-gen

keps/sig-api-machinery/5073-declarative-validation-with-validation-gen/README.md

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@
3131
- [Risk: we get hundreds of PRs from people migrating fields and can't review them all.](#risk-we-get-hundreds-of-prs-from-people-migrating-fields-and-cant-review-them-all)
3232
- [Mitigation: These are not urgent and we will have patterns which can be reviewed by more people.](#mitigation-these-are-not-urgent-and-we-will-have-patterns-which-can-be-reviewed-by-more-people)
3333
- [Risk: Versioned validation drifts between versions.](#risk-versioned-validation-drifts-between-versions)
34-
- [Mitigation: round-trip testing + fuzzing + equivalence tests.](#mitigation-round-trip-testing--fuzzing--equivalence-tests)
34+
- [Mitigation: round-trip testing + fuzzing + equivalence tests + linting.](#mitigation-round-trip-testing--fuzzing--equivalence-tests--linting)
3535
- [Risk: Migration to Declarative Validation introduces breaking change to API validation](#risk-migration-to-declarative-validation-introduces-breaking-change-to-api-validation)
3636
- [Mitigation: Ensure Invalid Objects Still Invalid](#mitigation-ensure-invalid-objects-still-invalid)
3737
- [Mitigation: Ensure Valid Old Objects Still Valid](#mitigation-ensure-valid-old-objects-still-valid)
@@ -333,9 +333,9 @@ The migration to declarative validation is not time-sensitive. We can proceed at
333333

334334
#### Risk: Versioned validation drifts between versions.
335335

336-
##### Mitigation: round-trip testing + fuzzing + equivalence tests.
336+
##### Mitigation: round-trip testing + fuzzing + equivalence tests + linting.
337337

338-
FIXME...
338+
To prevent versioned validation from drifting between versions, we plan to use round-trip testing, fuzz testing, equivalence testing (including runtime equivalence testing with `DeclarativeValidationMismatchMetrics`), and lint rules that verify that validation rules meant to stay in sync across versions actually do.
339339

340340
#### Risk: Migration to Declarative Validation introduces breaking change to API validation
341341

@@ -1257,6 +1257,31 @@ No change in behavior.
12571257
12581258
###### What steps should be taken if SLOs are not being met to determine the problem?
12591259
1260+
If the API server is failing to meet SLOs (latency, validation error-rate, etc.) and Declarative Validation is suspected as a cause, operators can diagnose issues by following these steps:
1261+
1262+
1. **Gather Request-Level Details**
1263+
* Identify the failing/high-latency HTTP requests. This typically involves looking at API server logs.
1264+
* Record the verb (e.g., `CREATE`), the resource type (e.g., `ReplicationController`), the namespace/name if applicable, and any relevant request parameters.
1265+
* ^ Be sure to submit this information when filing an issue (see step 5)
1266+
* Identify the existing resource or new object that is causing issues. If it is not already known from usage, try to reconstruct the suspect resource from the API server logs.
1267+
* ^ Be sure to submit this information when filing an issue (see step 5)
1268+
2. **Check Relevant Metrics**
1269+
* Use the `apiserver_request_duration_seconds` metric to check for differences in latency. Comparing `apiserver_request_duration_seconds` with `DeclarativeValidation` enabled vs. disabled can reveal whether the generated validation code or its logic is causing performance regressions.
1270+
* If you suspect correctness mismatches, enable the `DeclarativeValidationMismatchMetrics` feature gate and monitor the `declarative_validation_mismatch` metric. Any increments in that metric indicate a situation where the new declarative validation results differ from the legacy hand-written validation for the same request.
1271+
3. **Inspect API Server Logs**
1272+
* With `DeclarativeValidationMismatchMetrics` enabled, you can check the API server logs for entries on mismatched validation outcomes. These logs will include details about the request (the resource, version, kind, namespace/name, and user) and which fields triggered the mismatch.
1273+
* If the logs show repeated mismatches or errors for certain resource types, compare the declarative validation tags in `types.go` with the original hand-written logic to identify gaps or typos.
1274+
* ^ Be sure to submit this information when filing an issue (see step 5)
1275+
4. **Compare Feature Gate Settings**
1276+
* Verify whether `DeclarativeValidation` is enabled for all API servers in an HA environment. Partial enablement can sometimes lead to inconsistent behavior or unexpected rejections.
1277+
* Temporarily disabling `DeclarativeValidation` can help isolate whether the new validation logic is the root cause. Bear in mind that if a bug allowed objects that are valid only under declarative validation rules, rolling back may block updates to those objects; in that case, review “Can the feature be disabled once it has been enabled?” in this KEP.
1278+
5. **File or Triage Issues**
1279+
* If you confirm that Declarative Validation logic is producing incorrect results or performance regressions, open a GitHub issue in the kubernetes/kubernetes repository. Include:
1280+
* The exact failing resource object or field that triggers errors.
1281+
* Logs, relevant metric snapshots (e.g., from `/metrics`), and your cluster’s configuration (feature gate state, etc.).
1282+
6. [optional] **Roll back** (only if absolutely necessary)
1283+
* Roll back only after confirming the downstream impact (see “Can the feature be disabled once it has been enabled?”).
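
As an illustration of the metric check in step 2, the following is a minimal sketch (not part of the KEP) that sums a counter from two `/metrics` scrapes in Prometheus text format and reports how much it grew. The metric name is taken from the text above; the name actually exposed by a given API server may carry an additional prefix or suffix, and the helper names `sumCounter` and `mismatchDelta` as well as the sample scrape data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumCounter adds up every sample of the named metric in a Prometheus
// text-format scrape, across all label sets. Unparseable lines are skipped.
func sumCounter(scrape, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// The name must be followed by labels or a space; otherwise this is
		// a different metric that merely shares the prefix.
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue
		}
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest) // value, then an optional timestamp
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// mismatchDelta returns how much the mismatch counter grew between scrapes.
// The metric name is assumed from the KEP text.
func mismatchDelta(before, after string) float64 {
	const name = "declarative_validation_mismatch"
	return sumCounter(after, name) - sumCounter(before, name)
}

func main() {
	// Hypothetical scrape contents, for illustration only.
	before := "declarative_validation_mismatch 3\n"
	after := "declarative_validation_mismatch 5\n"
	fmt.Printf("mismatch delta: %g\n", mismatchDelta(before, after))
}
```

A non-zero delta over the observation window would correspond to the "any increments" condition described in step 2. In practice the same comparison is usually done with a monitoring query rather than by hand.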
1284+
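
The HA consistency check in step 4 could be sketched as follows, assuming the operator has already collected each API server's `--feature-gates` value (how that is gathered is out of scope here). The function `gateStates`, the server names, and the flag values are hypothetical; a server that does not list the gate is reported as `default`, since its effective state depends on the release default.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// gateStates groups servers by the state of one feature gate, given each
// server's --feature-gates value (for example "A=true,B=false").
// Keys are "true", "false", or "default" (gate not listed or unparseable).
// If the gate is listed more than once, the last parseable entry wins in
// this sketch.
func gateStates(flagsByServer map[string]string, gate string) map[string][]string {
	states := make(map[string][]string)
	for server, flags := range flagsByServer {
		state := "default"
		for _, kv := range strings.Split(flags, ",") {
			k, v, ok := strings.Cut(strings.TrimSpace(kv), "=")
			if !ok || k != gate {
				continue
			}
			if b, err := strconv.ParseBool(strings.TrimSpace(v)); err == nil {
				state = strconv.FormatBool(b)
			}
		}
		states[state] = append(states[state], server)
	}
	for _, servers := range states {
		sort.Strings(servers) // stable output regardless of map iteration order
	}
	return states
}

func main() {
	// Hypothetical servers and flag values, for illustration only.
	flags := map[string]string{
		"apiserver-a": "DeclarativeValidation=true",
		"apiserver-b": "DeclarativeValidation=false,DeclarativeValidationMismatchMetrics=true",
		"apiserver-c": "",
	}
	states := gateStates(flags, "DeclarativeValidation")
	if len(states) > 1 {
		fmt.Println("inconsistent DeclarativeValidation settings:", states)
	} else {
		fmt.Println("DeclarativeValidation is set consistently:", states)
	}
}
```

More than one key in the result indicates the partial enablement that step 4 warns about.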
12601285
## Implementation History
12611286
12621287
## Drawbacks
