
Add admission validation for WebhookDeploymentCustomization #1465

Open
crobby wants to merge 9 commits into rancher:main from crobby:webhook-deployment-validation

Collaborator

@crobby crobby commented May 13, 2026

Issue: rancher/rancher#54090

Problem

Invalid WebhookDeploymentCustomization values (zero replicas, malformed toleration keys, conflicting PDB fields) pass through the API and only fail at Helm install time on the downstream cluster, making errors difficult to diagnose.

Solution

Add admission validators for WebhookDeploymentCustomization on both provisioning.cattle.io/v1 and management.cattle.io/v3 Cluster resources:

  • replicaCount must be >= 1
  • Toleration keys validated against k8s label name rules
  • Affinity label selectors validated via apimachinery
  • PDB minAvailable/maxUnavailable: each must be a non-negative integer or a percentage string from 0% to 100%; the two cannot both be non-zero
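
As a rough illustration of the first two rules, here is a minimal Go sketch. It is not the validator from this PR: `validateReplicaCount` and `validateTolerationKey` are hypothetical names, treating a nil replica count as "unset" is an assumption, and the regular expressions only approximate the Kubernetes qualified-name rules using the standard library so the example stays self-contained. Real code would more likely call a helper such as `validation.IsQualifiedName` from `k8s.io/apimachinery/pkg/util/validation`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRe approximates the name segment of a Kubernetes qualified name:
// alphanumeric at both ends, with '-', '_' or '.' allowed in between.
var nameRe = regexp.MustCompile(`^[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?$`)

// prefixRe approximates a DNS subdomain: lowercase labels separated by dots.
var prefixRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateReplicaCount rejects values below 1.
// Assumption: a nil pointer means "not set" and is allowed.
func validateReplicaCount(replicas *int32) error {
	if replicas != nil && *replicas < 1 {
		return fmt.Errorf("replicaCount must be >= 1, got %d", *replicas)
	}
	return nil
}

// validateTolerationKey checks "[prefix/]name" against the approximated rules.
// It rejects the empty key; callers may want to skip empty keys, which
// Kubernetes permits together with operator Exists.
func validateTolerationKey(key string) error {
	name := key
	if i := strings.Index(key, "/"); i >= 0 {
		prefix := key[:i]
		name = key[i+1:]
		if len(prefix) == 0 || len(prefix) > 253 || !prefixRe.MatchString(prefix) {
			return fmt.Errorf("toleration key %q: invalid prefix %q", key, prefix)
		}
	}
	if len(name) == 0 || len(name) > 63 || !nameRe.MatchString(name) {
		return fmt.Errorf("toleration key %q: invalid name segment %q", key, name)
	}
	return nil
}

func main() {
	zero := int32(0)
	fmt.Println(validateReplicaCount(&zero))
	fmt.Println(validateTolerationKey("node-role.kubernetes.io/control-plane"))
	fmt.Println(validateTolerationKey("bad key!"))
}
```

Returning plain `error` values keeps the helpers independent of the admission framework; a real validator would presumably translate them into a denied admission response.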

No feature flag — always active. Includes per-resource documentation and 21 test cases per validator.

Depends on #1456.

crobby added 9 commits May 11, 2026 13:20
Remove leader election, secretHandler, ensureWebhookConfiguration, and
dynamiclistener dependency. The webhook now reads serving certs from
mounted files (/tmp/k8s-webhook-server/serving-certs/) populated by
needacert via a projected Secret volume. Cert rotation is handled by
re-reading the files on each TLS handshake. WebhookConfiguration
ownership has moved to the rancher-webhook Helm chart.
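
One way to re-read mounted cert files on each TLS handshake is the standard `tls.Config.GetCertificate` hook. The sketch below is illustrative only and is not taken from this PR: `newTLSConfig` and `newServer` are hypothetical helpers, the file names `tls.crt` and `tls.key` are assumed (they are the usual keys of a TLS Secret), and port 9443 is a placeholder.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"path/filepath"
)

// certDir is the mount path named in the commit message.
const certDir = "/tmp/k8s-webhook-server/serving-certs"

// newTLSConfig returns a config that loads the key pair from dir on every
// handshake, so a rotated Secret is picked up without restarting the process.
// Assumption: the files are named tls.crt and tls.key.
func newTLSConfig(dir string) *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair(
				filepath.Join(dir, "tls.crt"),
				filepath.Join(dir, "tls.key"),
			)
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
}

// newServer builds an HTTPS server that uses the reloading config.
func newServer(addr, dir string) *http.Server {
	return &http.Server{Addr: addr, TLSConfig: newTLSConfig(dir)}
}

func main() {
	srv := newServer(":9443", certDir) // port is a placeholder
	// Probe the loader once; outside a pod with the Secret mounted this fails.
	if _, err := srv.TLSConfig.GetCertificate(nil); err != nil {
		fmt.Println("serving cert not available:", err)
		return
	}
	fmt.Println("serving cert loaded; ready to call srv.ListenAndServeTLS")
}
```

Because `GetCertificate` is set, `srv.ListenAndServeTLS("", "")` can be called with empty file arguments. Loading from disk per handshake trades a little latency for simplicity; caching with a file watcher would be an alternative design.
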

NewErrorChecker initializes with a not-ready error. The old
secretHandler.sync() cleared it; with that gone the health endpoint
returned 500 permanently. Clear the error after the serving cert is
successfully loaded, just before ListenAndServeTLS.
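
The failure mode described here can be reproduced with a small stand-in. The `readyGate` type below is hypothetical and is not the actual error checker API; it only mirrors the described behaviour: it starts with a not-ready error, serves 500 while that error is set, and serves 200 once the error is cleared.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// readyGate is a hypothetical health handler that holds the current error.
type readyGate struct {
	mu  sync.RWMutex
	err error
}

// newReadyGate starts in the not-ready state, like the checker described above.
func newReadyGate() *readyGate {
	return &readyGate{err: errors.New("webhook not ready")}
}

// Set replaces the stored error; Set(nil) marks the gate healthy.
func (g *readyGate) Set(err error) {
	g.mu.Lock()
	g.err = err
	g.mu.Unlock()
}

// ServeHTTP answers 500 while an error is stored and 200 otherwise.
func (g *readyGate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	g.mu.RLock()
	err := g.err
	g.mu.RUnlock()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues an in-memory request and returns the status code.
func probe(g *readyGate) int {
	rec := httptest.NewRecorder()
	g.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	g := newReadyGate()
	fmt.Println(probe(g)) // 500: nothing has cleared the initial error yet
	g.Set(nil)            // e.g. right after the serving cert loads
	fmt.Println(probe(g)) // 200
}
```

If nothing ever calls `Set(nil)`, the probe stays at 500 forever, which matches the symptom the commit fixes.
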

Full ValidatingWebhookConfiguration (31 entries) and
MutatingWebhookConfiguration (9 entries) with per-entry failurePolicy
preserved. MCM-only entries gated on .Values.mcm.enabled. Service
annotated for needacert, deployment mounts cattle-webhook-tls secret
at /tmp/k8s-webhook-server/serving-certs.

Replaces the cluster-admin ClusterRoleBinding with a scoped
ClusterRole + renamed ClusterRoleBinding (rancher-webhook-binding).
Helm will prune the old rancher-webhook CRB on upgrade since roleRef
is immutable. Enumerates built-in rke-machine types; custom node
drivers will need a chart update to add their machine resource.

ListenAndServeTLS blocks, so clients.Start (which starts informer
caches) never ran. Validators that read from caches (cluster lookups,
RBAC checks, etc.) silently returned empty results. Move the listener
into a goroutine, start clients after, and block on ctx.Done.
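
The ordering fix can be sketched generically. In the example below, `run` and `demo` are hypothetical names, `serve` stands in for the blocking `ListenAndServeTLS` call, and `startClients` stands in for `clients.Start`. Unlike the commit description, which only mentions blocking on `ctx.Done`, this sketch also returns early if the listener fails; that part is an addition for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// run starts the blocking listener in a goroutine, then starts the clients,
// then waits for either context cancellation or a listener error.
func run(ctx context.Context, serve func() error, startClients func(context.Context) error) error {
	errCh := make(chan error, 1) // buffered so the goroutine never leaks on send
	go func() { errCh <- serve() }()

	if err := startClients(ctx); err != nil {
		return fmt.Errorf("starting clients: %w", err)
	}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case err := <-errCh:
		return err
	}
}

// demo wires run with stand-ins and reports whether the clients were started.
func demo() (clientsStarted bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	serve := func() error {
		<-ctx.Done() // blocks like a real listener would
		return errors.New("listener stopped")
	}
	startClients := func(context.Context) error {
		clientsStarted = true
		return nil
	}

	err = run(ctx, serve, startClients)
	return clientsStarted, err
}

func main() {
	started, err := demo()
	fmt.Println("clients started:", started, "exit reason:", err)
}
```

Had `serve` been called inline before `startClients`, the second call would never be reached, which is the bug the commit describes.
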

Validate WebhookDeploymentCustomization fields on both provisioning.cattle.io/v1
and management.cattle.io/v3 Cluster resources:
- replicaCount must be >= 1
- appendTolerations keys validated against k8s label name rules
- overrideAffinity label selectors validated
- PDB minAvailable/maxUnavailable: non-negative int or 0-100% string,
  cannot both be non-zero simultaneously
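
The PDB rule can be illustrated with a short Go sketch. It is not the code from this commit: `parsePDBValue` and `validatePDB` are hypothetical helpers, and it assumes both fields arrive as strings with the empty string meaning "unset".

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// pdbRe accepts plain digits with an optional trailing percent sign.
var pdbRe = regexp.MustCompile(`^([0-9]+)(%?)$`)

// parsePDBValue validates one field and reports whether it is non-zero.
// Assumption: "" means the field was not set.
func parsePDBValue(field, v string) (nonZero bool, err error) {
	if v == "" {
		return false, nil
	}
	m := pdbRe.FindStringSubmatch(v)
	if m == nil {
		return false, fmt.Errorf("%s: %q must be a non-negative integer or a percentage", field, v)
	}
	n, convErr := strconv.Atoi(m[1])
	if convErr != nil {
		return false, fmt.Errorf("%s: %q is out of range", field, v)
	}
	if m[2] == "%" && n > 100 {
		return false, fmt.Errorf("%s: %q exceeds 100%%", field, v)
	}
	return n != 0, nil
}

// validatePDB checks both fields and rejects the case where both are non-zero.
func validatePDB(minAvailable, maxUnavailable string) error {
	minSet, err := parsePDBValue("minAvailable", minAvailable)
	if err != nil {
		return err
	}
	maxSet, err := parsePDBValue("maxUnavailable", maxUnavailable)
	if err != nil {
		return err
	}
	if minSet && maxSet {
		return errors.New("minAvailable and maxUnavailable cannot both be non-zero")
	}
	return nil
}

func main() {
	fmt.Println(validatePDB("1", ""))
	fmt.Println(validatePDB("50%", "0"))
	fmt.Println(validatePDB("1", "25%"))
	fmt.Println(validatePDB("101%", ""))
}
```

Treating "0" and "0%" as zero lets one field be explicitly zeroed while the other is set, which is how the "cannot both be non-zero" wording reads.
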
@crobby crobby marked this pull request as ready for review May 18, 2026 18:39
@crobby crobby requested a review from a team as a code owner May 18, 2026 18:39