Skip to content

Remove webhook validation code #1590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 1 addition & 39 deletions design/resource-validation.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,36 +91,9 @@ Design a validation mechanism for Gateway API resources.

## Design

We will introduce two validation methods to be run by NGF control plane:

1. Re-run of the Gateway API webhook validation
2. NGF-specific field validation

### Re-run of Webhook Validation

Before processing a resource, NGF will validate it using the functions from
the [validation package](https://github.com/kubernetes-sigs/gateway-api/tree/fa4b0a519b30a33b205ac0256876afc1456f2dd3/apis/v1/validation)
from the Gateway API. This will ensure that the webhook validation cannot be bypassed (it can be bypassed if the webhook
is not installed, misconfigured, or running a different version), and it will allow us to avoid repeating the same
validation in our code.

If a resource is invalid:

- NGF will not process it -- it will treat it as if the resource didn't exist. This also means that if the resource was
updated from a valid to an invalid state, NGF will also ignore any previous valid state. For example, it will remove
the generation configuration for an HTTPRoute resource.
- NGF will report the validation error as a
Warning [Event](https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/)
for that resource. The Event message will describe the error and explain that the resource was ignored. We chose to
report an Event instead of updating the status, because to update the status, NGF first needs to look inside the
resource to determine whether it belongs to it or not. However, since the webhook validation applies to all parts of
the spec of resource, it means NGF has to look inside the invalid resource and parse potentially invalid parts. To
avoid that, NGF will report an Event. The owner of the resource will be able to see the Event.
- NGF will also report the validation error in the NGF logs.

### NGF-specific validation

After re-running the webhook validation, NGF will run NGF-specific validation written in go.
NGF runs NGF-specific validation written in go.

NGF-specific validation will:

Expand All @@ -132,7 +105,6 @@ NGF-specific validation will not include:

- *All* validation done by CRDs. NGF will only repeat the validation that addresses (1) and (2) in the list above with
extra rules required by NGINX but missing in the CRDs. For example, NGF will not ensure the limits of field values.
- The validation done by the webhook (because it is done in the previous step).

If a resource is invalid, NGF will report the error in its status.

Expand All @@ -146,7 +118,6 @@ following methods in order of their appearance in the table.
| Name | Type | Component | Scope | Feedback loop for errors | Can be bypassed? |
|------------------------------|----------------------------|-----------------------|-------------------------|----------------------------------------------------------------------------------|--------------------------------|
| CRD validation | OpenAPI and CEL validation | Kubernetes API server | Structure, field values | Kubernetes API server returns any errors a response for an API call. | Yes, if the CRDs are modified. |
| Re-run of webhook validation | Go code | NGF control plane | Field values | Errors are reported as Event for the resource. | No |
| NGF-specific validation | Go code | NGF control plane | Field values | Errors are reported in the status of a resource after its creation/modification. | No |


Expand All @@ -156,7 +127,6 @@ following methods in order of their appearance in the table.
|------------------------------|---------|-----------------------|-------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| CRD validation | OpenAPI | Kubernetes API server | Structure, field values | Kubernetes API server returns any errors a response for an API call. | Yes, if the CRDs are modified. |
| Webhook validation | Go code | Gateway API webhook | Field values | Kubernetes API server returns any errors a response for an API call. | Yes, if the webhook is not installed, misconfigured, or running a different version. |
| Re-run of webhook validation | Go code | NGF control plane | Field values | Errors are reported as Event for the resource. | No |
| NGF-specific validation | Go code | NGF control plane | Field values | Errors are reported in the status of a resource after its creation/modification. | No |


Expand All @@ -182,14 +152,6 @@ We will not introduce any NGF webhook in the cluster (it adds operational comple
source of potential downtime -- a webhook failure disables CRUD operations on the relevant resources) unless we find
good reasons for that.

### Upgrades

Since NGF will use the validation package from the Gateway API project, when a new release happens, we will need to
upgrade the dependency and release a new version of NGF, provided that the validation code changed. However, if it did
not change, we do not need to release a new version. Note: other things from a new Gateway API release might prompt us
to release a new version like supporting a new field. See also
[GEP-922](https://gateway-api.sigs.k8s.io/geps/gep-922/#).

### Reliability

NGF processes two kinds of transactions:
Expand Down
37 changes: 1 addition & 36 deletions internal/mode/static/state/change_processor.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@ import (
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
v1 "sigs.k8s.io/gateway-api/apis/v1"
gwapivalidation "sigs.k8s.io/gateway-api/apis/v1/validation"
"sigs.k8s.io/gateway-api/apis/v1alpha2"
"sigs.k8s.io/gateway-api/apis/v1beta1"

Expand All @@ -25,13 +24,6 @@ import (
"github.com/nginxinc/nginx-gateway-fabric/internal/mode/static/state/validation"
)

const (
validationErrorLogMsg = "the resource failed validation: Gateway API CEL validation (Kubernetes 1.25+) " +
"by the Kubernetes API server and/or the Gateway API webhook validation (if installed) failed to reject " +
"the resource with the error; make sure Gateway API CRDs include CEL validation and/or (if installed) the " +
"webhook is running correctly."
)

// ChangeType is the type of change that occurred based on a k8s object event.
type ChangeType int

Expand Down Expand Up @@ -193,35 +185,8 @@ func NewChangeProcessorImpl(cfg ChangeProcessorConfig) *ChangeProcessorImpl {
},
)

updater := newValidatingUpsertUpdater(
trackingUpdater,
cfg.EventRecorder,
func(obj client.Object) error {
// Add the validation for Gateway API resources which the webhook validates

var err error
switch o := obj.(type) {
// We don't validate GatewayClass or ReferenceGrant, because as of the latest version,
// the webhook doesn't validate them.
// It only validates a GatewayClass update that requires the previous version of the resource,
// which NGF cannot reliably provide - for example, after NGF restarts).
// https://github.com/kubernetes-sigs/gateway-api/blob/v1.0.0/apis/v1/validation/gatewayclass.go#L28
case *v1.Gateway:
err = gwapivalidation.ValidateGateway(o).ToAggregate()
case *v1.HTTPRoute:
err = gwapivalidation.ValidateHTTPRoute(o).ToAggregate()
}

if err != nil {
return fmt.Errorf(validationErrorLogMsg+": %w", err)
}

return nil
},
)

processor.getAndResetClusterStateChanged = trackingUpdater.getAndResetChangedStatus
processor.updater = updater
processor.updater = trackingUpdater

return processor
}
Expand Down
Loading