-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenAPI Validation across API types #6398
Comments
I like the idea of being more explicit in the OpenAPI spec, but TBG I have some concerns about embracing kubebuilder tags for validation because
Given the above reasons, I think we should not put this discussion in the critical path for #6383 and take some more time to figure out to preserve current test coverage while improving out OpenAPI spec |
Immutability is one important lack in current CRD OpenAPI validation. |
A defaulted field is optional from a user perspective as they don't have to specify it, with open api a required field can be defaulted and no error occurs because defaulting happens client-side before hitting the api server |
I just tested this on K8s 1.23 and I don't think that's correct. I added a required field with a default, left it out of the resource and got an error saying the field was required, if I make it optional, the default applies correctly. Seems the defaulting happens after the validation |
Is it possible to enumerate a list of the type of validations we expect to carry over in webhooks, were we to make this change?
Can you elaborate on this point? Is there conditional defaulting (or other defaulting) happening that cannot be handled by OpenAPI? Keen to know where the gaps are in the use case for Cluster API
Adding or changing the behaviour of a default for a field breaks the assumptions of clients writing and consuming this API, so it's a breaking change. As far as I understand, that doesn't change whether we have OpenAPI defaults/validation vs Webhook defaults/validation, does it? |
I think so, but it's non-trivial IMO. We'd definitely want to do this well before we'd be updating the API.
One of the ones that comes to mind is where we copy a value during defaulting:
There's also some places where we e.g. normalize versions: cluster-api/exp/api/v1beta1/machinepool_webhook.go Lines 70 to 74 in e622cbb
Would we be able to accomplish these with kubebuilder defaulting? |
Conditional stuff like this is out of scope of openapi defaulting |
@vincepri That's actually not true afaik, structural schema CRDs don't publish defaults so client-side can't know what it would be defaulted on the server. This results in a surprising UX for fields that are both required and defaulted failing client side validation if they are not set explicitly. |
Joel and further down Alberto are correct. kubectl only does OpenAPI validation but no defaulting. I opened an issue for this in k/k: kubernetes/kubernetes#108768. But in my opinion this is a mostly separate discussion. Independent of if we add more OpenAPI validation or not we will have optional and required fields like today and this will be an issue until it is resolved in k/k. Of course by adding more OpenAPI validation we could theoretically hit this case more often, but I think only in rare edge cases (where the defaulting webhook "fixes up" field values, e.g. adding the So for me it comes down to:
From a maintainer perspective I definitely prefer the validation in code as:
From a user perspective it might be better to surface the validations we could express via OpenAPI schema in the OpenAPI schema of the CRDs directly, although this doesn't provide a full picture so it might be misleading. I wonder if having a good description would be better.
A few examples of validations we cannot move to OpenAPI:
I'm aware that there is some ongoing work on an expression language for validation in CRDs which is really nice, but this would only work on newer Kubernetes versions which we cannot assume for a very long time (link: kubernetes/enhancements#2876) I think there are a lot more cases and looking through our webhook code I'm not sure we could replace a significant percentage through OpenAPI validation. |
@sbueringer thanks for taking the time to think about this, that's a very thorough write up I just wanted to add some extra context for completeness to your list of things we cannot move.
|
I think you're correct, a lot of of those should be implementable with CEL, not sure if we have something which is too complex for that.
My bad, I meant ObjectReference instead of OwnerRefs (e.g. https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/clusterclass_types.go#L445-L449 but also at least in MD, MS, Cluster). This might be solved by stop using the not-recommended ObjectReference upstream type and using our own one with only the useful fields instead. In cases were namespace must be always the same as the one of the top-level API type it doesn't make sense to have it configurable. (xref: #6024) |
Ahh this makes a lot more sense now 😅 We should definitely do that in the next API version |
I stand corrected :) Thanks folks for digging through these and getting all the information together! |
We should take a look at how/if ServerSideValidation changes the situation: kubernetes/kubernetes#108889 |
Hey there, wg-api-expression lead here, we're generally interested in building more expressive APIs, Just read this thread, a few comments:
We're generally trying to make sure people don't have to maintain a validating webhook, so I'd love to hear the use-cases that you can't get rid of. |
The trickiest cases we have rely on the existence of objects, or of fields in other objects. When we create a cluster with ClusterClass field specified we run a get from the API server to ensure the ClusterClass exists, that it has some required fields and that variable definitions in the Cluster meet some of the contstraints in the ClusterClass. Seeing as it relies on the spec of another object it seems like this would be hard to do without a webhook or something very expressive. |
Also commented on the linked k/k issue, but just to mention it here. I think this statement is not correct Thus we get inconsistent results on client- and server-side |
Agreed, but I think it's wrong that fields are both required and defaulted, we can't fix that in the api now, but we should strongly recommend not having these. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lifecycle frozen |
/triage accepted |
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
/priority important-longterm |
The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs. @sbueringer summarized our options in #7634
I think that given the current status of the project/number of maintainers, and the fact that no one is raising concerns about the expressivity of our OpenAPI schema, only 1 is viable because it is basically the current state and we can keep improving stuff incrementally in a surgical way. I there won't be comments, I will close this issue and proceed as described above |
/close |
@fabriziopandini: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen I think we'll have to settle this debate :). Recently this came up here:
If we decide to add more OpenAPI validation including CEL:
I'll see if I can get around to it after my PTO (will basically come down to early next year) |
@sbueringer: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fwiw other projects (like gateway API) are using CEL and relying on beta meaning that CEL is enabled in the cluster, so there's precedence for using CEL even though it could in theory be disabled in a supported version if the end user decided to |
/triage accepted I'll do some research once I get around to it and then bring this topic up for discussion |
Currently validation for our APIs is done, for the most part, in our webhooks. OpenAPI validation could be added (possibly using kubebuilder tags) to make this validation part of our OpenAPI definition. Example validations are max / min, regex matching and enum enforcement - but there's a lot of possibilities enabled by kubebuilder tags.
We currently don't use this type of validation across our API types, but we do have some examples in MachineHealthCheck, MachinePool, ClusterResourceSet and our Bootstrap types.
Putting validation in our OpenAPI specs would move implementation of validation from our webhooks to the Kubernetes tooling (kubectl and/or kube-apiserver). We would still require webhooks for a number of validations which cannot be done with OpenAPI.
Impact
From a UX perspective the change would be, in an ideal scenario, small. We wouldn't be able to customize error messages, but error messages would be more consistent. Users would be able to understand most of our validation by reading the OpenAPI spec instead of reading godoc comments (which may be out of sync with the actual implementation in the webhook).
In reality UX impact could be large and negative because of how defaulting currently works. In our current implementation webhook Defaulting happens before Validation. Users are able to leave fields blank, even though they are required in our validation, and have sane, valid defaults applied.
Currently we have the DefaultValidate test used across our API types which validates this process:
cluster-api/util/defaulting/defaulting.go
Line 37 in 09d0824
This is used to test: Machines, MachineSets, MachineHealthChecks, MachineDeployments, ClusterResourceSets, KubeadmConfigs, KubeadmConfigTemplates, MachinePools, KubeadmControlPlane, KubeadmControlPlaneTemplate.
If we move validation to OpenAPI validation will happen before webhooks are called which would require changes in how we do defaulting and which fields we expect to be defined by users.
Defaulting could alternatively, in some cases, be done in the OpenAPI schema, but the process doesn't have a good UX right now. There's an issue open about improving this in Kubernetes at kubernetes/kubernetes#108768
This can also cause rollouts in fields that previously weren't defaulted (#6095 h/t @sbueringer ) meaning that a large implementation and testing effort could be needed to ensure the changes aren't causing new rollouts.
If we end up moving only some validation and defaulting to our OpenAPI spec we end up with validation in multiple places. Which doesn't really improve the explicitness of our API but, in the worst case scenario, could be a large implementation effort and have a negative impact on UX.
Original discussion here: #6383 (review)
As the discussion is on an experimetal API we could try to implement OpenAPI validation on that API only to assess the impact and rolling it out to more parts of the overall CAPI API.
Should we try to move more validation to the OpenAPI spec?
@fabriziopandini @JoelSpeed
/kind feature
/area api
The text was updated successfully, but these errors were encountered: