-
Notifications
You must be signed in to change notification settings - Fork 50
feat(crdvalidator): adding webhook to ensure safety of crd create/upgrades #213
feat(crdvalidator): adding webhook to ensure safety of crd create/upgrades #213
Conversation
711a12c
to
834a63c
Compare
f4c2747
to
4a84301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work! This looks really good.
For an example idea of how an e2e could look like, check out https://github.com/operator-framework/operator-lifecycle-manager/blob/ac3aa278ddc8d59add546bf9e33e583f17d62bb0/test/e2e/crd_e2e_test.go#L115
440bf8f
to
fd9c2f7
Compare
7a4760f
to
97000b1
Compare
Dockerfile.crdvalidator
Outdated
|
||
RUN make bin/crdvalidator | ||
|
||
FROM gcr.io/distroless/static:debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the webhook need a shell, is there a reason to use the :debug
tag? Best practices for building container images are to not include a shell unless it's necessary for the application somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 It doesn't, you're right. I was mainly copying this over from the root Dockerfile. Do you think that means we should be moving that container away from having a shell as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes. I would be for it personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @exdx brings up a good point but I wonder whether it's good to use a debug tag in the short term while we continue to iterate, and then later on work towards removing it.
cmd/crdvalidator/internal/crd/crd.go
Outdated
// 3. New CRD adds a version that Old CRD does not have => | ||
// - If conversion strategy is None, ensure existing CRs validate with new schema. | ||
// - If conversion strategy is Webhook, allow update (assume webhook handles conversion correctly) | ||
func validateCRDCompatibility(ctx context.Context, cl client.Client, oldCRD *apiextensionsv1.CustomResourceDefinition, newCRD *apiextensionsv1.CustomResourceDefinition) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: all the code in this function and following it is fairly in-depth, and could use some unit tests as a follow-up. Since it came from elsewhere, largely from OLM, it's fairly safe to rely on, but we could test it more on the rukpak side as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I'll make a follow-up item for this PR that adds unit testing for this particular package.
97000b1
to
f0fb36f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for all the work on this. Let's get one more approval before merging.
f0fb36f
to
bf174ef
Compare
It("should deny admission", func() { | ||
Eventually(func() bool { | ||
if err := c.Get(ctx, client.ObjectKeyFromObject(crd), crd); err != nil { | ||
return false | ||
} | ||
|
||
crd.Spec.Versions = []apiextensionsv1.CustomResourceDefinitionVersion{ | ||
{ | ||
Name: "v1alpha2", | ||
Served: true, | ||
Storage: true, | ||
Schema: &apiextensionsv1.CustomResourceValidation{ | ||
OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{ | ||
Type: "object", | ||
Description: "my crd schema", | ||
}, | ||
}, | ||
}, | ||
} | ||
|
||
err := c.Create(ctx, crd) | ||
if err == nil { | ||
return false | ||
} | ||
|
||
return errorsLooselyEqual(err, errors.New("cannot remove stored versions")) | ||
}).Should(BeTrue()) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things:
- I'm slightly confused whether we should be firing off an update request vs. a create one
- Can we return an error here and use gomega matchers to enforce the expected error message, e.g.
rukpak/test/e2e/plain_provisioner_test.go
Line 838 in bf174ef
ContainSubstring(`no matches for kind "CatalogSource" in version "operators.coreos.com/v1alpha1"`),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so the Update
here was attempted and continuously failed. If you have the Create
it succeeds - however, this is not what we should really be expecting and just obfuscated the issue. Something under-the-hood in Kube is now blocking unsafe storage removals and thus the error that we are expecting is not coming through. This is because the Update
request does not seem to even hit the webhook and Kube returns its own error. I'm making some changes in an upcoming commit along with a todo
comment to address this in the future by reevaluating if we need to ensure safe storage removals or if Kube does it by default now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I just did a quick passthrough of these changes. I think my main takeaways were the following:
- I'm not a huge fan of the current directory structure as it's difficult to discover where packages live in what directories and we don't have a concrete want/need to split this kind of component into it's own repository, or attempt to introduce it to upstream k/k.
- The
make run
command doesn't install this component like I would've expected - It's not entirely evident how we could disable this behavior dynamically (or whether that's something we'd need to support)
- We can punt on the debug/static distroless tag discussion
- I think we can drop that k8s.io/apiserver dependency that was introduced with these changes
- We're potentially introducing a confusing installation UX by having a deploy and install Makefile targets
- It doesn't look like the CRD validator e2e test Makefile target loads the kind-load-crdvalidator target unless I'm reading these changes wrong
- We'll need to update goreleaser to build this new executable package
I also ran these changes locally and ran into e2e failures. I haven't dived too deep into those yet though.
FYI - It looks like I ran into a container image build warning locally too:
|
Yep, so I recently rebased this PR and it removed my addition to |
410d954
to
084afe5
Compare
65b008a
to
58b3f57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're super close and then I'm ready to approve:
- Can we remove that test-crdvalidator-e2e target which appears to be unused?
- Can we remove that unused Dockerfile.crdvalidator now that we've consolidated this component in the current repository structure?
5d5a148
to
06bb7bd
Compare
Note: it looks like this branch references out-of-date files compared to the main branch. It might be worth rebasing against the main branch, and respinning e2e once more to make sense we're not potentially silently regressing, but I'll leave that up to you as I just approved this. |
Really nice work btw @tylerslaton |
06bb7bd
to
adfa290
Compare
Rebased and pushed. Looks like everything is good to go. Going to squash all the commits together. @exdx anything you want to touch on before we merge this? |
Add a new webhook that integrates the CRD validation logic into a webhook. With this enabled on cluster, bundle installs and pivots can be guranteed as safe by the webhook. It works by intercepting all CRD create/update requests that come through and declining or admitting them based on the internal/crd package. Signed-off-by: Tyler Slaton <tyslaton@redhat.com>
adfa290
to
3ea0a0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just some small comments.
install: install-plain ## Install all rukpak core CRDs and provisioners | ||
|
||
deploy: install-apis ## Deploy the operator to the current cluster | ||
deploy: install-apis install-crdvalidator ## Deploy the operator to the current cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: install-apis
already calls cert-mgr
so we end up installing cert-manager twice on deploy. I'm not sure a good way around this -- is there a way to only run a target if it hasn't already been run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking on this. I haven't come up with a great solution other than having all the cert-mgr dependent installs in a single target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable, we can have something like install-webhooks
which calls cert-mgr
under the hood. I suppose we can address this is in a follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, but I agree with you. Its an easy addition that I can just tack onto the follow-ups.
// it through. This is fine and ultimately what the safe storage logic of the | ||
// CRDValidator was designed to prevent but is unknown why it is occurring. We | ||
// Should come back to this test case, figure out what is preventing it from | ||
// hitting the webhook and decide if we want to keep that logic or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we open an issue to track this and put the link to it here in the test comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this action will create a issue for it:
https://github.com/operator-framework/rukpak/blob/main/.github/workflows/todo.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CRDValidator was designed to prevent but is unknown why it is occurring. We | ||
// Should come back to this test case, figure out what is preventing it from | ||
// hitting the webhook and decide if we want to keep that logic or not. | ||
PWhen("an incoming crd event removes a stored version", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we had some issues getting this test to work -- but we did confirm the test case where the schema changed in an incompatible way between upgrades. It's worth adding that test so we have at least one scenario where the admission was denied due to the webhook in our tests. This can be added in a follow-up though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - good call out. I'm also cool with a follow-up here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I'm planning on doing a follow-up PR like the recent git work after this lands.
LGTM again - do you want to push this through @tylerslaton. |
@timflannagan Yep, I'm good with this going through and addressing the remaining comments in a follow up. Merging now. |
NOTES
Summary
Doing a few things here that we should talk through:
crdvalidator
) under thecmd
directory. There has been talk of moving this webhook into its own repo in the future and this should make that very easy.crd
util package along side the webhook code.Open Questions
crd
applies through or would that option just be removing this webhook?