-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add webhook with cert-manager recipe #2100
Conversation
Added some upstream docs PRs that should hopefully make things a little easier for people in the future: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments, will try it out on my installation as well.
namespace: elastic-system | ||
spec: | ||
selfSigned: {} | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason why you put document end markers ("...") between the documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just got in the habit of it, I can take it out if there's a reason I'm missing
@@ -122,6 +120,11 @@ func init() { | |||
certificates.DefaultRotateBefore, | |||
"Duration representing how long before expiration TLS certificates should be reissued", | |||
) | |||
Cmd.Flags().Bool( | |||
EnableWebhooksFlag, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This somewhat duplicates what we used to express with the webhook
role. I am fine with moving away from the operator roles concept but I think we should not have both the role and the explicit flag at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that is an oversight on my part, I forgot about the roles. I'll change this, we'll just want to change the default to not-webhook now I think. And change it back when we have automatic webhook configuration
// os.Exit(1) | ||
// } | ||
if viper.GetBool(EnableWebhooksFlag) { | ||
if err = (&estype.Elasticsearch{}).SetupWebhookWithManager(mgr); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only validates Elasticsearch resources but we used to have validations on (license) Secrets as well. Just flicking through the docs I think it should be possible to re-introduce them, but it might be worth revisiting the decision to validate secrets because it can potentially have a big impact on the stability of the k8s installation. Maybe worth re-introducing a license CRD?
[id="{p}-webhook"] | ||
== Validating webhook | ||
|
||
A validating webhook provides additional validation of Elasticsearch resources beyond what can be specified in OpenAPI specs. This allows advanced validation to help prevent Elasticsearch resources that are invalid and will not function correctly from being admitted by the Kubernetes API server, which can aid troubleshooting. To enable the validating webhook, you must first install cert-manager v0.11+ as described in their instructions link:from being admitted by the Kubernetes API server[here]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link to cert-manager doc is missing, probably a copy/paste mistake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First review, still need to do some additional tests
[id="{p}-webhook"] | ||
== Validating webhook | ||
|
||
A validating webhook provides additional validation of Elasticsearch resources beyond what can be specified in OpenAPI specs. This allows advanced validation to help prevent Elasticsearch resources that are invalid and will not function correctly from being admitted by the Kubernetes API server, which can aid troubleshooting. To enable the validating webhook, you must first install cert-manager v0.11+ as described in their instructions link:from being admitted by the Kubernetes API server[here]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link at the end of the sentence if not set
|
||
[source,yaml] | ||
---- | ||
cat $$<<$$EOF | kubectl apply -f - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$$<<$$EOF
is rendered "as is" with github, not sure why
path: /validate-elasticsearch-k8s-elastic-co-v1beta1-elasticsearch | ||
failurePolicy: Ignore | ||
name: elastic-es-validation.k8s.elastic.co | ||
timeoutSeconds: 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this error when applying the ValidationWebhookConfiguration
on GKE v1.13.12-gke.8 :
error: error validating "STDIN": error validating data: ValidationError(ValidatingWebhookConfiguration.webhooks[0]): unknown field "timeoutSeconds" in io.k8s.api.admissionregistration.v1beta1.Webhook; if you choose to ignore these errors, turn validation off with --validate=false
TBH I don't understand why... 😕
It's fine on GKE 1.14, so unless someone else understand the error I have on GKE 1.13 I would remove timeoutSeconds
and keep the default (which is 30 seconds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field simply did not exist prior to 1.14 see kubernetes/api@c33fb2b and kubernetes/kubernetes#74562
@@ -0,0 +1,168 @@ | |||
[id="{p}-webhook"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not add this file in our current documentation ? Not sure I would look into the recipes
directory if I was looking for information about how to configure the webhook with the cert manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with moving it there. I think I had it here with the idea that this would be a rarely used option once we have automatic configuration enabled
|
||
=== Troubleshooting | ||
|
||
You can change the `failurePolicy` of the webhook configuration to `Fail`, which will cause creations to error out if there is an error contacting the webhook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why we have a default failurePolicy
set to Ignore
, I have some mixed feelings about this as I have been able to inject a incorrect manifest while testing certificate renewal: certificate was not up to date in the operator, got this message in ECK: http: TLS handshake error from 10.64.1.1:38142: remote error: tls: bad certificate
but the resource was created anyway.
As a side note this part is not specific to the cert-manager so I guess we should eventually include it in a more generic documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to have Fail
but that caused a lot of issues with users having trouble with the webhooks. We decided because of these issues to loosen the policy to strike a balance between giving a better experience through immediate validation and not blocking users (especially because we were also validating k8s secrets in < 1.0)
Related to #1769
The update to use cert-manager v0.11 was only merged a week ago in kubebuilder so a lot of the docs there still need some work. I also added comments in the yaml of some of the weird things I noticed in Kubebuilder about how it expects webhooks to function.
The other unfortunate part is that kubebuilder docs very much expect you to be working with their deployments + kustomize configuration, which we have not migrated to yet.
To test this you may need to run
make operator-image
and then update theimage
field with the newly built image. Once this is in master I think we can update the example to a snapshot build.