-
Notifications
You must be signed in to change notification settings - Fork 385
Conversation
@@ -52,6 +52,35 @@ global: | |||
# consul-k8s v0.8.0+. | |||
bootstrapACLs: false | |||
|
|||
# Enables TLS encryption across the cluster to verify authenticity of the | |||
# servers and clients that connect. Note: It is HIGHLY recommended that you also | |||
# enable Gossip encryption. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really link to our own docs page on enabling gossip encryption... if we had one.
-H "Authorization: Bearer $( cat /var/run/secrets/kubernetes.io/serviceaccount/token )" \ | ||
-H "Content-Type: application/json" \ | ||
-H "Accept: application/json" \ | ||
-d "{ \"kind\": \"Secret\", \"apiVersion\": \"v1\", \"metadata\": { \"name\": \"{{ template "consul.fullname" . }}-server-cert\", \"namespace\": \"${NAMESPACE}\" }, \"type\": \"kubernetes.io/tls\", \"data\": { \"tls.crt\": \"$( cat {{ .Values.global.datacenter }}-server-{{ .Values.global.domain }}-0.pem | base64 | tr -d '\n' )\", \"tls.key\": \"$( cat {{ .Values.global.datacenter }}-server-{{ .Values.global.domain }}-0-key.pem | base64 | tr -d '\n' )\" } }" > /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one cert for all the servers right? So isn't this weird it's suffixed with -0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what consul CLI does by default. It doesn't matter so much though because we're uploading only the contents here.
It's maybe weird for client certs. Each client cert has the -0 suffix, and so it could be a bit confusing. We could rename them, but with auto_encrypt we won't need this anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, yeah for servers this is only place we see -0
since the key is tls.crt
so this is cool
This PR requires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to look at
- UI
- how it interacts with ACLs
- how we might update the certs
but thought it best to add this feedback so far
CHANGELOG.md
Outdated
@@ -11,6 +11,15 @@ IMPROVEMENTS: | |||
false in your local values file. NOTE: If `connectInject.enabled` is false, | |||
then central config is not enabled so this change will not affect you. | |||
|
|||
* Optionally allow enabling TLS for servers and clients [[GH-313](https://github.com/hashicorp/consul-helm/pull/313/files#)]. | |||
|
|||
Note that consul-k8s components don't currently work with HTTPS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you fix the injector you should update that Consul connect won't work at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the injector before this PR gets merged.
templates/client-clusterrole.yaml
Outdated
resources: | ||
- secrets | ||
resourceNames: | ||
- {{ template "consul.fullname" . }}-ca-cert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍻 for template
Outstanding question from me: How do users rotate their certificates? |
c04993a
to
12ea51b
Compare
@lkysow I've made the updates from your review and also added support for some consul-k8s components.
Outstanding work that will be submitted separately:
|
Testing this branch in one of my clusters and I ran in to an issue that may be related to these changes. Upgrading from consul-helm version 0.8.1 results in: I opted to manually create a service account which allowed I finally just blew away my existing consul installation and ran this chart clean. I will say that running it on a clean environment worked great and the Golang Hashicorp Consul api with TlsConfig plugged in seamlessly so thank you for the work on this! Will be watching closely for using my own CA. |
@tpetrychyn thank you so much for trying it out and leaving feedback here. Really appreciate it! It looks like I forgot to mark the |
Encountered another issue that I've been banging my head against for days.. Using commit 12ea51b I am now getting the error in consul-sync-catalog:
I am using Digital Ocean managed k8 v1.16.2-do.1, my CNI is cilium v1.6.4. Using v0.8.1 of this chart works fine in this environment. The oddest thing is 2 weeks I setup a completely clean DO cluster on 1.16.2-do.1 and this chart worked out of the box, but now even a clean cluster made today has that error.. P.S. commit ^6b659a2 breaks because consul-k8s 0.11.0 is not published on dockerhub |
@tpetrychyn This PR requires a new version of consul-k8s for many components, such as Connect Inject and ACL bootstrapping, to work correctly. I optimistically updated the version. This PR, however, should not be merged without that docker image being published first. Sorry if that was confusing! Have you tried deploying it with TLS disabled? Do you see the same error? At a first glance, it doesn't seem TLS related since the TLS handshake happens after the TCP connection is established. |
Thanks for the clarification! Non-tls also has that issue. Setting |
# Note: remember to switch it back to true once the rollout is complete. | ||
# Please see this guide for more details: | ||
# https://learn.hashicorp.com/consul/security-networking/certificates | ||
verify: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lkysow this flag is for incremental TLS rollout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adilyse ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! A couple things came up. Here's what I've tested so far.
- review new verify: setting
- components (with httpsonly)
- ACL bootstrapping
- connect
- sync catalog
- ent license
- snapshot
- mesh gateway (just that they came up)
- consul UI
- consul DNS
- pod security polices (does not work)
- gradual deployment
- upgrades
templates/client-daemonset.yaml
Outdated
@@ -232,6 +280,38 @@ spec: | |||
- name: aclconfig | |||
mountPath: /consul/aclconfig | |||
{{- end }} | |||
{{- if .Values.global.tls.enabled }} | |||
- name: client-tls-init | |||
image: {{ .Values.global.image }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is very unlikely to make a difference but I think we should use
image: "{{ default .Values.global.image .Values.client.image }}"
here to keep it consistent with the image for the main container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! thanks for calling this out. I'll update.
-ca-file=/consul/tls/ca/tls.crt \ | ||
-http-addr="https://${HOST_IP}:8501" | ||
{{- else }} | ||
-http-addr="${HOST_IP}:8500" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed http://
from this. Makes sense to keep it I think so it's clear.
@@ -79,8 +83,13 @@ spec: | |||
{{- if .Values.global.bootstrapACLs}} | |||
-config-dir=/consul/aclconfig \ | |||
{{- end }} | |||
-http-addr=http://${HOST_IP}:8500 | |||
{{- if (or .Values.global.bootstrapACLs (and .Values.client.snapshotAgent.configSecret.secretName .Values.client.snapshotAgent.configSecret.secretKey) ) }} | |||
{{- if .Values.global.tls.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in other places we're setting the CONSUL_HTTP_ADDR and CONSUL_CACERT env vars instead of using flags
templates/server-clusterrole.yaml
Outdated
resources: ["podsecuritypolicies"] | ||
resourceNames: | ||
- {{ template "consul.fullname" . }}-server | ||
verbs: | ||
- use | ||
{{- else }} | ||
{{- else}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😭
CHANGELOG.md
Outdated
@@ -1,5 +1,15 @@ | |||
## Unreleased | |||
|
|||
IMPROVEMENTS: | |||
|
|||
* Optionally allow enabling TLS [[GH-313](https://github.com/hashicorp/consul-helm/pull/313/files#)]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Optionally allow enabling TLS [[GH-313](https://github.com/hashicorp/consul-helm/pull/313/files#)]. | |
* Optionally allow enabling TLS [[GH-313](https://github.com/hashicorp/consul-helm/pull/313)] for Consul communication. |
templates/server-clusterrole.yaml
Outdated
@@ -10,13 +10,13 @@ metadata: | |||
release: {{ .Release.Name }} | |||
{{- if .Values.global.enablePodSecurityPolicies }} | |||
rules: | |||
- apiGroups: ["policy"] | |||
- apiGroups: ["policy"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this breaks this template. Either unindent or indent everything nested below
{{- end }} | ||
-dc={{ .Values.global.datacenter }} \ | ||
-domain={{ .Values.global.domain }} | ||
curl -s -X POST --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here these POSTs this won't replace any existing secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -32,6 +32,12 @@ spec: | |||
spec: | |||
restartPolicy: Never | |||
serviceAccountName: {{ template "consul.fullname" . }}-server-acl-init | |||
{{- if .Values.global.tls.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs pod security policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the PSP for this job you've created in #326, I don't think anything will need to change once that's merged. This PR only adds secrets, and the PSP already allows attaching secret volumes. But let me know if I misunderstood you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we're good now with my PR. That was just a reminder that we needed to do it for acl init.
@@ -0,0 +1,52 @@ | |||
# tls-init-cleanup job deletes Kubernetes secrets created by tls-init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs pod security policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
@@ -166,17 +207,17 @@ spec: | |||
- name: aclconfig | |||
mountPath: /consul/aclconfig | |||
{{- end }} | |||
lifecycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not something that needs to be changed but we might be adding this back as consul force-leave -prune
due to hashicorp/consul#6897 (comment)
This commit exposes TLS configuration in the Helm chart and makes the following changes: * Optionally allows you to turn on TLS by setting global.tls.enabled to true. * Exposes the default HTTPS port 8501 on clients and servers if TLS enabled. * Optionally allows you to disable HTTP ports on clients and servers by setting global.tls.httpsOnly to true. This is recommended, however, disabling HTTP will break existing consul-k8s components, which currently do not support TLS. * Adds TLS bootstrapping job that generates Consul cluster CA, as well as server and client certificates signed by that CA. * Exposes UI service on port 443 if TLS is enabled.
* Rename tls-secret job to tls-init. * Ensure principle of least privilege for the tls-init-clusterrole * Set httpsOnly to false by default * Client certs are generated in an init container, so that we can add HOST_IP as a SAN to enable other clients, e.g. sync-catalog, to talk to it * Update pod security policies to account for HTTPS ports exposed as as hostPort if TLS is enabled * Don't expose various verify_* config options in the Helm chart and, instead, assume secure defaults if global.tls.enabled is true. * Ensure only the CA certificate file is present on the filesystem of the server and client containers. * Don't run tls-init job if servers are disabled * Add tls-init-cleanup job that deletes certs created by the tls-init job when the Helm chart is deleted. * Add missing tests. * Use Consul CLI to set enterprise license * Don't generate CLI clients certs because we are not verifying incoming for HTTP requests (ACLs are recommended).
* Remove Helm hooks annotations for RBAC resources. They are not needed because resources are already managed correctly by Helm.
* separate CA cert and key into their own secrets * remove duplicate/redundant tests * increase server certificate validity to 2 years * add server service DNS names as SANs to the server certificate * set env variables for consul server and clients pods to be able to easily talk to consul when 'exec'ing into the container * fix graceful termination for the servers (previously terminationGracePeriod was set to 10sec, which wasn't enough times for the servers to terminate).
…d be pre-upgrade hooks
* Add pod security policies * use http everywhere explicitly for CONSUL_HTTP_ADDR * Fix templating issue in server-clusterrole.yaml
@lkysow I've made the changes you requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔐 🚀
Be sure to do a release of consul-k8s first
{{- if .Values.connectInject.centralConfig.enabled }} | ||
-enable-central-config=true \ | ||
{{- end }} | ||
{{- if (and .Values.connectInject.centralConfig.enabled .Values.connectInject.centralConfig.defaultProtocol) }} | ||
-default-protocol="{{ .Values.connectInject.centralConfig.defaultProtocol }}" \ | ||
{{- end }} | ||
{{- if .Values.connectInject.certs.secretName }} | ||
{{- if .Values.connectInject.certs.secretName }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ishustava does the generated envoy config file include the TLS context (cert/key)? I see that the CA is added inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The envoy TLS context for the consul client (aka local_agent
) doesn't need cert/key since it's not using mTLS client auth to talk to Consul. That's why it only has the CA cert.
If your question is about authentication, we recommend using ACLs for production environments for that purpose, and when they are enabled in the Helm chart, Envoy will include an ACL token when talking to Consul.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I was more curious about mTLS for service to service (via envoy) communication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. mTLS for service to service is configured dynamically as you add and remove services by the Consul client agent. These docs talk a bit about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the dynamically generated envoy-bootstrap.yaml file via the consul connect envoy -bootrap command? Is that where the cert/key should appear?
Specifically, do I expect something like this in the generated file:
tls_context:
common_tls_context:
tls_certificates:
- certificate_chain:
filename: "/etc/example-com.crt"
private_key:
filename: "/etc/example-com.key"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that config is used only for bootstrapping envoy itself. Consul updates envoy config for services dynamically through the API, so you won't find them in a file. You could look at the config though through the envoy admin endpoint. All you need to do is this:
kubectl port-forward po/<your pod that has envoy sidecar container> 19000
Then go to the browser at http://localhost:19000. There you will see an option to dump the config. Alternatively, you could just curl the endpoint to get the config curl http://localhost:19000/config_dump
You could do other things through that endpoint too, like change log levels. Here are the docs for the admin endpoint in case you'll find them useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! I appreciate your help. Thank you! This is awesome!
This PR builds on the work done in #137 and proposes the following changes:
global.tls.enabled
totrue
.global.tls.httpsOnly
totrue
. This is set totrue
by default.tls-init-cleanup
job that deletes certs created by thetls-init
job when the Helm chart is deleted.HOST_IP
as a SAN to enable other clients, e.g. sync-catalog, to talk to it. This will be replaced byauto_encrypt
in the future once it supports providing additional SANs to the client certificate.verify_*
config options in the Helm chart as it was proposed in the original PR Support TLS encryption #137. Instead, it assumes secure defaults ifglobal.tls.enabled
istrue
.Note 1: This currently doesn't support external CAs. It also doesn't work if servers aren't deployed on k8s. Once we support external CAs, we can also support various deployment configurations. Hence, if servers are disabled,
tls-init
will not run.Note 2: We're not using mTLS for HTTP API. That is because in Consul's threat model we recommend using ACLs instead. To prevent clients from sending unencrypted RPC requests to the servers, we set
verify_incoming_rpc
totrue
.Fixes #136