Consul fails to start #10
Does it never stabilize? It looks like it's starting to work, it just hasn't joined all the servers yet. Also, PVCs might be a real issue for sure. I actually didn't realize that StatefulSets with PVCs can be started before the PVC is available (spoiled by the environment we run in, I guess). Do you know what that looks like? Is the directory just not available yet? That might be something we have to build into an init container or something (to wait for it to be ready).
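For illustration, a rough sketch of what such a wait could look like as an init container, assuming the server data volume is mounted at `/consul/data` (the mount path, image, and names here are hypothetical; the chart does not currently include this):

```yaml
# Hypothetical sketch: an initContainer on the server StatefulSet that blocks
# until the PVC-backed data directory is mounted and writable, as a guard
# against a slow storage provisioner.
initContainers:
  - name: wait-for-data-dir
    image: busybox:1.28
    command:
      - sh
      - -c
      - |
        # Loop until the volume mount is present and writable.
        until touch /consul/data/.probe 2>/dev/null; do
          echo "waiting for /consul/data to become writable"
          sleep 2
        done
        rm -f /consul/data/.probe
    volumeMounts:
      - name: data
        mountPath: /consul/data
```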
@mitchellh no, it never does; funnily enough, it used to work with the previous release of Rook (v0.7). In my understanding, the StatefulSet starts and attempts to bind the PVCs; until that is done, the pod should report an unbound PVC issue, which should be easily accessible via the pod's events.
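For reference, when a pod is scheduled before its PVC binds, the pod events usually show something along these lines (output paraphrased from memory; pod name and namespace hypothetical):

```
$ kubectl describe pod consul-server-0 -n service-discovery
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  10s   default-scheduler  pod has unbound PersistentVolumeClaims
```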
But during that time, the containers are started? Sorry, the easiest way to figure this out would be if you did more digging, or if I can get a reproduction. For the latter, is there an easy way for me to get a similar environment up and running?
They appear to be - the logs are there. I'm happy to do more digging; however, in order for you to get a repro, I'd have to provide you with Terraform files, Helm value files, and k8s manifests to get a copy of my env going, OR I can simply share a kubeconfig file so that you can poke around, and leave it running for the night.
@mitchellh I've cloned the repo and made some alterations to `server-statefulset.yaml`.

**Results**
```
$ helm status consul
LAST DEPLOYED: Wed Sep 26 23:34:25 2018
NAMESPACE: service-discovery
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME             READY  STATUS   RESTARTS  AGE
consul-kktl5     1/1    Running  0         2m
consul-tbjmt     1/1    Running  0         2m
consul-x5cqq     1/1    Running  0         2m
consul-server-0  1/1    Running  0         2m
consul-server-1  1/1    Running  0         2m
consul-server-2  1/1    Running  0         2m

==> v1/ConfigMap
NAME                  DATA  AGE
consul-client-config  1     2m
consul-server-config  1     2m

==> v1/Service
NAME           TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)                                                                   AGE
consul-dns     ClusterIP  10.3.228.30  <none>       53/TCP,53/UDP                                                             2m
consul-server  ClusterIP  None         <none>       8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP  2m
consul-ui      ClusterIP  10.3.98.102  <none>       80/TCP                                                                    2m

==> v1/DaemonSet
NAME    DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
consul  3        3        3      3           3          <none>         2m

==> v1/StatefulSet
NAME           DESIRED  CURRENT  AGE
consul-server  3        3        2m

==> v1beta1/PodDisruptionBudget
NAME           MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
consul-server  N/A            0                0                    2m
```

**Compare to previous results**

If you compare with the previous results, the pods looked like this:

```
NAME             READY  STATUS             RESTARTS  AGE
consul-56lvs     0/1    Running            0         36s
consul-jttwp     0/1    Running            0         36s
consul-qpgdn     0/1    Running            0         36s
consul-server-0  0/1    ContainerCreating  0         36s
consul-server-1  0/1    ContainerCreating  0         36s
consul-server-2  0/1    Running            0         36s
```

**Hypothesis (FailureThreshold?)**

This is what the docs say:

The `server-statefulset.yaml` failure threshold is:

Meaning that Kubernetes started to fail the checks and gave up before the PVC(s) were bound to the pods?
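For context, the probe block on the server StatefulSet has roughly this shape; the leader-check command matches the chart, but the numeric values below are assumptions, not the chart's exact defaults:

```yaml
# Illustrative readiness probe for the consul-server container. The pod is
# only marked Ready once the probe succeeds, so a small initialDelaySeconds
# means failed checks start being reported while a slow PVC is still binding.
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -ec
      - |
        curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | grep -E '".+"'
  initialDelaySeconds: 5
  periodSeconds: 3
  failureThreshold: 2
```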
@mmisztal1980 I made the adjustments that you mentioned, but am still seeing pods failing health checks and servers sitting in a pending state.
Consul logs seem to indicate that the StatefulSet isn't binding the PVC. I am also using Rook/Ceph for storage.

Pod description:
I checked the claims (`kubectl get pvc`) but don't see any claims made by Consul showing up in there.
Ugh, it looks like I was on an outdated version of the repo. After updating the repo and reinstalling, it is working now.
Added a PR where the probe settings are configurable via the chart values. That should help in our case.
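That change would presumably have roughly this shape in the chart's `values.yaml` (key names below are guesses, not necessarily what the PR uses):

```yaml
# Hypothetical values.yaml knobs for the server readiness probe; the
# StatefulSet template would interpolate these instead of hard-coding them.
server:
  readinessProbe:
    initialDelaySeconds: 30
    periodSeconds: 5
    failureThreshold: 5
```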
@mitchellh would the above PR be satisfactory? Tweaking the probe settings seems to have fixed the issue for @jmreicha and myself.
In my cluster I have Rook running. When provisioning the consul cluster, my helm values file looks like this:
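The original values file isn't reproduced here; a minimal sketch of the storage-relevant part, assuming a Rook-provisioned StorageClass named `rook-block` (the class name and keys are assumptions), might be:

```yaml
# Hypothetical values excerpt: point the server PVCs at a Rook-backed
# StorageClass and size the cluster.
server:
  replicas: 3
  bootstrapExpect: 3
  storage: 10Gi
  storageClass: rook-block
```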
In order to start the chart, I use the following cmdline:
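Roughly of this form, using Helm 2 syntax (release name, namespace, and chart path below are assumptions):

```
$ helm install ./consul-helm --name consul --namespace service-discovery -f values.yaml
```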
A quick verification of the PVCs indicates that they have bound successfully; please note that I've observed that it can take up to 20s to bind the PVC(s) running under Rook.
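The check itself is just `kubectl get pvc`; the output shape, with hypothetical claim and volume names, looks like:

```
$ kubectl get pvc -n service-discovery
NAME                   STATUS  VOLUME     CAPACITY  ACCESS MODES  STORAGECLASS  AGE
data-consul-server-0   Bound   pvc-1b2c…  10Gi      RWO           rook-block    1m
data-consul-server-1   Bound   pvc-3d4e…  10Gi      RWO           rook-block    1m
```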
However, the consul-server pods have failed to start:
An examination of the pod indicates that the readiness probe has failed:
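The original `kubectl describe` output isn't preserved above; failing readiness probes typically surface as events like these (wording approximate, names hypothetical):

```
$ kubectl describe pod consul-server-0 -n service-discovery
...
Events:
  Type     Reason     Age               From     Message
  ----     ------     ----              ----     -------
  Warning  Unhealthy  5s (x3 over 15s)  kubelet  Readiness probe failed: ...
```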
An examination of the server logs indicates that it has failed to form the cluster:
Any hints as to what may be wrong?
I've noticed that the probe's `initialDelaySeconds` default value is 5, so I'm guessing it may have failed before the PVCs were bound? Perhaps it'd make sense to have this value configurable?