Improve CockroachDB example
* Use an init container to eliminate a potential edge case where losing
  the first pet's data could cause it to start a second logical cluster
* Exec the cockroach binary so that it runs as PID 1 in the container
* Make some small improvements to the README
a-robinson committed Oct 31, 2016
1 parent e6b2517 commit 6b98de3
Showing 3 changed files with 108 additions and 39 deletions.
54 changes: 41 additions & 13 deletions examples/cockroachdb/README.md
@@ -12,10 +12,11 @@ a PetSet. CockroachDB is a distributed, scalable NewSQL database. Please see
Standard PetSet limitations apply: There is currently no possibility to use
node-local storage (outside of single-node tests), and so there is likely
a performance hit associated with running CockroachDB on some external storage.
Note that CockroachDB already does replication and thus should not be deployed on
a persistent volume which already replicates internally.
High-performance use cases on a private Kubernetes cluster should consider
a DaemonSet deployment.
Note that CockroachDB already does replication and thus it is unnecessary to
deploy it onto persistent volumes which already replicate internally.
For this reason, high-performance use cases on a private Kubernetes cluster
may want to consider a DaemonSet deployment until PetSets support node-local
storage (see #7562).

### Recovery after persistent storage failure

@@ -27,17 +28,25 @@ first node is special in that the administrator must manually prepopulate the
parameter. If this is not done, the first node will bootstrap a new cluster,
which will lead to a lot of trouble.

### Dynamic provisioning
### Dynamic volume provisioning

The deployment is written for a use case in which dynamic provisioning is
The deployment is written for a use case in which dynamic volume provisioning is
available. When that is not the case, the persistent volume claims need
to be created manually. See [minikube.sh](minikube.sh) for the necessary
steps.
steps. If you're on GCE or AWS, where dynamic provisioning is supported, no
manual work is needed to create the persistent volumes.
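Without dynamic provisioning, each pet needs a pre-created volume along the lines of what minikube.sh generates. The following is a hypothetical sketch of one such volume; the name, hostPath location, and capacity are illustrative assumptions, not values taken from the script:

```yaml
# Hypothetical manually provisioned volume for one pet. The name,
# hostPath, and storage size are illustrative; see minikube.sh for
# the values the example actually uses.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv0
  labels:
    type: local
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/pv0
```

One such volume is needed per replica, so the claims created by the PetSet's volume claim template have something to bind to.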

## Testing locally on minikube

Follow the steps in [minikube.sh](minikube.sh) (or simply run that file).

## Testing in the cloud on GCE or AWS

Once you have a Kubernetes cluster running, just run
`kubectl create -f cockroachdb-petset.yaml` to create your cockroachdb cluster.
This works because GCE and AWS support dynamic volume provisioning by default,
so persistent volumes will be created for the CockroachDB pods as needed.

## Accessing the database

Along with our PetSet configuration, we expose a standard Kubernetes service
@@ -48,15 +57,27 @@ Start up a client pod and open up an interactive, (mostly) Postgres-flavor
SQL shell using:

```console
$ kubectl run -it cockroach-client --image=cockroachdb/cockroach --restart=Never --command -- bash
root@cockroach-client # ./cockroach sql --host cockroachdb-public
$ kubectl run -it --rm cockroach-client --image=cockroachdb/cockroach --restart=Never --command -- ./cockroach sql --host cockroachdb-public
```

You can see example SQL statements for inserting and querying data in the
included [demo script](demo.sh), but you can use almost any Postgres-style SQL
commands. Some more basic examples can be found within
[CockroachDB's documentation](https://www.cockroachlabs.com/docs/learn-cockroachdb-sql.html).
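For instance, a session in that shell might look like the following. These statements are illustrative and not taken verbatim from demo.sh; the database and table names are made up:

```sql
-- Illustrative Postgres-style statements; names are hypothetical.
CREATE DATABASE IF NOT EXISTS bank;
CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
INSERT INTO bank.accounts VALUES (1, 1000.50);
SELECT * FROM bank.accounts;
```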

## Accessing the admin UI

If you want to see information about how the cluster is doing, you can try
pulling up the CockroachDB admin UI by port-forwarding from your local machine
to one of the pods:

```shell
kubectl port-forward cockroachdb-0 8080
```

Once you’ve done that, you should be able to access the admin UI by visiting
http://localhost:8080/ in your web browser.

## Simulating failures

When all (or enough) nodes are up, simulate a failure like this:
@@ -77,10 +98,17 @@ database and ensuring the other replicas have all data that was written.

## Scaling up or down

Simply edit the PetSet (but note that you may need to create a new persistent
volume claim first). If you ran `minikube.sh`, there's a spare volume so you
can immediately scale up by one. Convince yourself that the new node
immediately serves reads and writes.
Simply patch the PetSet by running

```shell
kubectl patch petset cockroachdb -p '{"spec":{"replicas":4}}'
```

Note that you may need to create a new persistent volume claim first. If you
ran `minikube.sh`, there's a spare volume so you can immediately scale up by
one. If you're running on GCE or AWS, you can scale up by as many as you want
because new volumes will automatically be created for you. Convince yourself
that the new node immediately serves reads and writes.
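On a cluster without dynamic provisioning, the extra claim for the new pet can be created by hand before patching. A hypothetical sketch, assuming the PetSet's volume claim template is named `datadir` (matching the volume mount in cockroachdb-petset.yaml), which gives the fourth pet (index 3) a claim named `datadir-cockroachdb-3`:

```yaml
# Hypothetical claim for the fourth pet. The naming scheme
# <claim template name>-<petset name>-<index> and the storage size
# are assumptions for illustration.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-cockroachdb-3
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```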

## Cleaning up when you're done

91 changes: 66 additions & 25 deletions examples/cockroachdb/cockroachdb-petset.yaml
@@ -23,17 +23,25 @@ spec:
apiVersion: v1
kind: Service
metadata:
# This service only exists to create DNS entries for each pet in the petset
# such that they can resolve each other's IP addresses. It does not create a
# load-balanced ClusterIP and should not be used directly by clients in most
# circumstances.
name: cockroachdb
labels:
app: cockroachdb
annotations:
# This is needed to make the peer-finder work properly and to help avoid
# edge cases where instance 0 comes up after losing its data and needs to
# decide whether it should create a new cluster or try to join an existing
# one. If it creates a new cluster when it should have joined an existing
# one, we'd end up with two separate clusters listening at the same service
# endpoint, which would be very bad.
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
# Enable automatic monitoring of all instances when Prometheus is running in the cluster.
prometheus.io/scrape: "true"
prometheus.io/path: "_status/vars"
prometheus.io/port: "8080"
# This service only exists to create DNS entries for each pet in the petset such that they can resolve
# each other's IP addresses. It does not create a load-balanced ClusterIP and should not be used
# directly by clients in most circumstances.
name: cockroachdb
labels:
app: cockroachdb
spec:
ports:
- port: 26257
@@ -52,13 +60,50 @@ metadata:
name: cockroachdb
spec:
serviceName: "cockroachdb"
replicas: 5
replicas: 3
template:
metadata:
labels:
app: cockroachdb
annotations:
pod.alpha.kubernetes.io/initialized: "true"
# Init containers are run only once in the lifetime of a pod, before
# it's started up for the first time. Each init container has to exit
# successfully before the pod's main containers are allowed to start.
# This particular init container does a DNS lookup for other pods in
# the petset to help determine whether or not a cluster already exists.
# If any other pets exist, it creates a file in the cockroach-data
# directory to pass that information along to the primary container that
# has to decide what command-line flags to use when starting CockroachDB.
# This only matters when a pod's persistent volume is empty - if it has
# data from a previous execution, that data will always be used.
pod.alpha.kubernetes.io/init-containers: '[
{
"name": "bootstrap",
"image": "cockroachdb/cockroach-k8s-init:0.1",
"args": [
"-on-start=/on-start.sh",
"-service=cockroachdb"
],
"env": [
{
"name": "POD_NAMESPACE",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.namespace"
}
}
}
],
"volumeMounts": [
{
"name": "datadir",
"mountPath": "/cockroach/cockroach-data"
}
]
}
]'
spec:
containers:
- name: cockroachdb
@@ -93,27 +138,23 @@ spec:
- |
# The use of qualified `hostname -f` is crucial:
# Other nodes aren't able to look up the unqualified hostname.
CRARGS=("start" "--logtostderr" "--insecure" "--host" "$(hostname -f)")
# TODO(tschottdorf): really want to use an init container to do
# the bootstrapping. The idea is that the container would know
# whether it's on the first node and could check whether there's
# already a data directory. If not, it would bootstrap the cluster.
# We will need some version of `cockroach init` back for this to
# work. For now, just do the same in a shell snippet.
# Of course this isn't without danger - if node0 loses its data,
# upon restarting it will simply bootstrap a new cluster and smack
# it into our existing cluster.
# There are likely ways out. For example, the init container could
# query the kubernetes API and see whether any other nodes are
# around, etc. Or, of course, the admin can pre-seed the lost
# volume somehow (and in that case we should provide a better way,
# for example a marker file).
CRARGS=("start" "--logtostderr" "--insecure" "--host" "$(hostname -f)" "--http-host" "0.0.0.0")
# We only want to initialize a new cluster (by omitting the join flag)
# if we're sure that we're the first node (i.e. index 0) and that
# there aren't any other nodes running as part of the cluster that
# this is supposed to be a part of (which indicates that a cluster
# already exists and we should make sure not to create a new one).
# It's fine to run without --join on a restart if there aren't any
# other nodes.
if [ ! "$(hostname)" == "cockroachdb-0" ] || \
[ -e "/cockroach/cockroach-data/COCKROACHDB_VERSION" ]
[ -e "/cockroach/cockroach-data/cluster_exists_marker" ]
then
CRARGS+=("--join" "cockroachdb")
# We don't join cockroachdb in order to avoid a node attempting
# to join itself, which currently doesn't work
# (https://github.com/cockroachdb/cockroach/issues/9625).
CRARGS+=("--join" "cockroachdb-public")
fi
/cockroach/cockroach ${CRARGS[*]}
exec /cockroach/cockroach ${CRARGS[*]}
# No pre-stop hook is required, a SIGTERM plus some time is all that's
# needed for graceful shutdown of a node.
terminationGracePeriodSeconds: 60
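The decision the init container's on-start hook has to make can be sketched in shell. This is a hypothetical illustration of the idea, not the actual on-start.sh shipped in the cockroachdb/cockroach-k8s-init image: peer-finder hands the hook the DNS names of the petset's current peers, and if any peer other than this pod exists, a marker file is dropped into the data directory so the main container knows a cluster already exists and passes `--join`.

```shell
#!/bin/sh
# Hypothetical sketch of the init container's on-start logic.
# Reads one peer DNS name per line on stdin (as peer-finder provides),
# and creates a marker file if any peer other than this pod is found.
mark_if_cluster_exists() {
  datadir="$1"
  self="$2"          # this pod's short hostname, e.g. cockroachdb-0
  found=1
  while read -r peer; do
    # Compare only the first DNS label against our own hostname.
    if [ "${peer%%.*}" != "$self" ]; then
      touch "$datadir/cluster_exists_marker"
      found=0
    fi
  done
  return $found
}
```

The main container's startup script then only has to test for the marker file, which is exactly what the `-e .../cluster_exists_marker` check in the PetSet spec above does.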
2 changes: 1 addition & 1 deletion examples/cockroachdb/minikube.sh
@@ -35,7 +35,7 @@ kubectl delete petsets,pods,persistentvolumes,persistentvolumeclaims,services -l
# claims here manually even though that sounds counter-intuitive. For details
# see https://github.com/kubernetes/contrib/pull/1295#issuecomment-230180894.
# Note that we make an extra volume here so you can manually test scale-up.
for i in $(seq 0 5); do
for i in $(seq 0 3); do
cat <<EOF | kubectl create -f -
kind: PersistentVolume
apiVersion: v1
