diff --git a/README.md b/README.md index cfd692f..f53a8b5 100644 --- a/README.md +++ b/README.md @@ -20,9 +20,39 @@ enhanced visual navigation of the dynamic state of a Kubernetes cluster. ## How to install and activate this service -The Cluster Insight service expects to run on the Kubernetes master node, and -can be deployed from a self-contained Docker image, built offline from the -source code. +The Cluster Insight service is a self-contained Docker image. It runs in two +modes: in the "master" mode it collects data from Kubernetes and from the +Docker daemons running on the minion modes. In the "minion" mode it acts +as a proxy for the local Docker daemon. The Cluster-Insight "master" should +run on the Kubernetes master, whereas the Cluster-Insight "minion" should run +on every Kubernetes minion node in the cluster. +The provided installation script will configure, install, and run the +Cluster-Insight service on any Kubernetes cluster. +The Cluster-Insight service does not require any change to the Docker +daemon configuration or require restarting the Docker daemons. + +### Easy installation: run the `gcp-project-setup.sh` script + +The `gcp-project-setup.sh` script will configure, install, and run the +Cluster-Insight service +on the given Kubernetes cluster running on GCP. You should run the script +on your workstation. To run the script, follow the instruction below. +If you run the script, then you can skip the rest of this section. +* Clone the Cluster-Insight sources from Github into a local directory + `./cluster-insight` with the command + `git clone https://github.com/google/cluster-insight.git` . +* Run the installation script with the command + `cd ./cluster-insight/install; ./gcp-project-setup.sh PROJECT_ID`. + The script will fetch the latest version of the Cluster-Insight container + from Docker Hub. + The script will print "SCRIPT ALL DONE" if it completed the operation + sucessfully. +* Note: the installation script may take up to a minute per node in case + that the Cluster-Insight or its libraries are not pre-loaded on the + node. Also, creating the firewall rule may take up to a minute. + + +### Manual installation (for example, on AWS) You can either install a pre-built image or build a Docker image from the source code. @@ -37,21 +67,38 @@ To build a Docker image from the source code, follow these instructions: `git clone https://github.com/google/cluster-insight.git` . * Change directory to `./cluster-insight/collector` . * Run: `sudo docker build -t kubernetes/cluster-insight . ` -* Check for the image by the command: `sudo docker images`. + Note the terminating period which is a part of the command. +* Verify that the new image is available by the command: `sudo docker images`. You should see an image named `kubernetes/cluster-insight`. To install and activate this service, follow these instructions: -To set up the minion nodes, do the following: - * Login to the master host. - * Run cluster-insight in "minion" mode on all the minions so as to access the docker daemon on port 4243. We provide a script for a replication controller, so this is easy to set up. Run the following kubectl commands on the master host to set this up: +To set up the minion nodes, do the following on every minion node: + * Pull the Cluster-Insight binary from Docker Hub or compile it from + the sources in every minion node. To pull the pre-built binary image from + Docker Hub use the following command: + `sudo docker pull kubernetes/cluster-insight`. + * To compile the binary from the source code, follow the instruction + above for compiling the binary on the master node. + * *IMPORTANT:* You must have the same binary image of the Cluster-Insight + service + in each minion node prior to starting the Cluster-Insight service. + You can do it by either fetching the image from Docker Hub or compiling + it from the latest sources. See above. + +Perform the following on the master node to complete the set up of the minion nodes: + * Clone the Cluster-Insight sources from Github into a local directory + if you have not done so already. See above for the command. + * Run cluster-insight in "minion" mode on all the minions to enable access + to the docker daemon on port 4243. We provide a script for a replication + controller, so this is easy to set up. Run the following `kubectl` + commands on the master host to set this up: * `kubectl create -f cluster-insight/collector/cluster-insight-controller.json` * `kubectl resize rc cluster-insight-controller --replicas=` * Here `` is the number of minion nodes in your cluster. Since the container binds to a specific port on the host, this will ensure that exactly one cluster-insight container runs on each minion. - * NOTE: If building from source, please make sure that you build the source on each minion, otherwise the master could have a different image from that on the minions. * All the minion nodes will have a cluster-insight container node running on them, and will provide limited read access to their docker daemon via port 4243. Try this out by running the following command based on the internal-ip of one of your minions: `curl http://:4243/containers/json`. -To set up the master node, do the following: +To set up the Cluster-Insight service on master node, do the following: * Login to the master host. * Check if the Docker service is running: `sudo docker ps`. If this gives an error, you must install and start the Docker service on this machine @@ -66,14 +113,14 @@ To set up the master node, do the following: `sudo docker images`. * Start the Cluster-Insight service like this: - `sudo docker run -d --net=host -p 5555:5555 --name cluster-insight kubernetes/cluster-insight`. + `sudo docker run -d --net=host -p 5555:5555 --name cluster-insight -e CLUSTER_INSIGHT_MODE=master kubernetes/cluster-insight`. * The Cluster-Insight service should now be listening for REST API requests on port 5555 in the Kubernetes master. Check this by typing: `sudo docker ps` - you should see a running container with the name cluster-insight. * To start the Cluster-Insight service in debug mode, append the `-d` or `--debug` flags to the end of the command line like this: - `sudo docker run -d --net=host -p 5555:5555 --name cluster-insight kubernetes/cluster-insight python ./collector.py --debug`. + `sudo docker run -d --net=host -p 5555:5555 --name cluster-insight -e CLUSTER_INSIGHT_MODE=master kubernetes/cluster-insight python ./collector.py --debug`. Please excercise caution when enabling the debug mode, because it enables a significant security hole. Any user who triggers a failure in the Cluster-Insight service will have unrestricted access to the debugger @@ -90,11 +137,17 @@ To set up the master node, do the following: ## Data collection details -The Cluster Insight service runs on the Kubernetes master node, and accesses -the Docker daemons on all of the minion nodes via port 4243 on each minion node. -It also accesses the Docker daemon on the master node via port 4243. -In addition, it listens for external HTTP requests to its REST endpoint on port -5555 of the master node, as shown in the figure below: +The Cluster-Insight service runs in "master" mode on the Kubernetes master +node, and accesses the Docker daemons on all of the minion nodes via the +Cluster-Insight service which runs in "minion" mode and acts as a proxy to +the Docker daemons. +The "master" Cluster-Insight service listens for external HTTP requests to +its REST endpoint on port 5555 of the master node, as shown in the figure +below. +The "minion" Cluste-Insight service listens for requests on port 4243 and +relays them to the local Docker daemon via a Unix-domain socket. +Note that the Unix-domain socket is the default way of communicating with +the Docker daemon. ![alt text](cluster-insight-architecture.png "cluster-insight service setup") diff --git a/collector/Dockerfile b/collector/Dockerfile index 3fd6200..2b40efe 100644 --- a/collector/Dockerfile +++ b/collector/Dockerfile @@ -22,7 +22,7 @@ # You should login to the node (VM instance) of the Kubernetes master with the # commands: -# gcloud config set project PROJECT_NAME +# gcloud config set project PROJECT_ID # gcloud compute ssh MASTER_NODE_NAME # # You should then clone the Cluster-Insight GitHub repository to obtain the @@ -41,7 +41,7 @@ # To run a container from this image in dev/test mode as the master: # (Ctrl-C will stop and remove the container): -# sudo docker run --rm --net=host -p 5555:5555 --name cluster-insight kubernetes/cluster-insight python ./collector.py --debug +# sudo docker run --rm --net=host -p 5555:5555 --name cluster-insight -e CLUSTER_INSIGHT_MODE=master kubernetes/cluster-insight python ./collector.py --debug # # To run a container from this image in detached production mode as the master: # sudo docker run -d --net=host -p 5555:5555 --name cluster-insight -e CLUSTER_INSIGHT_MODE=master kubernetes/cluster-insight diff --git a/install/gcp-project-setup.sh b/install/gcp-project-setup.sh index c6498b9..d35801f 100755 --- a/install/gcp-project-setup.sh +++ b/install/gcp-project-setup.sh @@ -37,6 +37,7 @@ MINION_SCRIPT_NAME="./node-setup.sh" MASTER_SCRIPT_NAME="./master-setup.sh" +FIREWALL_RULE_NAME="cluster-insight-collector" if [ $# -ne 1 ]; then echo "SCRIPT FAILED" @@ -128,5 +129,34 @@ else exit 1 fi +firewall_rules_list="$(gcloud compute firewall-rules list --project=${PROJECT_ID} | fgrep ${FIREWALL_RULE_NAME})" +if [[ "${firewall_rules_list}" == "" ]]; then + echo "setup firewall rule" + gcloud compute firewall-rules create --project=${PROJECT_ID} ${FIREWALL_RULE_NAME} --allow tcp:5555 --network "default" --source-ranges "0.0.0.0/0" --target-tags ${master_instance_name} + if [[ $? -ne 0 ]]; then + echo "FAILED to create firewall rule" + exit 1 + else + echo "created firewall rule successfully" + fi +else + echo "firewall rule exists" +fi + +echo "checking Cluster-Insight master health" +master_instance_IP_address="$(gcloud compute --project="${PROJECT_ID}" instances list | fgrep ${master_instance_name} | awk '{print $5}')" +if [[ "${master_instance_IP_address}" == "" ]]; then + echo "FAILED to find master instance ${master_instance_name} IP address" + exit 1 +fi + +health=$(curl http://${master_instance_IP_address}:5555/healthz 2> /dev/null) +if [[ "${health}" =~ "OK" ]]; then + echo "master is alive" +else + echo "FAILED to get master health response" + echo 1 +fi + echo "SCRIPT ALL DONE" exit 0