A little chaos engineering application for Kubernetes resilience testing.
- A configured Kubernetes cluster with an ingress controller deployed
- Ingress controller NodePorts mapped to 443 and 80
- A configured ~/.kube/config
- kubectl installed (brew install kubectl or https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- Successful execution of kubectl get nodes
Modify ./config/discovery.yaml to fit your needs. There are two major sections, nodes and ingresses.
- The nodes section allows you to specify whether you want to periodically drain nodes, how often, and which nodes. These settings are under enabled, interval, and fields + labels (selectors). interval can be specified as 10s or 1h. enabled is true or false. labels contains a list of filters based on labels, and fields has a list of filters based on fields. Some examples can be found here: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ . Selectors are a pretty powerful tool.
- The ingresses section allows you to specify the ingress discovery process. You can specify fields and labels selectors, and enabled and interval settings like above, but there are three ingress-specific settings. protocol allows you to specify a default protocol for ingresses that do not name a host; it is either http or https. Those same ingresses also need a default port and a default host. If an ingress route contains a host, that host is used instead. If the ingress has a tls entry referencing that host, we assume https on port 443; otherwise, http on port 80.
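The host and protocol rules for ingresses can be sketched as a small function. This is only an illustration of the rules described above, not the actual kube-entropy code: the names `Defaults` and `resolve_url`, the example hosts, and the choice to omit the port when it matches the scheme default are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Defaults:
    # Hypothetical container for the three ingress-specific settings.
    protocol: str = "https"
    port: int = 443
    host: str = "www.example.com"


def resolve_url(rule_host: Optional[str], path: str,
                tls_hosts: Set[str], defaults: Defaults) -> str:
    """Build the URL to probe for one ingress route."""
    if rule_host:
        # A host on the route wins; a tls reference decides the scheme.
        if rule_host in tls_hosts:
            scheme, port = "https", 443
        else:
            scheme, port = "http", 80
        host = rule_host
    else:
        # No host on the route: fall back to the configured defaults.
        scheme, port, host = defaults.protocol, defaults.port, defaults.host
    # Leave the port out of the URL when it is the standard one for the scheme.
    standard_port = {"http": 80, "https": 443}.get(scheme)
    netloc = host if port == standard_port else f"{host}:{port}"
    return f"{scheme}://{netloc}{path}"
```

For example, a route with host shop.example.com that is listed under tls resolves to https://shop.example.com/..., while a route without a host uses the configured default protocol, port, and host.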
Run the discovery by executing ./kube-entropy -mode discovery. It will create a test plan file. We capture a bunch of settings, including full ingress URIs, HTTP response codes, and key HTTP headers.
In chaos mode, applications are stressed according to the test plan while we continuously monitor ingress states. If the HTTP status changes, or a set of HTTP headers changes (excluding some basic ones, like Content-Length or Set-Cookie), this indicates an application error or a response from the default backend. Looking at the application logs allows you to determine the source of instability. You can also have external monitors enabled. Run this mode by executing ./kube-entropy -mode chaos.
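The comparison against the test plan might look roughly like the sketch below. It assumes the baseline status and headers were recorded during discovery; the function names and the exact set of ignored headers are illustrative (the text above only names Content-Length and Set-Cookie as examples).

```python
# Headers that legitimately vary between requests. Assumed set: the README
# mentions Content-Length and Set-Cookie only as examples.
IGNORED_HEADERS = {"content-length", "set-cookie"}


def comparable_headers(headers):
    """Lower-case header names and drop the ones we do not compare."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() not in IGNORED_HEADERS
    }


def detect_drift(baseline_status, baseline_headers, status, headers):
    """Return a list of human-readable differences from the baseline."""
    problems = []
    if status != baseline_status:
        problems.append(f"status changed: {baseline_status} -> {status}")
    before = comparable_headers(baseline_headers)
    after = comparable_headers(headers)
    # Report headers that were added, removed, or changed value.
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            problems.append(f"header changed: {name}")
    return problems
```

An empty list means the ingress still looks the way it did at discovery time; anything else is a hint to go and read the application logs.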
---
nodes:
  enabled: true
  fields:
    - spec.unschedulable!=true
  labels:
  interval: 5m
ingresses:
  protocol: https
  port: 443
  defaultHost: www.avsatum.com
  selector:
    enabled: true
    interval: 2s
    fields:
      - metadata.namespace=default
      - metadata.namespace!=kube-system
      - metadata.namespace!=docker
    labels:
  successHttpCodes:
    - 2xx
    - 30x
    - 403

kube-entropy is designed to randomly trigger two kinds of disruptive events: pod restarts and node drains. Two types of monitoring are supported: service monitoring and ingress monitoring. Each type of monitoring and stress action is independently controlled by labels, selectors, and a timing interval.
Service monitoring is designed primarily to keep internal communications in check. If monitoring from within the cluster, service endpoints are invoked directly (only TCP checking is used). If monitoring from outside the cluster, node ports are checked against some nodePortHost, which is most likely a load balancer. The NodePort and service port information is obtained from service definitions. If you use a complex port mapping outside of Kubernetes, try deploying kube-entropy into your cluster.
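A TCP-only check of the kind described above can be as simple as attempting a connection. The sketch below is an assumption about what such a check might look like, not the tool's own code; `tcp_check` and its timeout default are hypothetical.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

From inside the cluster, host and port would be the service endpoint; from outside, host would be the nodePortHost and port the service's NodePort.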
Ingress monitoring is useful to determine whether the application responds to ingress requests. As with all Kubernetes ingresses, these are reverse proxy routes through the ingress controller (usually nginx) to service and pod IPs. When a pod gets deleted, its IP is removed from the ingress controller configuration. If the ingress controller doesn't refresh its configuration, an ingress call can potentially be routed to a stale pod IP, which is what we're trying to avoid. Ingress monitoring is HTTP-based, and a list of acceptable HTTP codes can be specified in the kube-entropy config file:
successHttpCodes:
  - 2xx
  - 3xx
  - 401

- DNS disruption
- Network connectivity disruption
- Support for Istio/Knative
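For the successHttpCodes patterns shown in the ingress monitoring config (for example 2xx, 30x, or 401), one plausible reading is that x stands for any single digit. The sketch below implements that reading; it is an assumption about the pattern semantics, and the function names are hypothetical.

```python
def matches(pattern: str, status: int) -> bool:
    """Match a status code against a pattern where 'x' is any digit."""
    code = str(status)
    if len(pattern) != len(code):
        return False
    # Each position must either be a wildcard or the exact same digit.
    return all(p in ("x", "X") or p == c for p, c in zip(pattern, code))


def is_success(status: int, patterns) -> bool:
    """Return True if the status matches at least one configured pattern."""
    return any(matches(p, status) for p in patterns)
```

Under this reading, 30x accepts 302 but not 310, and a literal entry such as 403 accepts only that exact code.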