Practice Kubernetes troubleshooting with realistic error scenarios.
Each scenario is triggered with a single kubectl apply command. To clean up, run kubectl delete -f on the same URL.
Crashing Pod (CrashLoopBackOff)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
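To inspect the crash yourself, the standard kubectl workflow applies (the pod name below is a placeholder for whatever name the manifest creates):
kubectl get pods
kubectl describe pod <crashing-pod-name>
kubectl logs <crashing-pod-name> --previous
The --previous flag shows the logs of the container's last failed run.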
To get notified automatically about issues like this, install Robusta.
OOMKilled Pod (Out of Memory Kill)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/oomkill/oomkill_job.yaml
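To confirm the kill reason without Robusta (the pod name is a placeholder):
kubectl get pods
kubectl describe pod <oomkilled-pod-name>
Look for OOMKilled as the termination reason of the container.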
To get notified automatically about issues like this, install Robusta.
High CPU Throttling (CPUThrottlingHigh)
Apply the following YAML and wait 15 minutes. (CPU throttling is only an issue if it occurs for a meaningful period of time. Less than 15 minutes of throttling typically does not trigger an alert.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/cpu_throttling/throttling.yaml
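While you wait, you can watch the pod's CPU usage sit at its limit (requires metrics-server; the pod name is a placeholder):
kubectl top pod <throttled-pod-name>
The CPUThrottlingHigh alert itself is typically driven by the container_cpu_cfs_throttled_periods_total metric rather than plain CPU usage.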
To get notified automatically about issues like this, install Robusta.
Pending Pod (Unschedulable due to Node Selectors)
Apply the following YAML and wait 15 minutes. (By default, most systems only alert after pods are pending for 15 minutes. This prevents false alarms on autoscaled clusters, where it's OK for pods to be temporarily pending.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_node_selector.yaml
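To see why the pod is unschedulable right away (the pod name is a placeholder):
kubectl get pods
kubectl describe pod <pending-pod-name>
The Events section should contain a FailedScheduling message explaining that no node matches the pod's node selector.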
To get notified automatically about issues like this, install Robusta.
ImagePullBackOff
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml
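You can watch the pod cycle between ErrImagePull and ImagePullBackOff (the pod name is a placeholder):
kubectl get pods -w
kubectl describe pod <failing-pod-name>
The Events section shows the exact image name that could not be pulled.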
To get notified automatically about issues like this, install Robusta.
Liveness Probe Failure
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml
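The symptom here is a climbing restart count (the pod name is a placeholder):
kubectl get pods
kubectl describe pod <failing-pod-name>
Look for "Liveness probe failed" events and an increasing RESTARTS column as the kubelet keeps restarting the container.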
To get notified automatically about issues like this, install Robusta.
Readiness Probe Failure
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/readiness_probe_fail/failing_readiness_probe.yaml
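Unlike a failing liveness probe, a failing readiness probe does not restart the container; the pod simply never becomes Ready (the pod name is a placeholder):
kubectl get pods
kubectl describe pod <failing-pod-name>
The READY column stays at 0/1, and the pod is never added to Service endpoints.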
Job Failure
The job will fail after 60 seconds, then attempt to run again. After two attempts, it will fail for good.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
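To follow the retries (the job name is a placeholder for whatever name the manifest creates):
kubectl get jobs
kubectl describe job <failing-job-name>
kubectl logs job/<failing-job-name>
describe shows the backoff and failure events; logs prints output from one of the job's pods.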
To get notified automatically about issues like this, install Robusta.
Failed Helm Releases
Deliberately deploy a failing Helm release:
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm install kubewatch robusta/kubewatch --set='rbac.create=true,updateStrategy.type=Error' --namespace demo-namespace --create-namespace
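To confirm the release really is in a failed state (a sketch using standard Helm commands):
helm status kubewatch --namespace demo-namespace
helm list --namespace demo-namespace --all
helm status should report the release status as failed.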
Upgrade the release so it succeeds:
helm upgrade kubewatch robusta/kubewatch --set='rbac.create=true' --namespace demo-namespace
Clean up by removing the release and deleting the namespace:
helm uninstall kubewatch --namespace demo-namespace
kubectl delete namespace demo-namespace
To get notified about failed Helm releases, install Robusta and set up Helm Releases Monitoring.
Correlate Changes and Errors
Deploy a healthy pod. Then break it.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
If someone else made this change, would you be able to immediately pinpoint the change that broke the application?
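Without a change-tracking tool, the usual manual approach is to dig through rollout history (the deployment name is a placeholder; this assumes the breaking change produced revision 2):
kubectl rollout history deployment/<deployment-name>
kubectl rollout history deployment/<deployment-name> --revision=2
The --revision flag prints the pod template of that revision, so you can compare it with the previous one by hand.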
To get notified automatically about changes like this, install Robusta.
Track Deployment Changes
Create an nginx deployment. Then simulate multiple unexpected changes to this deployment.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/before_image_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/after_image_change.yaml
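A quick way to see the current image after each apply (the deployment name is a placeholder for whatever name the manifests create):
kubectl get deployment <deployment-name> -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl rollout history deployment/<deployment-name>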
To get notified about changes like this, install Robusta and set up Kubernetes change tracking.
Track Ingress Changes
Create an ingress. Then change its path and secretName to simulate an unexpected ingress modification.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/before_port_path_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/after_port_path_change.yaml
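To inspect the ingress before and after the change (the ingress name is a placeholder):
kubectl get ingress
kubectl describe ingress <ingress-name>
describe shows the current paths and TLS secret, but not what they looked like before the change.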
To get notified about changes like this, install Robusta and set up Kubernetes change tracking.
Drift Detection and Namespace Diff
Deploy two variants of the same application in different namespaces:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/namespace_drift/example.yaml
Can you quickly tell the difference between the compare1 and compare2 namespaces? What is the drift between them?
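One manual way to answer that is to export both namespaces and diff them (a sketch; compare1 and compare2 are the namespaces the manifest creates):
kubectl get deployments -n compare1 -o yaml > compare1.yaml
kubectl get deployments -n compare2 -o yaml > compare2.yaml
diff compare1.yaml compare2.yaml
The raw diff is noisy because of generated fields such as resourceVersion and uid, which is why a purpose-built namespace diff is easier to read.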
To do so with Robusta, install Robusta and enable the UI.
Inefficient GKE Nodes
On GKE, nodes can reserve more than 50% of their CPU for system overhead, so users pay for CPU that is unavailable to their applications.
Reproduction:
- Create a default GKE cluster with Autopilot disabled. Don't change any other settings.
- Deploy the following pod:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/gke_node_allocatable/gke_issue.yaml
- Run
kubectl get pods -o wide gke-node-allocatable-issue
The pod will be Pending. A Pod requesting 1 CPU cannot run on an empty node with 2 CPUs!
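To see where the CPU went, compare the node's capacity with its allocatable resources (the node name is a placeholder; pick any node from kubectl get nodes):
kubectl describe node <node-name>
Capacity lists the full 2 CPUs, while Allocatable shows what is actually left for pods after system and kubelet reservations.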
To see problems like this with Robusta, install Robusta and enable the UI.