A production-ready, lightweight Kubernetes Chaos Engineering operator built with Kubebuilder v4. Test your application's resilience through controlled chaos injection with comprehensive safety features.
- 🛡️ Safety First: Dry-run mode, percentage limits, exclusion labels, production protection
- 🎯 6 Chaos Actions: Pod kill, delay, CPU/memory stress, failure, node drain
- ⏰ Smart Scheduling: Cron-based recurring experiments with duration control
- 📊 Full Observability: Prometheus metrics, Grafana dashboards, audit history
- 🔄 Automatic Retry: Configurable backoff strategies for transient failures
- 📚 Comprehensive Docs: Getting started guide, best practices, real-world scenarios
- 🧪 Hands-on Labs: Interactive learning environment with automated setup

Pod Chaos
- ✅ `pod-kill`: Delete pods to test deployment resilience
- ✅ `pod-delay`: Inject network latency (50ms-5s)
- ✅ `pod-cpu-stress`: Consume CPU resources (1-100%)
- ✅ `pod-memory-stress`: Consume memory resources
- ✅ `pod-failure`: Kill the main process to test restart behavior

Node Chaos
- ✅ `node-drain`: Drain nodes with automatic uncordon

Safety Features
- ✅ Dry-Run Mode: Preview affected resources without execution
- ✅ Max Percentage Limits: Prevent affecting too many resources (e.g., max 30%)
- ✅ Production Protection: Explicit approval required for production namespaces
- ✅ Exclusion Labels: Protect critical pods/namespaces
- ✅ Experiment Duration: Auto-stop after specified time

Scheduling & Reliability
- ✅ Cron Scheduling: Recurring experiments (e.g., `*/30 * * * *`, which fires at minutes 0 and 30 of every hour)
- ✅ Retry Logic: Exponential or fixed backoff strategies

Observability
- ✅ Prometheus Metrics: Experiments, duration, resources affected, errors, safety metrics
- ✅ Grafana Dashboards: 3 dashboards (overview, detailed, safety)
- ✅ Experiment History: Full audit trail with configurable retention
- ✅ Safety Metrics: Track dry-runs, production blocks, percentage violations

Tooling & Docs
- ✅ CLI Tool: Commands for listing, describing, stats, and top experiments
- ✅ Comprehensive Docs: Getting Started, Best Practices, Troubleshooting, Scenarios
- ✅ Hands-on Labs: Step-by-step tutorials with automated cluster setup
- ✅ Validation: Multi-layer validation (OpenAPI + admission webhooks)
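
The safety features above compose in a natural order: exclusion labels shrink the candidate set, the percentage limit bounds how many of the remaining candidates may be hit, and dry-run reports the plan without acting. A minimal Go sketch, assuming a hypothetical `chaos-exclude: "true"` label and a cap computed over non-excluded pods; the operator's real label key, field names, and rounding may differ.

```go
package main

import "fmt"

// excludeLabel is a hypothetical label key; see the docs for the real one.
const excludeLabel = "chaos-exclude"

// Pod is a minimal stand-in for a Kubernetes Pod (illustration only).
type Pod struct {
	Name   string
	Labels map[string]string
}

// maxAllowed returns how many of total resources a maxPercentage cap permits.
// Integer division rounds down, so the cap is never exceeded.
func maxAllowed(total, maxPercentage int) int {
	return total * maxPercentage / 100
}

// planTargets drops excluded pods, then enforces the percentage cap.
// For simplicity it takes the first count eligible pods.
func planTargets(pods []Pod, count, maxPercentage int) ([]Pod, error) {
	if count < 0 {
		return nil, fmt.Errorf("count must not be negative, got %d", count)
	}
	var eligible []Pod
	for _, p := range pods {
		if p.Labels[excludeLabel] == "true" {
			continue // protected: never a target
		}
		eligible = append(eligible, p)
	}
	limit := maxAllowed(len(eligible), maxPercentage)
	if limit > len(eligible) {
		limit = len(eligible)
	}
	if count > limit {
		return nil, fmt.Errorf("count %d exceeds limit %d (%d%% of %d eligible pods)",
			count, limit, maxPercentage, len(eligible))
	}
	return eligible[:count], nil
}

// run executes (or, in dry-run mode, only reports) the planned action.
func run(pods []Pod, count, maxPercentage int, dryRun bool) error {
	targets, err := planTargets(pods, count, maxPercentage)
	if err != nil {
		return err
	}
	for _, p := range targets {
		if dryRun {
			fmt.Println("[dry-run] would affect", p.Name)
			continue
		}
		fmt.Println("affecting", p.Name) // the real chaos action would go here
	}
	return nil
}

func main() {
	var pods []Pod
	for i := 1; i <= 10; i++ {
		pods = append(pods, Pod{Name: fmt.Sprintf("web-%d", i), Labels: map[string]string{"app": "web"}})
	}
	pods = append(pods, Pod{Name: "web-critical", Labels: map[string]string{"app": "web", excludeLabel: "true"}})

	// 10 eligible pods with a 30% cap: 3 is allowed, 4 is blocked.
	for _, count := range []int{3, 4} {
		if err := run(pods, count, 30, true); err != nil {
			fmt.Println("blocked:", err)
		}
	}
}
```

Rounding down is the conservative choice: with 3 eligible pods and a 30% cap the limit is 0, so the experiment is blocked rather than allowed to exceed the cap.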

New to k8s-chaos? Follow our Getting Started Guide for a complete tutorial.

```bash
# 1. Create a local cluster (optional)
make cluster-single-node
# 2. Install k8s-chaos with Helm
helm install k8s-chaos charts/k8s-chaos -n k8s-chaos-system --create-namespace
# 3. Try Lab 01
cd labs/01-getting-started
make setup
kubectl apply -f experiments/01-simple-pod-kill.yaml
```

Prerequisites:
- Kubernetes cluster (1.24+)
- kubectl configured to access your cluster
- Go 1.24.5+ (for development)
- Docker (for building images)
- Kind or Minikube (for local testing)

The easiest way to install k8s-chaos is using Helm:

```bash
# Install from local chart
helm install k8s-chaos charts/k8s-chaos \
  --namespace k8s-chaos-system \
  --create-namespace

# Verify installation
kubectl get pods -n k8s-chaos-system
```

Custom Configuration:

```bash
# Development setup
helm install k8s-chaos charts/k8s-chaos \
  -n k8s-chaos-system --create-namespace \
  --set controller.logLevel=debug \
  --set history.retentionLimit=50

# Production setup with cert-manager
helm install k8s-chaos charts/k8s-chaos \
  -n k8s-chaos-system --create-namespace \
  --set webhook.certificate.certManager=true \
  --set metrics.serviceMonitor.enabled=true
```

See the Helm Chart Documentation for all configuration options.

If you prefer to install manually:

```bash
# Install CRDs
make install
# Deploy controller
make deploy IMG=ghcr.io/neogan74/k8s-chaos:latest
```

k8s-chaos includes a command-line tool for managing and monitoring chaos experiments:

```bash
# Build and install the CLI
make build-cli
make install-cli
# List all experiments
k8s-chaos list
# View experiment details
k8s-chaos describe nginx-chaos-demo -n chaos-testing
# Show statistics
k8s-chaos stats
# Show top experiments by metrics
k8s-chaos top
```

See the CLI documentation for complete usage details.

Example `ChaosExperiment`:

```yaml
apiVersion: chaos.gushchin.dev/v1alpha1
kind: ChaosExperiment
metadata:
  name: nginx-chaos
  namespace: default
spec:
  action: pod-kill       # Action to perform
  namespace: production  # Target namespace
  selector:              # Label selector for targets
    app: nginx
  count: 2               # Number of pods to affect (default: 1)
```

Apply the experiment:

```bash
kubectl apply -f config/samples/chaos_v1alpha1_chaosexperiment.yaml

# List experiments
kubectl get chaosexperiments
# Get detailed status
kubectl describe chaosexperiment nginx-chaos
# Watch status updates
kubectl get chaosexperiment nginx-chaos -w

# Delete the experiment
kubectl delete chaosexperiment nginx-chaos
```

Project structure:

```
.
├── api/v1alpha1/          # API types and CRD definitions
├── internal/controller/   # Reconciliation logic
├── config/                # Kustomize deployment manifests
├── cmd/main.go            # Controller entrypoint
└── hack/                  # Build scripts and tools
```

```bash
# Clone repository
git clone https://github.com/neogan74/k8s-chaos.git
cd k8s-chaos
# Install dependencies
go mod download
# Generate code after API changes
make generate manifests
# Run locally against cluster
make run
# Run tests
make test
# Run linter
make lint
```

```bash
# Unit tests with coverage
make test
# E2E tests (creates Kind cluster)
make test-e2e
# Specific test package
go test ./internal/controller/... -v
```

```bash
# Build binary
make build
# Build container image
make docker-build IMG=myrepo/k8s-chaos:tag
# Push to registry
make docker-push IMG=myrepo/k8s-chaos:tag
```

`spec` fields:

| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| `action` | string | Chaos action to perform: `pod-kill`, `pod-delay`, `pod-cpu-stress`, `pod-memory-stress`, `pod-failure`, or `node-drain` | Yes | - |
| `namespace` | string | Target namespace for experiments | Yes | - |
| `selector` | map[string]string | Label selector for target resources | Yes | - |
| `count` | int | Number of resources to affect (1-100) | No | 1 |
| `duration` | string | Duration for time-based actions (e.g., `"30s"`, `"5m"`) | No | - |
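
The constraints in the table above (required fields, `count` between 1 and 100 with a default of 1, an optional duration such as `"30s"`) can be expressed as a plain Go check. This is only a sketch: the operator enforces its rules through the OpenAPI schema and admission webhooks, and `ChaosExperimentSpec` here is a simplified hypothetical struct, not the Kubebuilder type in `api/v1alpha1`. It also assumes durations follow Go's `time.ParseDuration` format, which accepts both examples from the table.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ChaosExperimentSpec is a simplified, hypothetical mirror of the spec table.
type ChaosExperimentSpec struct {
	Action    string
	Namespace string
	Selector  map[string]string
	Count     int    // 0 means "unset"; defaulted to 1
	Duration  string // optional, e.g. "30s", "5m"
}

// validActions lists the six actions described in this README.
var validActions = map[string]bool{
	"pod-kill": true, "pod-delay": true, "pod-cpu-stress": true,
	"pod-memory-stress": true, "pod-failure": true, "node-drain": true,
}

// Validate applies the default for count and checks each field's constraint.
func (s *ChaosExperimentSpec) Validate() error {
	if !validActions[s.Action] {
		return fmt.Errorf("unsupported action %q", s.Action)
	}
	if s.Namespace == "" {
		return errors.New("namespace is required")
	}
	if len(s.Selector) == 0 {
		return errors.New("selector is required")
	}
	if s.Count == 0 {
		s.Count = 1 // default from the table
	}
	if s.Count < 1 || s.Count > 100 {
		return fmt.Errorf("count must be between 1 and 100, got %d", s.Count)
	}
	if s.Duration != "" {
		if _, err := time.ParseDuration(s.Duration); err != nil {
			return fmt.Errorf("invalid duration %q: %w", s.Duration, err)
		}
	}
	return nil
}

func main() {
	spec := ChaosExperimentSpec{
		Action:    "pod-kill",
		Namespace: "production",
		Selector:  map[string]string{"app": "nginx"},
		Count:     2,
	}
	fmt.Println("valid spec:", spec.Validate())

	bad := ChaosExperimentSpec{
		Action:    "pod-delay",
		Namespace: "default",
		Selector:  map[string]string{"app": "nginx"},
		Duration:  "5 minutes", // not a Go duration string
	}
	fmt.Println("invalid spec:", bad.Validate())
}
```

Rejecting bad input before any pod is touched is the same idea as the multi-layer validation listed under features; a webhook simply runs such checks at admission time instead of in the reconcile loop.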

`status` fields:

| Field | Type | Description |
|---|---|---|
| `lastRunTime` | Time | Timestamp of last execution |
| `message` | string | Human-readable status message |
| `phase` | string | Current phase (Pending, Running, Completed, Failed) |
- RBAC: The controller requires specific permissions to manage pods and other resources
- Namespace Isolation: Experiments are namespace-scoped by design
- Validation: All inputs are validated to prevent malicious configurations
- Audit: All chaos actions are logged for audit purposes
We welcome contributions! Please see our Contributing Guide for detailed information on:
- Code of Conduct: Standards for community interaction
- Development Setup: Setting up your environment
- Contribution Process: How to submit changes
- Code Standards: Coding conventions and best practices
- Testing Requirements: Writing and running tests
- Documentation Guidelines: Updating documentation

```bash
# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/k8s-chaos.git
cd k8s-chaos
# 2. Set up development environment
make dev-setup
# 3. Create a branch
git checkout -b feature/your-feature
# 4. Make changes, test, and commit
make test lint
git commit -m "feat: your feature description"
# 5. Push and create PR
git push origin feature/your-feature
```

See CONTRIBUTING.md for complete guidelines.
- Quick Start - Get running in 5 minutes with video demo guides
- Installation Guide - Complete installation for all environments
- Getting Started Tutorial - First experiment walkthrough
- Hands-on Labs - Interactive learning tutorials
- Best Practices - Safety-first principles and progressive adoption
- Real-World Scenarios - 13 ready-to-use examples
- Troubleshooting - Common issues and solutions
- CLI Tool - Command-line interface documentation
- Architecture Overview - System design and components
- API Reference - Complete CRD specification
- Metrics Guide - Prometheus metrics and monitoring
- Grafana Dashboards - Dashboard setup and usage
- Experiment History - Audit logging and history tracking
- Contributing Guide - How to contribute to k8s-chaos
- Development Guide - Local development setup
- Roadmap - Future development plans
| Feature | k8s-chaos | Chaos Mesh | Litmus Chaos |
|---|---|---|---|
| Lightweight | ✅ | β | β |
| Simple CRDs | ✅ | β | β |
| Pod Chaos | ✅ | ✅ | ✅ |
| Node Chaos | ✅ | β | ✅ |
| Network Chaos | 🚧 Planned | ✅ | ✅ |
| Scheduling | ✅ Cron | ✅ | ✅ |
| Safety Features | ✅ Comprehensive | β | β |
| Metrics & Dashboards | ✅ | ✅ | ✅ |
| Audit History | ✅ | β | β |
| UI Dashboard | 🚧 Planned | ✅ | ✅ |
| Learning Curve | Easy | Moderate | Moderate |

k8s-chaos excels at being lightweight, simple to deploy, and production-ready with comprehensive safety features while maintaining an easy learning curve.
Copyright 2025.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- Built with Kubebuilder
- Inspired by Chaos Mesh and Litmus Chaos
- Thanks to the Kubernetes SIG API Machinery community