Active Cost Governance for Kubernetes
A production-grade Kubernetes controller that automatically detects and pauses idle workloads in non-production environments, delivering measurable cost savings without human intervention.
Core Principle: Dashboards don't save money. Actions do.
Cloud cost overruns are rarely caused by malice. They're caused by:
- Dev and staging environments running 24×7
- Forgotten experimental deployments
- Accumulated non-prod workloads
- Zero accountability for idle infrastructure
By the time your Cost Explorer shows the damage, the money is already gone.
This controller:
- Detects idle Kubernetes workloads with high confidence
- Acts automatically but conservatively (scale-to-zero)
- Explains every action it takes
- Reverses easily via Slack interaction
- Measures savings in real time
┌──────────────┐
│  Kubernetes  │
│   Cluster    │
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│    OpenCost      │
│  (Cost Metrics)  │
└──────┬───────────┘
       │
       ▼
┌────────────────────────┐
│   FinOps Enforcer      │
│   (Controller)         │
│                        │
│  - Fetch cost data     │
│  - Evaluate policies   │
│  - Safe enforcement    │
└──────┬─────────────────┘
       │
       ▼
┌──────────────────────┐
│  Actions             │
│  - Scale to zero     │
│  - Annotate resource │
│  - Notify team       │
└──────────────────────┘
Define what "idle" means for your organization:
policyName: non-prod-idle-gc
scope:
  namespaces:
    include:
      - dev-*
      - staging-*
    exclude:
      - prod
conditions:
  idleWindow: 48h
  minHourlyCost: 2.0
  trafficThreshold:
    requestsPerMinute: 0
actions:
  type: scaleToZero
  notify: slack
  reactivationAllowed: true

Every enforcement action triggers a Slack notification with one-click reactivation:
🚨 Idle Resource Paused
Namespace: dev-payments
Deployment: invoice-worker
Idle Duration: 72 hours
Estimated Monthly Savings: $180
⏯️ Reactivate | 📄 View Details
- Namespace allowlisting - Production is never touched by default
- Cooldown windows - Prevents flapping
- Bounded actions - Max resources per run
- Dry-run mode - Test policies safely
- Audit trail - Every action is logged
- finops_paused_resources_total - Resources currently paused
- finops_estimated_savings_usd - Projected monthly savings
- finops_policy_matches_total - Policy evaluation results
- finops_actions_taken_total - Enforcement actions by type
This project intentionally does not:
- ❌ Use ML for cost forecasting
- ❌ Replace your billing system
- ❌ Enforce globally across all namespaces
- ❌ Delete resources permanently
- ❌ Touch production by default
Philosophy: Conservative automation that preserves trust.
- Kubernetes cluster (1.24+)
- OpenCost installed
- Slack webhook (optional)
- Prometheus (for metrics)
# Install via Helm
helm repo add finops-enforcer https://charts.finops-enforcer.io
helm install finops-enforcer finops-enforcer/finops-enforcer \
  --namespace finops-system \
  --create-namespace \
  --set opencost.endpoint=http://opencost.opencost:9003 \
  --set slack.webhookURL=<your-webhook>

# Or via kubectl
kubectl apply -f https://raw.githubusercontent.com/yourusername/finops-enforcer/main/deploy/install.yaml

- Create a policy file:
cat <<EOF | kubectl apply -f -
apiVersion: finops.io/v1alpha1
kind: EnforcementPolicy
metadata:
  name: dev-idle-gc
  namespace: finops-system
spec:
  scope:
    namespaces:
      include:
        - dev-*
        - staging-*
  conditions:
    idleWindow: 48h
    minHourlyCost: 2.0
  actions:
    type: scaleToZero
    notify: slack
EOF

- Monitor enforcement:
kubectl logs -n finops-system deployment/finops-enforcer -f

A developer spins up a feature branch environment for testing. After the feature merges, the environment is forgotten. After 48 hours of zero traffic, the enforcer:
- Detects idle state
- Scales deployments to zero
- Notifies team in Slack
- Saves ~$120/month
Staging environments run 24×7 but are only used Monday through Friday. The enforcer:
- Detects weekend idle patterns
- Auto-pauses Friday evening
- Team reactivates Monday morning
- Saves ~$400/month
After load testing, high-resource deployments are left running. The enforcer:
- Detects abnormal cost + zero traffic
- Flags for review
- Auto-pauses after confirmation window
- Saves ~$800/month
.
├── cmd/
│ ├── controller/ # Main controller binary
│ └── cli/ # finops-ctl CLI tool
├── pkg/
│ ├── controller/ # Reconciliation logic
│ ├── policy/ # Policy engine
│ ├── cost/ # OpenCost integration
│ ├── enforcement/ # Action execution
│ ├── metrics/ # Prometheus metrics
│ └── notifications/ # Slack integration
├── api/
│ └── v1alpha1/ # CRD definitions
├── config/
│ ├── crd/ # Custom Resource Definitions
│ ├── rbac/ # RBAC manifests
│ ├── manager/ # Controller deployment
│ └── samples/ # Example policies
├── deploy/
│ ├── helm/ # Helm chart
│ └── manifests/ # Raw Kubernetes YAML
├── test/
│ ├── e2e/ # End-to-end tests
│ └── integration/ # Integration tests
└── docs/
    ├── DESIGN.md # Architecture deep-dive
    ├── POLICIES.md # Policy reference
    └── RUNBOOK.md # Operational guide
- Go 1.21+
- Docker
- kubectl
- Kind or Minikube (for local testing)
# Clone repository
git clone https://github.com/yourusername/finops-enforcer.git
cd finops-enforcer
# Install dependencies
go mod download
# Run tests
make test
# Build
make build
# Run locally (against current kubeconfig context)
make run
# Build Docker image
make docker-build IMG=finops-enforcer:dev

# Unit tests
make test
# Integration tests (requires kind cluster)
make test-integration
# E2E tests
make test-e2e
# Coverage report
make coverage

apiVersion: v1
kind: ConfigMap
metadata:
  name: finops-enforcer-config
  namespace: finops-system
data:
  config.yaml: |
    opencost:
      endpoint: http://opencost.opencost:9003
      timeout: 30s
    enforcement:
      dryRun: false
      maxActionsPerRun: 10
      cooldownWindow: 1h
    notifications:
      slack:
        enabled: true
        webhookURL: ${SLACK_WEBHOOK_URL}
        channel: "#finops-alerts"
    metrics:
      enabled: true
      port: 8080
      path: /metrics

See POLICIES.md for complete reference.
| Metric | Type | Description |
|---|---|---|
| finops_paused_resources_total | Gauge | Currently paused resources |
| finops_estimated_savings_usd | Gauge | Projected monthly savings |
| finops_policy_matches_total | Counter | Policy evaluation matches |
| finops_actions_taken_total | Counter | Enforcement actions by type |
| finops_reactivations_total | Counter | User-initiated reactivations |
| finops_false_positives_total | Counter | Reverted within 1 hour |
Import the provided dashboard from deploy/grafana/dashboard.json. It shows:
- Real-time savings projection
- Top idle namespaces
- Actions taken vs reverted
- Policy effectiveness
The controller requires minimal permissions:
- Read: pods, deployments, services (for cost correlation)
- Write: deployments (scale only), annotations
- No: delete permissions
See config/rbac/ for complete RBAC definitions.
Restrict controller traffic:
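apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: finops-enforcer
spec:
  podSelector:
    matchLabels:
      app: finops-enforcer
  # Restrict egress only. Without policyTypes, this policy would also deny
  # all ingress, including Prometheus scrapes of the metrics port.
  policyTypes:
    - Egress
  # The controller also needs egress to the Kubernetes API server, DNS,
  # and Slack; add rules for those before applying.
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: opencost
      ports:
        - protocol: TCP
          port: 9003

Issue: No resources being paused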
- Check policy scope matches target namespaces
- Verify OpenCost is returning cost data
- Review dry-run mode setting
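When checking whether a policy's scope covers a namespace, it can help to test the include and exclude patterns in isolation. The sketch below assumes shell-style glob semantics and uses Go's standard `path.Match`. The `Scope` type and `Selects` method are hypothetical, and the controller's real matching rules may differ. Under these semantics, a pattern such as `dev-*` does not match a namespace named exactly `dev`.

```go
package main

import (
	"fmt"
	"path"
)

// Scope mirrors the include/exclude lists of a policy (hypothetical type).
type Scope struct {
	Include []string
	Exclude []string
}

// matchesAny reports whether ns matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, ns string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, ns); err == nil && ok {
			return true
		}
	}
	return false
}

// Selects reports whether the namespace is included and not excluded.
func (s Scope) Selects(ns string) bool {
	return matchesAny(s.Include, ns) && !matchesAny(s.Exclude, ns)
}

func main() {
	scope := Scope{
		Include: []string{"dev-*", "staging-*"},
		Exclude: []string{"prod"},
	}
	for _, ns := range []string{"dev-payments", "staging-eu", "dev", "prod"} {
		fmt.Printf("%s selected=%v\n", ns, scope.Selects(ns))
	}
}
```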
Issue: False positives
- Increase idleWindow duration
- Add namespace exclusions
- Adjust traffic thresholds
Issue: Metrics not appearing
- Verify Prometheus ServiceMonitor
- Check controller logs for errors
- Confirm metrics port accessibility
See RUNBOOK.md for detailed troubleshooting.
- Azure Cost Management integration
- AWS Cost Explorer integration
- Multi-cluster support
- Advanced scheduling policies
- Cost anomaly detection
- Self-service policy management UI
Contributions welcome! Please read CONTRIBUTING.md first.
MIT License - see LICENSE for details.
Built and maintained by Rajesh Ramesh
- GitHub: @rramesh17993
- Portfolio: Production-grade Kubernetes controllers and cloud infrastructure automation
- OpenCost for real-time Kubernetes cost metrics
- CNCF for fostering cloud-native cost management practices
Built with restraint, shipped with confidence.
This is practical infrastructure automation that respects the humans who have to live with it.