Automated FinOps Enforcer

Active Cost Governance for Kubernetes

What This Is

A production-grade Kubernetes controller that automatically detects and pauses idle workloads in non-production environments, delivering measurable cost savings without human intervention.

Core Principle: Dashboards don't save money. Actions do.

The Problem

Cloud cost overruns are rarely caused by malice. They're caused by:

Dev and staging environments running 24×7
Forgotten experimental deployments
Accumulated non-prod workloads
Zero accountability for idle infrastructure

By the time your Cost Explorer shows the damage, the money is already gone.

The Solution

This controller:

Detects idle Kubernetes workloads with high confidence
Acts automatically but conservatively (scale-to-zero)
Explains every action it takes
Reverses easily via Slack interaction
Measures savings in real time

Architecture

┌──────────────┐
│ Kubernetes   │
│ Cluster      │
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│ OpenCost         │
│ (Cost Metrics)   │
└──────┬───────────┘
       │
       ▼
┌────────────────────────┐
│ FinOps Enforcer        │
│ (Controller)           │
│                        │
│ - Fetch cost data      │
│ - Evaluate policies    │
│ - Safe enforcement     │
└──────┬─────────────────┘
       │
       ▼
┌──────────────────────┐
│ Actions              │
│ - Scale to zero      │
│ - Annotate resource  │
│ - Notify team        │
└──────────────────────┘
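The "scale to zero" and "annotate resource" actions work together to keep every pause reversible. Below is a minimal Go sketch of that idea, assuming the controller stores the previous replica count in an annotation. The `Workload` type and both annotation keys are hypothetical stand-ins for illustration, not the project's actual types or keys.

```go
package main

import (
	"fmt"
	"strconv"
)

// Workload is a simplified stand-in for a Deployment; a real controller
// would operate on Kubernetes API objects instead.
type Workload struct {
	Name        string
	Replicas    int
	Annotations map[string]string
}

// Hypothetical annotation keys, chosen for this example only.
const (
	annPrevReplicas = "finops.io/previous-replicas"
	annPausedReason = "finops.io/paused-reason"
)

// Pause scales the workload to zero and records what is needed to undo it.
// It reports false (and changes nothing) if the workload is already at zero.
func Pause(w *Workload, reason string) bool {
	if w.Replicas == 0 {
		return false
	}
	if w.Annotations == nil {
		w.Annotations = map[string]string{}
	}
	w.Annotations[annPrevReplicas] = strconv.Itoa(w.Replicas)
	w.Annotations[annPausedReason] = reason
	w.Replicas = 0
	return true
}

// Reactivate restores the replica count saved by Pause and clears the markers.
func Reactivate(w *Workload) error {
	prev, ok := w.Annotations[annPrevReplicas]
	if !ok {
		return fmt.Errorf("%s was not paused by the enforcer", w.Name)
	}
	n, err := strconv.Atoi(prev)
	if err != nil {
		return fmt.Errorf("invalid %s annotation: %w", annPrevReplicas, err)
	}
	w.Replicas = n
	delete(w.Annotations, annPrevReplicas)
	delete(w.Annotations, annPausedReason)
	return nil
}

func main() {
	w := &Workload{Name: "invoice-worker", Replicas: 3}
	Pause(w, "idle for 72h")
	fmt.Println(w.Replicas, w.Annotations[annPrevReplicas])
	if err := Reactivate(w); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(w.Replicas)
}
```

Keeping the undo state on the resource itself means reactivation does not depend on any controller-side database.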

Key Features

Policy-Driven Governance

Define what "idle" means for your organization:

policyName: non-prod-idle-gc
scope:
  namespaces:
    include:
      - dev-*
      - staging-*
    exclude:
      - prod

conditions:
  idleWindow: 48h
  minHourlyCost: 2.0
  trafficThreshold:
    requestsPerMinute: 0

actions:
  type: scaleToZero
  notify: slack
  reactivationAllowed: true
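To make the policy semantics concrete, here is a minimal Go sketch of how such a policy could be evaluated. It rests on several assumptions: namespace patterns are shell-style globs, exclusions take precedence over inclusions, `minHourlyCost` is a lower bound on cost, and `requestsPerMinute` is an upper bound on traffic. The `Policy` and `Observation` types are hypothetical and do not reflect the project's actual policy engine.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// Policy mirrors the YAML fields above; the Go field names are assumptions.
type Policy struct {
	Include, Exclude  []string
	IdleWindow        time.Duration
	MinHourlyCost     float64
	RequestsPerMinute float64
}

// Observation is a hypothetical per-workload summary of cost and traffic.
type Observation struct {
	Namespace         string
	IdleFor           time.Duration
	HourlyCost        float64
	RequestsPerMinute float64
}

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// InScope applies include patterns first; any exclude match then wins.
func (p Policy) InScope(ns string) bool {
	return matchesAny(p.Include, ns) && !matchesAny(p.Exclude, ns)
}

// IsIdle requires every condition to hold before a workload qualifies.
func (p Policy) IsIdle(o Observation) bool {
	return p.InScope(o.Namespace) &&
		o.IdleFor >= p.IdleWindow &&
		o.HourlyCost >= p.MinHourlyCost &&
		o.RequestsPerMinute <= p.RequestsPerMinute
}

func main() {
	p := Policy{
		Include:       []string{"dev-*", "staging-*"},
		Exclude:       []string{"prod"},
		IdleWindow:    48 * time.Hour,
		MinHourlyCost: 2.0,
	}
	o := Observation{Namespace: "dev-payments", IdleFor: 72 * time.Hour, HourlyCost: 2.5}
	fmt.Println(p.IsIdle(o))       // true
	fmt.Println(p.InScope("prod")) // false
}
```

Requiring all conditions together is what keeps the policy conservative: a cheap workload or one with any traffic is left alone.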

Human-in-the-Loop

Every enforcement action triggers a Slack notification with one-click reactivation:

🚨 Idle Resource Paused

Namespace: dev-payments
Deployment: invoice-worker
Idle Duration: 72 hours
Estimated Monthly Savings: $180

⏯️ Reactivate | 📄 View Details
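As a rough illustration, the text part of this notification could be assembled as below. This is a sketch under stated assumptions: the `PauseEvent` type and helper names are hypothetical, monthly savings are estimated as hourly cost times an average 730-hour month, and the payload uses only the basic `{"text": ...}` body that Slack incoming webhooks accept. The interactive buttons would need a Slack app with interactivity enabled and are not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// hoursPerMonth is an assumption: an average month (8760 hours / 12).
const hoursPerMonth = 730.0

// PauseEvent is a hypothetical record of one enforcement action.
type PauseEvent struct {
	Namespace  string
	Deployment string
	IdleFor    time.Duration
	HourlyCost float64 // USD per hour, as reported by the cost source
}

// MonthlySavings projects the hourly cost over an average month.
func MonthlySavings(hourlyCost float64) float64 {
	return hourlyCost * hoursPerMonth
}

// Message renders the human-readable notification text.
func Message(e PauseEvent) string {
	return fmt.Sprintf(
		"Idle Resource Paused\nNamespace: %s\nDeployment: %s\nIdle Duration: %.0f hours\nEstimated Monthly Savings: $%.0f",
		e.Namespace, e.Deployment, e.IdleFor.Hours(), MonthlySavings(e.HourlyCost))
}

// SlackPayload wraps the message in the JSON body used by incoming webhooks.
func SlackPayload(e PauseEvent) ([]byte, error) {
	return json.Marshal(map[string]string{"text": Message(e)})
}

func main() {
	e := PauseEvent{
		Namespace:  "dev-payments",
		Deployment: "invoice-worker",
		IdleFor:    72 * time.Hour,
		HourlyCost: 2.0,
	}
	body, err := SlackPayload(e)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```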

Safety Guardrails

Namespace allowlisting - Production is never touched by default
Cooldown windows - Prevents flapping
Bounded actions - Max resources per run
Dry-run mode - Test policies safely
Audit trail - Every action is logged
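The guardrails above can be sketched as a single planning step that decides, per candidate, whether to act. This is an illustrative Go sketch, not the project's enforcement code: the `Guardrails`, `Candidate`, and `Decision` types are hypothetical, and it assumes dry-run actions still count toward the per-run limit so that a dry run previews exactly what a live run would do.

```go
package main

import (
	"fmt"
	"time"
)

// Guardrails holds the safety limits; field names are assumptions.
type Guardrails struct {
	DryRun           bool
	MaxActionsPerRun int
	Cooldown         time.Duration
}

// Candidate is a workload that already matched an idle policy.
type Candidate struct {
	Name       string
	LastAction time.Time // zero value means never acted upon
}

// Decision is one audit-trail entry: what was decided and why.
type Decision struct {
	Name   string
	Action string // "pause", "would-pause", or "skip"
	Reason string
}

// Plan applies cooldown, the per-run bound, and dry-run mode in that order.
// Every candidate yields a Decision, so nothing happens without a record.
func Plan(g Guardrails, now time.Time, cs []Candidate) []Decision {
	var out []Decision
	acted := 0
	for _, c := range cs {
		switch {
		case !c.LastAction.IsZero() && now.Sub(c.LastAction) < g.Cooldown:
			out = append(out, Decision{c.Name, "skip", "cooldown"})
		case acted >= g.MaxActionsPerRun:
			out = append(out, Decision{c.Name, "skip", "run limit reached"})
		case g.DryRun:
			acted++
			out = append(out, Decision{c.Name, "would-pause", "dry run"})
		default:
			acted++
			out = append(out, Decision{c.Name, "pause", "idle"})
		}
	}
	return out
}

func main() {
	now := time.Now()
	g := Guardrails{MaxActionsPerRun: 1, Cooldown: time.Hour}
	cs := []Candidate{
		{Name: "a", LastAction: now.Add(-10 * time.Minute)},
		{Name: "b"},
		{Name: "c"},
	}
	for _, d := range Plan(g, now, cs) {
		fmt.Printf("%s: %s (%s)\n", d.Name, d.Action, d.Reason)
	}
}
```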

Real-Time Metrics

finops_paused_resources_total - Resources currently paused
finops_estimated_savings_usd - Projected monthly savings
finops_policy_matches_total - Policy evaluation results
finops_actions_taken_total - Enforcement actions by type

What This Is NOT

This project intentionally does not:

❌ Use ML for cost forecasting
❌ Replace your billing system
❌ Enforce globally across all namespaces
❌ Delete resources permanently
❌ Touch production by default

Philosophy: Conservative automation that preserves trust.

Quick Start

Prerequisites

Kubernetes cluster (1.24+)
OpenCost installed
Slack webhook (optional)
Prometheus (for metrics)

Installation

# Install via Helm
helm repo add finops-enforcer https://charts.finops-enforcer.io
helm install finops-enforcer finops-enforcer/finops-enforcer \
  --namespace finops-system \
  --create-namespace \
  --set opencost.endpoint=http://opencost.opencost:9003 \
  --set slack.webhookURL="<your-webhook>"

# Or via kubectl
kubectl apply -f https://raw.githubusercontent.com/yourusername/finops-enforcer/main/deploy/install.yaml

Configuration

Create a policy file:

cat <<EOF | kubectl apply -f -
apiVersion: finops.io/v1alpha1
kind: EnforcementPolicy
metadata:
  name: dev-idle-gc
  namespace: finops-system
spec:
  scope:
    namespaces:
      include:
        - dev-*
        - staging-*
  conditions:
    idleWindow: 48h
    minHourlyCost: 2.0
  actions:
    type: scaleToZero
    notify: slack
EOF

Monitor enforcement:

kubectl logs -n finops-system deployment/finops-enforcer -f

Use Cases

Scenario 1: Forgotten Dev Environment

A developer spins up a feature branch environment for testing. After the feature merges, the environment is forgotten. After 48 hours of zero traffic, the enforcer:

Detects idle state
Scales deployments to zero
Notifies team in Slack
Saves ~$120/month

Scenario 2: Weekend Non-Prod

Staging environments run 24×7 but are only used Monday through Friday. The enforcer:

Detects weekend idle patterns
Auto-pauses Friday evening
Team reactivates Monday morning
Saves ~$400/month

Scenario 3: Load Test Cleanup

After load testing, high-resource deployments are left running. The enforcer:

Detects abnormal cost + zero traffic
Flags for review
Auto-pauses after confirmation window
Saves ~$800/month

Project Structure

.
├── cmd/
│   ├── controller/          # Main controller binary
│   └── cli/                 # finops-ctl CLI tool
├── pkg/
│   ├── controller/          # Reconciliation logic
│   ├── policy/              # Policy engine
│   ├── cost/                # OpenCost integration
│   ├── enforcement/         # Action execution
│   ├── metrics/             # Prometheus metrics
│   └── notifications/       # Slack integration
├── api/
│   └── v1alpha1/            # CRD definitions
├── config/
│   ├── crd/                 # Custom Resource Definitions
│   ├── rbac/                # RBAC manifests
│   ├── manager/             # Controller deployment
│   └── samples/             # Example policies
├── deploy/
│   ├── helm/                # Helm chart
│   └── manifests/           # Raw Kubernetes YAML
├── test/
│   ├── e2e/                 # End-to-end tests
│   └── integration/         # Integration tests
└── docs/
    ├── DESIGN.md            # Architecture deep-dive
    ├── POLICIES.md          # Policy reference
    └── RUNBOOK.md           # Operational guide

Development

Prerequisites

Go 1.21+
Docker
kubectl
Kind or Minikube (for local testing)

Local Development

# Clone repository
git clone https://github.com/yourusername/finops-enforcer.git
cd finops-enforcer

# Install dependencies
go mod download

# Run tests
make test

# Build
make build

# Run locally (against current kubeconfig context)
make run

# Build Docker image
make docker-build IMG=finops-enforcer:dev

Testing

# Unit tests
make test

# Integration tests (requires kind cluster)
make test-integration

# E2E tests
make test-e2e

# Coverage report
make coverage

Configuration Reference

Controller Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: finops-enforcer-config
  namespace: finops-system
data:
  config.yaml: |
    opencost:
      endpoint: http://opencost.opencost:9003
      timeout: 30s
    
    enforcement:
      dryRun: false
      maxActionsPerRun: 10
      cooldownWindow: 1h
    
    notifications:
      slack:
        enabled: true
        webhookURL: ${SLACK_WEBHOOK_URL}
        channel: "#finops-alerts"
    
    metrics:
      enabled: true
      port: 8080
      path: /metrics
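The `${SLACK_WEBHOOK_URL}` placeholder implies that the value is substituted from the environment at some point. How the project does this is not stated here; one plausible approach, sketched below in Go, is to expand the placeholders when the config is loaded and refuse to start if a referenced variable is unset. `ExpandConfig` is a hypothetical helper, built on the standard library's `os.Expand`.

```go
package main

import (
	"fmt"
	"os"
)

// ExpandConfig replaces ${VAR} references using lookup and returns the
// names of any variables that were referenced but not set.
func ExpandConfig(raw string, lookup func(string) (string, bool)) (string, []string) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

func main() {
	raw := "webhookURL: ${SLACK_WEBHOOK_URL}\nchannel: \"#finops-alerts\""
	out, missing := ExpandConfig(raw, os.LookupEnv)
	if len(missing) > 0 {
		// Failing early avoids silently running with an empty webhook URL.
		fmt.Println("unset variables:", missing)
		return
	}
	fmt.Println(out)
}
```

Note that `os.Expand` also treats the bare `$VAR` form as a reference, so literal dollar signs in the config would need care.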

Policy Specification

See POLICIES.md for complete reference.

Metrics & Monitoring

Key Metrics

Metric	Type	Description
`finops_paused_resources_total`	Gauge	Currently paused resources
`finops_estimated_savings_usd`	Gauge	Projected monthly savings
`finops_policy_matches_total`	Counter	Policy evaluation matches
`finops_actions_taken_total`	Counter	Enforcement actions by type
`finops_reactivations_total`	Counter	User-initiated reactivations
`finops_false_positives_total`	Counter	Actions reverted within 1 hour
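The last two counters are related: every user-initiated reactivation increments the first, and a quick reversal also increments the second. A minimal Go sketch of that relationship follows, using plain integers in place of real Prometheus counters. The `Counters` type is hypothetical, and treating exactly one hour as still "within" the window is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// falsePositiveWindow follows the metric description: a pause reverted
// within 1 hour is counted as a false positive.
const falsePositiveWindow = time.Hour

// Counters is an in-memory stand-in for the two Prometheus counters.
type Counters struct {
	Reactivations  int
	FalsePositives int
}

// RecordReactivation counts a reactivation and, if the workload was paused
// for no longer than the window, also counts it as a false positive.
func (c *Counters) RecordReactivation(pausedFor time.Duration) {
	c.Reactivations++
	if pausedFor <= falsePositiveWindow {
		c.FalsePositives++
	}
}

func main() {
	var c Counters
	c.RecordReactivation(20 * time.Minute) // reverted quickly
	c.RecordReactivation(60 * time.Hour)   // reactivated days later
	fmt.Printf("reactivations=%d false_positives=%d\n", c.Reactivations, c.FalsePositives)
}
```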

Grafana Dashboard

Import the provided dashboard from deploy/grafana/dashboard.json. It includes:

Real-time savings projection
Top idle namespaces
Actions taken vs reverted
Policy effectiveness

Security

RBAC Permissions

The controller requires minimal permissions:

Read: pods, deployments, services (for cost correlation)
Write: deployments (scale only), annotations
No: delete permissions

See config/rbac/ for complete RBAC definitions.

Network Policies

Restrict controller traffic:

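apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: finops-enforcer
  namespace: finops-system
spec:
  podSelector:
    matchLabels:
      app: finops-enforcer
  # Restrict egress only. Without policyTypes, Ingress is also implied and
  # the empty ingress list would block Prometheus scrapes of the metrics port.
  policyTypes:
    - Egress
  egress:
    # OpenCost API
    - to:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace
              kubernetes.io/metadata.name: opencost
      ports:
        - protocol: TCP
          port: 9003
    # Cluster DNS, needed to resolve the OpenCost service name
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Add rules for the Kubernetes API server and Slack as your cluster requires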

Troubleshooting

Common Issues

Issue: No resources being paused

Check policy scope matches target namespaces
Verify OpenCost is returning cost data
Review dry-run mode setting

Issue: False positives

Increase idleWindow duration
Add namespace exclusions
Adjust traffic thresholds

Issue: Metrics not appearing

Verify Prometheus ServiceMonitor
Check controller logs for errors
Confirm metrics port accessibility

See RUNBOOK.md for detailed troubleshooting.

Roadmap

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

License

MIT License - see LICENSE for details.

Author

Built and maintained by Rajesh Ramesh

GitHub: @rramesh17993
Portfolio: Production-grade Kubernetes controllers and cloud infrastructure automation

Acknowledgments

OpenCost for real-time Kubernetes cost metrics
CNCF for fostering cloud-native cost management practices

Built with restraint, shipped with confidence.

This is practical infrastructure automation that respects the humans who have to live with it.
