immanas/ClusterForge

🚀 ClusterForge: Multi-Environment GitOps Platform for Kubernetes

ClusterForge is a multi-cluster Kubernetes platform for managing multiple environments. Infrastructure provisioning, application deployment, scaling, and monitoring are automated through GitOps workflows, built on Terraform, AWS EKS, and ArgoCD.

This project demonstrates how real Dev, Prod, and Control environments can be managed declaratively and reproducibly.

🧩 Problem vs Solution (Real-World Production Context):

| 🚨 Real-World Problem | ❌ What Typically Happens in Teams | ✅ ClusterForge Solution |
| --- | --- | --- |
| 🌍 Dev works, Prod breaks | Manual configuration differences between clusters | Terraform modules create identical, reproducible clusters |
| 🔍 “Who changed this?” incidents | Manual kubectl apply; cluster state diverges from Git | ArgoCD enforces Git as the single source of truth |
| ⏳ Traffic drops during deployment | Pods are terminated before new ones are ready; users see downtime | Rolling update strategy with readiness & liveness probes |
| 📉 Application crashes during traffic spike | Static replica count; no autoscaling; manual intervention required | HPA dynamically adjusts replicas based on CPU metrics |
| 🔍 Incident debugging takes hours | Teams check only kubectl logs; no metrics visibility | Prometheus monitoring stack provides real-time metrics and observability |
| 🏗️ No one knows how infra was created | Click-ops in AWS console; no documentation; hard to recreate environments | Fully declarative Infrastructure as Code (Terraform) |
| 🔐 Over-permissioned IAM roles | Static credentials and broad policies increase security risk | IAM roles with least privilege + OIDC provider integration |
| 💥 Terraform destroy fails midway | AWS resources have hidden dependencies (e.g., NAT → subnets → VPC); incorrect deletion order causes failures and manual cleanup | Dependencies are explicitly handled and validated, ensuring clean and complete teardown of infrastructure |
| 🌐 Flat networking causes exposure | All services share same subnet; poor isolation between workloads | Multi-AZ VPC with public/private subnet isolation |
| 📦 Dev accidentally affects Prod | Single cluster used for multiple environments | Dedicated EKS clusters per environment |
| 📊 Monitoring added after outage | Metrics and alerting introduced only after a production incident | Monitoring integrated as a core platform layer |
| 🔄 Cluster management chaos | Multiple clusters manually accessed and configured | Central control cluster managing environments via GitOps |

📂 Project Structure:

The following represents the folder structure of the ClusterForge Infrastructure Repository, responsible for provisioning networking, security, and multi-environment Kubernetes clusters using Terraform.

clusterforge-infra/
│
├── modules/  # Reusable Terraform modules
│
│   ├── vpc/  # VPC, subnets, routing, NAT, gateways
│   │   ├── main.tf        # Defines networking resources
│   │   ├── variables.tf   # CIDR, AZs, subnet configs
│   │   └── outputs.tf     # VPC ID, subnet IDs
│
│   ├── eks/  # EKS cluster + node groups
│   │   ├── main.tf        # EKS, node groups, IRSA
│   │   ├── variables.tf   # Cluster config inputs
│   │   └── outputs.tf     # Endpoint, OIDC, node details
│
│   └── iam/  # IAM roles and policies
│       ├── main.tf        # Roles, policies, OIDC trust
│       ├── variables.tf   # Role configs
│       └── outputs.tf     # Role ARNs
│
├── main.tf         # Root module wiring VPC, IAM, EKS
├── variables.tf    # Global configuration variables
├── outputs.tf      # Exported infrastructure outputs
├── providers.tf    # AWS provider configuration
├── backend.tf      # Remote state (S3 + DynamoDB)
├── README.md       # Project documentation
├── LICENSE         # License file
└── .gitignore      # Ignore local/terraform files

🔗 How Modules Work Together

  • VPC → provides networking
  • IAM → provides permissions
  • EKS → uses both to create clusters

👉 Modules are separate but connected through inputs/outputs.
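
The wiring above implies an ordering: EKS consumes outputs from VPC and IAM, so those two have to exist first, and teardown runs in the opposite direction. Terraform derives this ordering from the references between resources. The Python sketch below only illustrates the idea with a hand-written dependency map; the map and the function names are assumptions made for illustration, not something read from this repository's Terraform code.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each module lists the modules whose outputs it consumes.
MODULE_DEPS = {
    "vpc": set(),
    "iam": set(),
    "eks": {"vpc", "iam"},  # EKS needs subnet IDs (VPC) and role ARNs (IAM)
}


def create_order(deps):
    """Return modules ordered so that every dependency comes before its consumer."""
    return list(TopologicalSorter(deps).static_order())


def destroy_order(deps):
    """Teardown is the reverse of creation: consumers are removed first."""
    return list(reversed(create_order(deps)))


if __name__ == "__main__":
    print("create: ", create_order(MODULE_DEPS))
    print("destroy:", destroy_order(MODULE_DEPS))
```

In this toy model the EKS module always comes last on creation and first on destruction, which is the property that keeps a teardown from failing on resources that are still in use.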

The application deployment layer of this project (Kubernetes manifests, ArgoCD configuration, and the multi-environment GitOps workflow) is maintained in a separate repository.

🔗 ClusterForge GitOps Repository:
https://github.com/immanas/clusterforge-gitops

📂 Repository Separation (Why two repos?)

  • clusterforge-infra → builds infrastructure
  • clusterforge-gitops → deploys applications

👉 Infra creates the platform, GitOps manages what runs on it.

🏗️ Architecture Diagram:

[Image: ArgoCD Sync]

📈 Core Features:

| ✅ What This Project IS | ❌ What This Project is NOT |
| --- | --- |
| Multi-Environment Kubernetes Platform: Dev, Prod, and Control clusters running on Amazon EKS | Not a single-cluster Kubernetes demo |
| Infrastructure as Code (Terraform): Fully provisioned VPC, IAM, and EKS using reusable modules | Not a static YAML-only deployment |
| Centralized GitOps Control Plane: ArgoCD runs in control cluster and deploys apps to dev/prod clusters | Not a CI/CD-only showcase without real infrastructure |
| 🚀 Production-Grade Deployment: NGINX with rolling updates, probes, and health checks | Not a local Minikube experiment |
| 📈 Auto-Scaling Enabled: Horizontal Pod Autoscaler (HPA) based on CPU metrics | Not a toy monitoring setup without scaling validation |
| 📊 Observability Integrated: Metrics Server + Prometheus + Grafana | Not a slide-based architecture without live proof |
| 🔐 Secure by Design: IAM roles, OIDC (IRSA), private subnets, controlled networking | |
| 🧱 Modular & Scalable Architecture: Designed for real-world extensibility | |

This project demonstrates a real, deployable, multi-cluster cloud-native platform, built and validated end-to-end.

🧰 Tech Stack:

This project combines Infrastructure as Code, Kubernetes orchestration, and GitOps-driven deployment to build a production-style multi-cluster platform.

☁ Cloud Platform

  • AWS (ap-south-1) – Primary cloud provider
  • Amazon EKS – Managed Kubernetes control plane
  • Amazon VPC – Custom networking (public/private subnets, NAT, IGW)
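  • Egress flow: Pods → EKS Nodes → Private Subnet → NAT Gateway (public subnet) → IGW → Internet (a NAT gateway carries outbound traffic only; it does not accept connections initiated from the Internet)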

🔐 Identity & Access (IRSA)

IAM Roles for Service Accounts (IRSA) lets Kubernetes pods access AWS services securely without storing AWS credentials inside containers:

  • Each pod is linked to an IAM role through its Kubernetes service account
  • AWS verifies the pod's identity using OIDC (OpenID Connect)

👉 Flow: Pod → OIDC identity → IAM Role → AWS service

Supporting AWS services:

  • KMS – Encryption at rest for cluster secrets
  • CloudWatch – Control plane logging
  • S3 + DynamoDB – Terraform remote backend & state locking
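
To make the IRSA flow in this section concrete, the Python sketch below builds the kind of IAM trust policy that lets a single Kubernetes service account assume a role through the cluster's OIDC provider. The account ID, OIDC provider path, and service account name are placeholder assumptions, not values taken from this project.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy document for IRSA.

    oidc_provider is the cluster's OIDC issuer host and path,
    without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only tokens issued for STS are accepted...
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # ...and only for this one service account.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    policy = irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.ap-south-1.amazonaws.com/id/EXAMPLE",
        namespace="nginx-app",
        service_account="example-sa",
    )
    print(json.dumps(policy, indent=2))
```

Pinning the `sub` condition to one namespace and service account is what keeps the role least-privilege: other pods in the cluster cannot assume it even though they share the same OIDC provider.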

🏗 Infrastructure as Code

  • Terraform (>= 1.5) – Modular infrastructure provisioning
  • Reusable modules: vpc, eks, iam
  • Remote state management for safe multi-user workflows

☸ Container Orchestration

  • Kubernetes (EKS 1.29+)
  • Managed Node Groups
  • Horizontal Pod Autoscaler (HPA)
  • Rolling updates & self-healing deployments

🔁 GitOps & Deployment

ArgoCD acts as the GitOps controller running inside the control cluster.

Flow:

  • ArgoCD watches the GitOps repository
  • Detects changes in Kubernetes manifests
  • Connects to target clusters (dev / prod) using stored cluster credentials
  • Applies changes automatically and keeps clusters in sync with Git

👉 This ensures Git is always the single source of truth for deployments.
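
The sync behaviour described above comes down to comparing the desired state in Git with the live state in a cluster and acting on the difference. The following is a conceptual Python sketch of that comparison, not ArgoCD's actual implementation; the resource keys, the example specs, and the `plan_sync` and `is_synced` helpers are invented for illustration.

```python
def plan_sync(desired, live):
    """Compare desired (Git) and live (cluster) state.

    Both arguments map a resource key such as "Deployment/nginx" to its spec.
    Returns the keys that would need to be created, updated, or pruned.
    """
    to_create = sorted(k for k in desired if k not in live)
    to_update = sorted(k for k in desired if k in live and desired[k] != live[k])
    to_prune = sorted(k for k in live if k not in desired)
    return {"create": to_create, "update": to_update, "prune": to_prune}


def is_synced(plan):
    """A cluster counts as in sync when the plan has no pending actions."""
    return not any(plan.values())


if __name__ == "__main__":
    # Hypothetical example states.
    git_state = {
        "Deployment/nginx": {"replicas": 2, "image": "nginx:1.25"},
        "Service/nginx": {"port": 80},
        "HorizontalPodAutoscaler/nginx": {"minReplicas": 2, "maxReplicas": 5},
    }
    cluster_state = {
        "Deployment/nginx": {"replicas": 2, "image": "nginx:1.24"},  # drifted image
        "Service/nginx": {"port": 80},
        "ConfigMap/leftover": {},  # exists in the cluster but not in Git
    }
    plan = plan_sync(git_state, cluster_state)
    print(plan)
    print("synced:", is_synced(plan))
```

A manual `kubectl` change shows up here as an entry under `update` or `prune`, which is the drift a GitOps controller reports as out of sync. Note that ArgoCD deletes resources missing from Git only when pruning is enabled for the application.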

📦 Application Layer

  • Docker – Containerized NGINX application
  • Kubernetes manifests:
    • Deployment
    • Service
    • HPA
    • Namespace

🛠 Tooling

  • kubectl

📸 Infrastructure Proof

1️⃣ Multi-Environment EKS Clusters

[Screenshot: EKS Clusters]

2️⃣ Custom VPC Architecture

Public & private subnets across multiple AZs with proper routing.

[Screenshot: VPC Architecture]

3️⃣ Worker Nodes (Managed Node Groups)

EKS-managed EC2 instances running in private subnets.

[Screenshot: EC2 Nodes]

4️⃣ GitOps Deployment via ArgoCD

Applications synced and healthy across dev & prod clusters.

[Screenshot: ArgoCD Sync]

🔄 Request Lifecycle:

End-to-End Flow:

Runtime Request Flow

  1. User sends request to Kubernetes Service
  2. Service forwards traffic to one of the running Pods
  3. Pod processes the request (NGINX container)
  4. Metrics Server collects CPU usage
  5. HPA evaluates metrics:
    • If CPU > threshold → increase replicas
    • If CPU normal → maintain or reduce replicas

👉 This creates a self-healing and auto-scaling system without manual intervention.
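
Step 5 can be made concrete with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a small tolerance of 1. The Python sketch below approximates that rule for a single CPU metric. The replica bounds, the 50% target, and the sample readings are assumptions chosen for illustration, not this project's HPA settings.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA scaling rule for one CPU utilization metric.

    current_cpu and target_cpu are average utilization percentages.
    """
    ratio = current_cpu / target_cpu
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical readings against a 50% CPU target.
    print(desired_replicas(2, current_cpu=90, target_cpu=50))  # load spike
    print(desired_replicas(4, current_cpu=20, target_cpu=50))  # load drops
    print(desired_replicas(2, current_cpu=52, target_cpu=50))  # within tolerance
```

The real controller also applies stabilization windows and handles pods that are not yet ready or are missing metrics, so treat this as a model of the core arithmetic only.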

Why This Design?

  • Clear separation of infra and app layers.
  • Multi-environment isolation.
  • Git-driven declarative deployment.
  • Production-aligned Kubernetes architecture.

🛡 Resilience & Security:

Failure Scenarios

  • Node failure → Pods rescheduled automatically.
  • Pod crash → Kubernetes self-healing restarts container.
  • High traffic → HPA scales replicas.
  • Terraform drift → Reconciliation via terraform apply.

Security Considerations

  • Private subnets for worker nodes.
  • IAM least-privilege roles.
  • IRSA for workload identity.
  • Encrypted EKS secrets via KMS.
  • Remote state locking via DynamoDB.

Scalability & Performance

  • Managed node group scaling.
  • Horizontal Pod Autoscaler.
  • Multi-AZ subnet distribution.
  • Stateless application design.

⚡ Quickstart (30-Second Run):

Prerequisites:

  • EKS clusters (control, dev, prod) already provisioned via clusterforge-infra
  • ArgoCD installed on the control cluster
  • kubectl configured
  • ArgoCD CLI installed

1️⃣ Switch to Control Cluster

kubectl config use-context <control-cluster-context>

2️⃣ Apply ArgoCD Applications

Deploy Dev and Prod applications:

kubectl apply -f environments/dev/app.yaml
kubectl apply -f environments/prod/app.yaml

3️⃣ Verify ArgoCD Sync

argocd app list

You should see:

  • nginx-dev → Synced & Healthy
  • nginx-prod → Synced & Healthy

4️⃣ Validate Deployment in Target Cluster

Switch to dev or prod cluster:

kubectl config use-context <dev-cluster-context>
kubectl get pods -n nginx-app
kubectl get hpa -n nginx-app

You should see:

  • Running NGINX pods
  • HPA configured and active

⚙ Engineering Philosophy:

Trade-offs & Decisions

  • Chose EKS over self-managed Kubernetes → Offloads control plane management, upgrades, and HA complexity to AWS.
    → Focus stays on platform design and workload reliability instead of etcd and master node operations.
  • Separated infra and GitOps repositories → Enforces clear ownership boundaries between platform and application layers.
    → Reduces blast radius and aligns with real-world DevOps team structures.
  • Used Managed Node Groups → Simplifies lifecycle management (auto-repair, scaling, upgrades).
    → Avoids operational overhead of maintaining custom worker AMIs and autoscaling groups.
  • Prioritized Infrastructure as Code over manual console setup → Supports reproducibility and auditability.
    → Limits configuration drift and enables safe teardown/rebuild cycles.

Explicit Limitations

  • No production-grade ingress controller (for simplicity).
  • No service mesh implemented.
  • Monitoring stack optional (not hardened for production).

🙌 Contributions Welcome!

ClusterForge is an open-source initiative, and we welcome contributions from developers, data scientists, cloud engineers, and DevOps enthusiasts!

💡 Ideas You Can Work On:

  • Add production-grade Ingress + ALB.
  • Integrate full Prometheus/Grafana monitoring.
  • Implement CI validation for Terraform plans.
  • Add cost optimization policies.
  • Introduce blue/green deployment strategy.

🛠️ How to Contribute:

  • 🍴 Fork the repo
  • 📦 Create a new feature branch: git checkout -b feature-name
  • ✅ Make your changes and test them
  • 📬 Submit a pull request describing your enhancement

🤝 Let's Build This Together! Made with 💚 by Manas Gantait
