A Kubernetes-native platform for building modern, declarative data pipelines with clear boundaries between ingestion and transformation.
- 🎛️ Kubernetes Operator - Go-based CRD management and pipeline orchestration
- 📥 Ingest Workload - Type-safe data ingestion (Python)
- 🔄 Transform Workload - dbt-based data transformation
- ⚡ Trigger Workload - Event-driven pipeline activation (Go)
- 🛠️ Development Environment - Local development setup and database provisioning
- ☁️ Infrastructure - Cloud infrastructure automation and deployment
- 📋 Examples - Comprehensive YAML examples and use cases
Pipeline Forge is a complete solution for orchestrating data pipelines in Kubernetes environments. It combines a Kubernetes operator with specialized workloads to provide a declarative, event-driven approach to data pipeline management.
- Unified Pipeline Lifecycle - Connect ingestion with transformation in a single application lifecycle
- Native Kubernetes Resources - Each step runs entirely on native Kubernetes resources
- Event-Driven Orchestration - React to file drops, Pub/Sub messages, and BigQuery updates
- Built-in Observability - Comprehensive status tracking and monitoring
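To make the event-driven idea concrete, here is a minimal sketch of matching an incoming event to the pipelines it should start. It is not the project's implementation (the Trigger workload is written in Go); `Event`, `TriggerRule`, `pipelines_to_start`, and the pipeline and resource names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Event sources named after the three listed above (file drops on GCS,
# Pub/Sub messages, BigQuery updates). The string keys are assumptions.
SOURCES = {"gcs", "pubsub", "bigquery"}


@dataclass(frozen=True)
class Event:
    source: str    # one of SOURCES
    resource: str  # e.g. an object path, a topic, or a table name


@dataclass(frozen=True)
class TriggerRule:
    pipeline: str         # hypothetical pipeline name to start
    source: str           # event source this rule listens to
    resource_prefix: str  # match events whose resource starts with this


def pipelines_to_start(event: Event, rules: list[TriggerRule]) -> list[str]:
    """Return the pipelines whose rules match the event, in rule order."""
    if event.source not in SOURCES:
        raise ValueError(f"unknown event source: {event.source!r}")
    return [
        rule.pipeline
        for rule in rules
        if rule.source == event.source
        and event.resource.startswith(rule.resource_prefix)
    ]


rules = [
    TriggerRule("orders-pipeline", "gcs", "landing/orders/"),
    TriggerRule("users-pipeline", "pubsub", "users-topic"),
]
started = pipelines_to_start(Event("gcs", "landing/orders/2024-01-01.csv"), rules)
print(started)  # ['orders-pipeline']
```

An event that matches no rule simply yields an empty list, so unrelated file drops or messages start nothing.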
Pipeline Forge consists of two main components:
Kubernetes Operator - Go-based CRD management and pipeline orchestration
- Custom Resource Definitions (CRDs) for pipeline definition
- Automatic reconciliation and lifecycle management
- RBAC integration and resource management
- Event-driven trigger management
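Reconciliation, as listed above, generally means comparing observed state with desired state and taking the next step toward it. The sketch below illustrates that pattern for a two-step pipeline (ingest, then transform). It is written in Python for readability only; the actual operator is Go-based, and `PipelineState`, `reconcile`, and `apply_action` are hypothetical names, not the operator's API.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Observed state of one pipeline run (hypothetical, simplified)."""
    ingest_done: bool = False
    transform_done: bool = False


def reconcile(observed: PipelineState) -> str:
    """Return the next action that moves the pipeline toward completion.

    The desired state is "both steps done"; the function is idempotent,
    so calling it again on a finished pipeline keeps returning "done".
    """
    if not observed.ingest_done:
        return "run_ingest"
    if not observed.transform_done:
        return "run_transform"
    return "done"


def apply_action(action: str, state: PipelineState) -> None:
    """Pretend to run the step and record that it finished."""
    if action == "run_ingest":
        state.ingest_done = True
    elif action == "run_transform":
        state.transform_done = True


state = PipelineState()
actions = []
while (action := reconcile(state)) != "done":
    actions.append(action)
    apply_action(action, state)

print(actions)  # ['run_ingest', 'run_transform']
```

Because each call looks only at the observed state, the loop can be interrupted and resumed at any point without repeating a completed step.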
Workloads - Production-ready data processing components
- Ingest - Type-safe data ingestion from MySQL and PostgreSQL to BigQuery
- Transform - dbt-based data transformation
- Trigger - Event processing for GCS, Pub/Sub, and BigQuery
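As a rough illustration of what "type-safe ingestion" can mean in practice, the sketch below validates raw source rows against a typed schema before they would be loaded. The Ingest workload itself uses Pydantic; standard-library dataclasses stand in here to keep the example dependency-free. The `OrderRow` schema, its fields, and the helper names are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderRow:
    """Hypothetical schema for one ingested row."""
    order_id: int
    customer_email: str
    order_date: date

    def __post_init__(self) -> None:
        # Reject values that have the right type but are not plausible.
        if self.order_id <= 0:
            raise ValueError("order_id must be positive")
        if "@" not in self.customer_email:
            raise ValueError("customer_email must contain '@'")


def parse_row(raw: dict[str, str]) -> OrderRow:
    """Convert a raw string-valued row into a typed OrderRow."""
    return OrderRow(
        order_id=int(raw["order_id"]),
        customer_email=raw["customer_email"].strip(),
        order_date=date.fromisoformat(raw["order_date"]),
    )


def validate_batch(
    rows: list[dict[str, str]],
) -> tuple[list[OrderRow], list[tuple[int, str]]]:
    """Split a batch into typed rows and (row index, error message) pairs."""
    valid: list[OrderRow] = []
    errors: list[tuple[int, str]] = []
    for index, raw in enumerate(rows):
        try:
            valid.append(parse_row(raw))
        except (KeyError, ValueError) as exc:
            errors.append((index, str(exc)))
    return valid, errors


batch = [
    {"order_id": "1", "customer_email": "a@example.com", "order_date": "2024-01-02"},
    {"order_id": "x", "customer_email": "b@example.com", "order_date": "2024-01-03"},
]
valid, errors = validate_batch(batch)
print(len(valid), len(errors))  # 1 1
```

Collecting errors per row instead of failing on the first bad value lets a batch report every rejected row at once, which is one common design choice for ingestion jobs.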
Component | Technology | Purpose |
---|---|---|
Operator | Go, Kubernetes, Kubebuilder | Pipeline orchestration and CRD management |
Ingest | Python 3.13+, Pydantic, Typer | Type-safe data ingestion with validation |
Transform | dbt Core, BigQuery | Data transformation and analytics |
Triggers | Go, Google Cloud APIs | Event-driven pipeline activation |
Dev Environment | Docker Compose, SQL | Local development and testing |
Infrastructure | Terraform, GCP | Cloud infrastructure automation |
pipeline-forge/
├── operator/ # Kubernetes operator (Go)
├── workloads/ # Data processing components
│ ├── ingest/ # Type-safe ingestion (Python)
│ ├── transform/ # dbt transformations
│ └── trigger/ # Event processing (Go)
├── dev/ # Development environment setup
├── infrastructure/ # Cloud infrastructure automation
└── docs/ # Documentation
Current State: Work in Progress
Component | Status | Description |
---|---|---|
🎛️ Operator API | ⚡ Functional | CRD definitions and API contracts |
🎛️ Operator Reconciliation | 🚧 In Development | Pipeline orchestration and lifecycle management |
📥 Ingest Workload | ⚡ Functional | Type-safe data ingestion (Python) |
🔄 Transform Workload | ⚡ Functional | dbt-core data transformation |
⚡ Trigger Workload | 🚧 In Development | Event-driven pipeline activation (Go) |
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Contributions are welcome! Please open an issue first to discuss any changes before submitting a pull request.
This is a personal open-source project, developed independently on my own time and equipment.
It is not affiliated with or endorsed by my employer, and it does not represent my employer.