DanielBlei/pipeline-forge

Pipeline Forge 🔥

A Kubernetes-native platform for building modern, declarative data pipelines with clear boundaries between ingestion and transformation.

🎯 What is Pipeline Forge?

Pipeline Forge orchestrates data pipelines in Kubernetes environments. It combines a Kubernetes operator with specialized workloads to provide a declarative, event-driven approach to data pipeline management.

Key Benefits

  • Unified Pipeline Lifecycle - Connect ingestion with transformation in a single application lifecycle
  • Native Kubernetes Resources - Each step runs on 100% native K8s resources
  • Event-Driven Orchestration - React to file drops, Pub/Sub messages, and BigQuery updates
  • Built-in Observability - Comprehensive status tracking and monitoring
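
To make the event-driven idea concrete, here is a minimal sketch of how an incoming event could be matched against pipeline triggers. It is illustrative only: the `Event` and `Trigger` types, their fields, and the prefix-matching rule are assumptions made for this example, not the project's actual API, and it is written in Python for brevity even though the trigger workload itself is written in Go.

```python
from dataclasses import dataclass

# Hypothetical event sources, mirroring the three kinds named above.
SOURCES = {"gcs", "pubsub", "bigquery"}


@dataclass(frozen=True)
class Event:
    source: str    # one of SOURCES
    resource: str  # e.g. an object path, topic name, or table id


@dataclass(frozen=True)
class Trigger:
    pipeline: str         # name of the pipeline to start (hypothetical)
    source: str           # event source this trigger listens to
    resource_prefix: str  # fire only for resources starting with this prefix

    def matches(self, event: Event) -> bool:
        return (event.source == self.source
                and event.resource.startswith(self.resource_prefix))


def pipelines_to_start(event: Event, triggers: list[Trigger]) -> list[str]:
    """Return the names of pipelines whose trigger matches the event."""
    if event.source not in SOURCES:
        raise ValueError(f"unknown event source: {event.source!r}")
    return [t.pipeline for t in triggers if t.matches(event)]


# Made-up triggers and event, for illustration only.
triggers = [
    Trigger("orders-daily", "gcs", "landing/orders/"),
    Trigger("clicks-stream", "pubsub", "clicks"),
]
event = Event("gcs", "landing/orders/2024-01-01.csv")
print(pipelines_to_start(event, triggers))  # ['orders-daily']
```

Keeping the match rule a pure function of the event and the trigger makes it easy to test without any cloud dependencies.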

🏗️ Architecture Overview

Pipeline Forge consists of two main components:

Operator: Go-based CRD management and pipeline orchestration

  • Custom Resource Definitions (CRDs) for pipeline definition
  • Automatic reconciliation and lifecycle management
  • RBAC integration and resource management
  • Event-driven trigger management
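
The reconciliation idea behind the operator can be sketched as a function that looks at the observed status and moves it one step toward the desired end state. The phase names and the `PipelineStatus` fields below are hypothetical (the README does not show the real CRD status), and the sketch is in Python for brevity although the operator is written in Go.

```python
from dataclasses import dataclass

# Hypothetical phases; the real CRD status fields are not shown in this README.
PHASES = ["Pending", "Ingesting", "Transforming", "Succeeded"]


@dataclass
class PipelineStatus:
    phase: str = "Pending"
    step_done: bool = False    # has the current phase's workload finished?
    step_failed: bool = False  # did the current phase's workload fail?


def reconcile(status: PipelineStatus) -> PipelineStatus:
    """One reconciliation pass: advance at most one phase."""
    if status.phase in ("Succeeded", "Failed"):
        return status  # terminal state: nothing to do
    if status.step_failed:
        return PipelineStatus(phase="Failed")
    if status.phase == "Pending":
        return PipelineStatus(phase="Ingesting")  # start the first workload
    if not status.step_done:
        return status  # workload still running: check again on the next pass
    next_phase = PHASES[PHASES.index(status.phase) + 1]
    return PipelineStatus(phase=next_phase)


status = reconcile(PipelineStatus())  # Pending -> Ingesting
status.step_done = True               # pretend the ingest workload finished
status = reconcile(status)            # Ingesting -> Transforming
print(status.phase)                   # Transforming
```

Because each pass depends only on the current status, calling `reconcile` repeatedly is safe, which is the property a reconciliation loop relies on.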

Workloads: Production-ready data processing components

  • Ingest - Type-safe data ingestion from MySQL and PostgreSQL to BigQuery
  • Transform - dbt-based data transformation
  • Trigger - Event processing for GCS, Pub/Sub, and BigQuery
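
As a rough illustration of what "type-safe ingestion" means in practice, the sketch below coerces a raw source row into a typed record and rejects rows with missing or malformed columns before they are loaded. It uses standard-library dataclasses as a stand-in so that it runs without dependencies; the ingest workload itself is described as using Pydantic, and the `OrderRow` schema and `parse_row` helper are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class OrderRow:
    # Hypothetical schema for one source table.
    order_id: int
    customer: str
    order_date: date


def parse_row(raw: dict) -> OrderRow:
    """Coerce a raw row into OrderRow; raise if a column is missing or malformed."""
    expected = {f.name for f in fields(OrderRow)}
    missing = expected - set(raw)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return OrderRow(
        order_id=int(raw["order_id"]),  # ValueError on non-numeric text
        customer=str(raw["customer"]),
        order_date=date.fromisoformat(str(raw["order_date"])),  # ValueError on a bad date
    )


row = parse_row({"order_id": "7", "customer": "acme", "order_date": "2024-01-01"})
print(row.order_id, row.order_date.year)  # 7 2024
```

Validating at the boundary means bad rows fail early with a clear error instead of surfacing later as load or transformation failures.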

🛠️ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Operator | Go, Kubernetes, Kubebuilder | Pipeline orchestration and CRD management |
| Ingest | Python 3.13+, Pydantic, Typer | Type-safe data ingestion with validation |
| Transform | dbt Core, BigQuery | Data transformation and analytics |
| Triggers | Go, Google Cloud APIs | Event-driven pipeline activation |
| Dev Environment | Docker Compose, SQL | Local development and testing |
| Infrastructure | Terraform, GCP | Cloud infrastructure automation |

📊 Project Overview

📁 Structure

pipeline-forge/
├── operator/           # Kubernetes operator (Go)
├── workloads/          # Data processing components
│   ├── ingest/        # Type-safe ingestion (Python)
│   ├── transform/     # dbt transformations
│   └── trigger/       # Event processing (Go)
├── dev/               # Development environment setup
├── infrastructure/    # Cloud infrastructure automation
└── docs/              # Documentation

🚧 Status

Current State: Work in Progress

| Component | Status | Description |
|---|---|---|
| 🎛️ Operator API | Functional | CRD definitions and API contracts |
| 🎛️ Operator Reconciliation | 🚧 In Development | Pipeline orchestration and lifecycle management |
| 📥 Ingest Workload | Functional | Type-safe data ingestion (Python) |
| 🔄 Transform Workload | Functional | dbt-core data transformation |
| ⚡ Trigger Workload | 🚧 In Development | Event-driven pipeline activation (Go) |

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please open an issue first to discuss any changes before submitting a pull request.

⚠️ Disclaimer

This is a personal open-source project, developed independently on my own time and equipment.
It is not affiliated with, endorsed by, or representing my employer.
