Production-ready Dagster deployment using Helm + Kustomize for Kubernetes clusters. This repository provides a complete deployment setup for running Dagster as an orchestration platform for data pipelines.
Live Example: Deployed at dagster.homelab.lan | Status: Running | Version: Dagster 1.12.6
- GitOps-Ready: Kustomize-based deployment with Helm chart integration
- Secure by Default: Sealed Secrets for credential management
- Production Architecture: Separation of Dagster instance and code locations
- Scalable Design: Supports multiple code locations and horizontal scaling
- Battle-Tested: Includes real-world troubleshooting guides and operational procedures
flowchart LR
subgraph "Public Internet"
API1(Binance)
API2(ByBit)
API3(Gate.io)
end
subgraph "Private Network"
subgraph Kubernetes
subgraph Dagster
E(Extract Job)
T(Transform Job)
end
PS(PostgreSQL)
SS(SuperSet)
DS(Dashboard)
end
subgraph LXC
M(MinIO)
end
end
API1 --> E --> M
API2 --> E --> M
API3 --> E --> M
M --> T --> PS --> SS --> DS
flowchart TD
subgraph Configuration
HelmValues["Helm Values / Kustomization<br/>(Defines code locations)"]
end
subgraph K8s_Dagster["Kubernetes Namespace: dagster"]
direction TB
subgraph Control_Plane["Control Plane"]
style D_Web fill:#e1f5fe,stroke:#01579b
style D_Daemon fill:#e1f5fe,stroke:#01579b
D_Web["Dagster Webserver<br/>(UI & API)"]
D_Daemon["Dagster Daemon<br/>(Scheduler)"]
end
subgraph Code_Exec["Code Execution"]
style U_Code fill:#f3e5f5,stroke:#4a148c
U_Code["User Code Pod<br/>(gRPC: 3030)"]
Py_Defs["Python Code<br/>(Assets/Jobs)"]
end
Service["Code Location Service<br/>ClusterIP: 3030"]
HelmValues -.->|Configures| D_Web
HelmValues -.->|Configures| D_Daemon
D_Web -- "gRPC" --> Service
D_Daemon -- "gRPC" --> Service
Service --> U_Code
U_Code --> Py_Defs
end
subgraph Database_NS["Database Namespace"]
DB[("PostgreSQL")]
end
D_Daemon -- "Run State" --> DB
D_Web -- "Run History" --> DB
Key Design Decisions:
- Stateless Dagster Instance: No persistent volumes required
- Separate Code Locations: Jobs run in pods isolated from the control plane
- External Dependencies: PostgreSQL for metadata, MinIO for raw data storage
- gRPC Communication: Webserver/Daemon communicate with code locations via gRPC (port 3030)
| Component | Version | Purpose |
|---|---|---|
| Kubernetes | 1.24+ | Container orchestration |
| PostgreSQL | 17.6.0+ | Dagster metadata storage |
| MinIO (optional) | Latest | Object storage for raw data |
| MetalLB (bare-metal) | Latest | LoadBalancer service support |
| Traefik | Latest | Ingress controller |
- kubectl - Kubernetes CLI
- kustomize (v5.0.0+) - Manifest management
- kubeseal - Sealed Secrets encryption
- helm (optional) - Helm chart management
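A quick preflight check can confirm that these tools are on the PATH before starting. The sketch below is an illustration only: `missing_tools` is a hypothetical helper name, not part of this repository, and it checks presence rather than versions.

```shell
# Print the name of every command from the argument list that is not on PATH.
# `missing_tools` is a hypothetical helper, not part of this repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: report anything missing from the required set.
# missing_tools kubectl kustomize kubeseal
```

An empty result means every listed tool was found.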
git clone https://github.com/arookieds/dagster-deployment.git
cd dagster-deployment
kubectl apply -f base/namespace.yaml
Create sealed secrets for PostgreSQL credentials:
# Create plain secret (DO NOT COMMIT)
kubectl create secret generic postgres-secrets \
--from-literal=postgresql-password='your-password-here' \
--namespace dagster \
--dry-run=client -o yaml > secret.yaml
# Seal the secret
kubeseal -o yaml < secret.yaml > overlays/prod/sealed-secret.yaml
# Clean up plain secret
rm secret.yaml
Edit base/kustomization.yaml to configure:
- Code location servers (workspace.servers)
- PostgreSQL connection details
- Resource limits
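Before deploying, it can help to render the manifests locally and see which resources the configuration produces. The sketch below assumes a standalone `kustomize` binary. Because the base uses `helmCharts`, the build needs the `--enable-helm` flag, and kustomize delegates chart rendering to a local `helm` binary in that mode, so `helm` likely has to be installed for this path. `summarize_kinds` is a hypothetical helper that only counts unindented `kind:` lines in a manifest stream.

```shell
# Count top-level resource kinds in a multi-document YAML stream read from stdin.
# Only unindented "kind:" lines are counted, so nested fields are ignored.
summarize_kinds() {
  awk '/^kind:/ { count[$2]++ } END { for (k in count) print k, count[k] }' | sort
}

# Example (requires kustomize and helm; not run here):
# kustomize build --enable-helm overlays/prod | summarize_kinds
```

A missing Deployment or Service in the summary is usually easier to spot here than after applying.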
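# Deploy using Kustomize (the helmCharts field needs --enable-helm, which kubectl apply -k does not accept)
kustomize build --enable-helm overlays/prod | kubectl apply -f -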
# Verify deployment
kubectl get pods -n dagster
kubectl get svc -n dagster
Option A: Port Forward (Testing)
kubectl port-forward -n dagster svc/dagster-dagster-webserver 3000:80
# Open: http://localhost:3000
Option B: Ingress (Production)
# Access via configured domain
curl http://dagster.homelab.lan
The kustomization.yaml includes inline Helm values for the Dagster chart:
helmCharts:
  - name: dagster
    repo: https://dagster-io.github.io/helm
    version: 1.12.6
    valuesInline:
      # PostgreSQL connection
      postgresql:
        enabled: false
        postgresqlHost: postgresql.database.svc.cluster.local
        postgresqlDatabase: dagster
        postgresqlUsername: dagster
      # Code locations (user deployments)
      dagster-webserver:
        workspace:
          servers:
            - host: "trading-data"
              port: 3030
              name: "trading-data"
Add More Code Locations:
workspace:
  servers:
    - host: "crypto-extract"
      port: 3030
      name: "crypto-extract"
    - host: "crypto-transform"
      port: 3030
      name: "crypto-transform"
Enable High Availability:
dagster-webserver:
  replicaCount: 3
Adjust Resource Limits:
dagster-webserver:
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 256Mi
dagster-deployment/
├── README.md                     # This file
├── DEPLOYMENT.md                 # Full deployment documentation
├── base/                         # Base Kubernetes resources
│   ├── kustomization.yaml        # Helm chart + base config
│   ├── namespace.yaml            # Namespace definition
│   └── ingressroute.yaml         # Traefik ingress (optional)
└── overlays/
    └── prod/                     # Production environment
        ├── kustomization.yaml    # Production patches
        └── sealed-secret.yaml    # Encrypted secrets
Note: overlays will be added at a later stage.
Symptoms: Pods in Pending or CrashLoopBackOff state
Check:
# View pod status
kubectl get pods -n dagster
# Check logs
kubectl logs -n dagster <pod-name>
# Check events
kubectl describe pod -n dagster <pod-name>
Common Causes:
- Missing secrets: Ensure the postgres-secrets sealed secret exists
- PostgreSQL unreachable: Verify the PostgreSQL pod is running in the database namespace
- Resource limits: Check whether the pod is OOMKilled because of its memory limit
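The resource-limit case can be checked from the last termination reason of each container. This is a minimal sketch that relies on kubectl's JSONPath output. `oom_killed_pods` is a hypothetical filter that expects one tab-separated `pod<TAB>reasons` line per pod.

```shell
# Print the names of pods whose last container termination reason includes OOMKilled.
# Input: one line per pod, "<pod-name><TAB><space-separated reasons>".
oom_killed_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Example (requires cluster access; not run here):
# kubectl get pods -n dagster \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#   | oom_killed_pods
```

Any pod listed here is a candidate for a higher memory limit in the Helm values.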
Symptoms: curl http://dagster.homelab.lan returns connection refused or 404
Diagnosis:
# Find actual service name created by Helm
kubectl get svc -n dagster
# Expected: dagster-dagster-webserver
Fix: Update ingressroute.yaml to use the correct service name:
services:
  - name: dagster-dagster-webserver # Not just "dagster"
    port: 80
Helm naming convention: {releaseName}-{chartName}-{componentName}
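As an illustration of that convention, the expected service name can be composed and then looked up in the cluster. `helm_service_name` is a hypothetical helper, and the release name `dagster` is an assumption inferred from the `dagster-dagster-webserver` name above.

```shell
# Compose a name following {releaseName}-{chartName}-{componentName}.
# `helm_service_name` is a hypothetical helper, not part of this repository.
helm_service_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Example (requires cluster access; not run here):
# kubectl get svc -n dagster "$(helm_service_name dagster dagster webserver)"
```

If the lookup fails, the release name used at deploy time probably differs from the one assumed here.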
Symptoms: Dagster UI shows "Code location unavailable"
Check gRPC connectivity:
# Verify code location pod running
kubectl get pods -n dagster -l component=user-code
# Check webserver can reach code location
kubectl exec -n dagster <webserver-pod> -- \
  nc -zv <code-location-service> 3030
Common Causes:
- Service name mismatch in workspace.servers configuration
- Code location pod not running
- gRPC port 3030 not exposed in code location service
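To check the last cause, the ports exposed by the code location Service can be listed and compared against 3030. This is a minimal sketch: `has_port` is a hypothetical helper, and `<code-location-service>` remains a placeholder for the actual Service name.

```shell
# Succeed if port $1 appears as a whole word in the space-separated list $2.
has_port() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (requires cluster access; not run here):
# ports=$(kubectl get svc -n dagster <code-location-service> -o jsonpath='{.spec.ports[*].port}')
# has_port 3030 "$ports" || echo "port 3030 is not exposed"
```

Matching on whole words avoids a false positive from a port such as 30300.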
For comprehensive deployment documentation including:
- Detailed architecture explanations
- Backup and restore procedures
- Monitoring and alerting setup
- Migration paths and scaling strategies
- Complete troubleshooting guide
See DEPLOYMENT.md
This deployment is designed for:
- Data Engineering Pipelines: ETL/ELT workflows for batch processing
- Financial Data Processing: Crypto market data extraction and transformation
- ML Pipeline Orchestration: Scheduling model training and inference
- Multi-tenant Deployments: Separate code locations per team/project
Not suitable for:
- Real-time streaming (use Kafka/Flink for high-frequency data)
- Extremely high-throughput (>10k jobs/minute)
- Windows-based deployments (Linux containers only)
- Sealed Secrets: All credentials encrypted using Sealed Secrets controller
- No External Exposure: Dagster UI accessible only within cluster network or via VPN
- Namespace Isolation: Runs in dedicated dagster namespace
- Minimal Privileges: Service accounts follow principle of least privilege
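Since only sealed secrets should reach Git, a small guard can flag any plain `Secret` manifest left in the working tree. This is a sketch under the assumption that manifests are `.yaml` or `.yml` files with an unindented `kind:` line. `find_plain_secrets` is a hypothetical helper.

```shell
# List YAML files under directory $1 that declare a plain "kind: Secret".
# Files declaring "kind: SealedSecret" do not match the pattern.
find_plain_secrets() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -lE '^kind:[[:space:]]*Secret[[:space:]]*$' {} + || true
}

# Example: run from the repository root before committing.
# find_plain_secrets .
```

Any path printed by the helper is worth sealing or deleting before the next commit.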
For Production:
- Enable authentication (OAuth2, LDAP, SAML)
- Implement Network Policies for namespace isolation
- Use separate PostgreSQL instance (not shared)
- Enable TLS for gRPC communication
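As one possible starting point for the Network Policies item, the sketch below prints a policy that admits ingress to pods in the `dagster` namespace only from pods in the same namespace. The policy name and the `render_network_policy` helper are assumptions for illustration. A rule like this would also block an ingress controller such as Traefik running in another namespace unless a further rule admits it.

```shell
# Print a NetworkPolicy that restricts ingress to same-namespace traffic.
# The namespace defaults to "dagster"; the policy name is illustrative.
render_network_policy() {
  namespace=${1:-dagster}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
EOF
}

# Example (requires cluster access; not run here):
# render_network_policy dagster | kubectl apply -f -
```

Only ingress is restricted, so outbound connections from Dagster pods to PostgreSQL in the database namespace are not affected by this policy.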
Contributions welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please include:
- Description of changes
- Rationale for the change
- Testing performed (include kubectl commands and output)
- Documentation updates (if applicable)
This project is licensed under the MIT License - see LICENSE file for details.
- Dagster Team - For the excellent orchestration framework
- Bitnami - For well-maintained Helm charts
- Kubernetes Community - For robust container orchestration
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Dagster Slack: dagster.slack.com
| Date | Version | Changes |
|---|---|---|
| 2025-12-14 | 1.0.0 | Initial public release |
| 2025-12-12 | 0.9.0 | Internal deployment and testing |
If this repository helped you, please consider giving it a star!