Open Data Platform is a fully open-source, developer-first analytics platform that combines ingestion, transformation, orchestration, governance, BI, and observability in one stack.
https://fresh-minds.github.io/FreshDataPlatform/
- End-to-end batch pipelines with medallion layering (`bronze -> silver -> gold`)
- Airflow orchestration for ingestion and transformation workflows
- dbt + Postgres serving models for analytics
- Metadata and lineage with DataHub
- Observability with Prometheus, Grafana, Loki, and Tempo
- React launchpad (`frontend/`) for platform access and operations
- Hybrid pipeline runtime:
  - Spark/Fabric-compatible pipelines in `pipelines/`
  - Postgres-only fallback pipeline for local execution without Java/Spark
- Governance and quality:
  - Schema-as-code in `schema/`
  - Config-driven quality and governance checks
  - E2E QA suite with evidence artifacts
- Deployment flexibility:
  - Local Docker Compose
  - Local Kubernetes (kind)
  - AKS and Scaleway Kubernetes
- Security and identity:
  - Keycloak-based SSO flows for Airflow, DataHub, and MinIO
  - Dedicated SSO test suite and reports
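As a rough illustration of what the medallion layers mean, the toy Python sketch below moves a handful of records from bronze (raw, as landed) to silver (typed, cleaned, deduplicated) to gold (aggregated for analytics). The records, field names, and cleaning rules are hypothetical and are not taken from the platform's pipelines, which run on Spark/Fabric or the Postgres-only fallback.

```python
"""Toy medallion layering: bronze -> silver -> gold (hypothetical data and rules)."""
from collections import defaultdict

# Bronze: raw records as landed from a source, kept unmodified
# (string-typed numbers, inconsistent casing, duplicates, gaps).
BRONZE = [
    {"region": "Utrecht", "period": "2024Q1", "vacancies": "1200"},
    {"region": "utrecht ", "period": "2024Q1", "vacancies": "1200"},  # duplicate once normalized
    {"region": "Amsterdam", "period": "2024Q1", "vacancies": "3400"},
    {"region": "Amsterdam", "period": "2024Q2", "vacancies": None},   # incomplete row
    {"region": "Amsterdam", "period": "2024Q2", "vacancies": "3100"},
]


def to_silver(rows):
    """Silver: typed, normalized, deduplicated records; incomplete rows are dropped."""
    seen = set()
    out = []
    for row in rows:
        if row["vacancies"] is None:
            continue
        rec = (row["region"].strip().title(), row["period"], int(row["vacancies"]))
        if rec not in seen:
            seen.add(rec)
            out.append({"region": rec[0], "period": rec[1], "vacancies": rec[2]})
    return out


def to_gold(rows):
    """Gold: business-level aggregate (total vacancies per region)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["vacancies"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'Utrecht': 1200, 'Amsterdam': 6500}
```

Keeping bronze untouched means silver and gold can be rebuilt from it whenever a cleaning or aggregation rule changes.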
The platform is composed of three planes: operator plane, control plane, and data plane.
```mermaid
flowchart LR
    subgraph OperatorPlane[Operator Plane]
        Portal["React Launchpad (:3000)"]
        AirflowUI["Airflow UI (:8080)"]
        DataHubUI["DataHub UI (:9002)"]
        SupersetUI["Superset UI (:8088)"]
        GrafanaUI["Grafana UI (:3001)"]
    end

    subgraph ControlPlane[Control Plane]
        Scheduler["Airflow Scheduler"]
        DAGs["DAGs (dags/)"]
        Tests["QA + SSO Test Suites"]
    end

    subgraph DataPlane[Data Plane]
        Sources["External Sources (CBS, Adzuna, UWV, RSS, Sitemaps)"]
        MinIO["MinIO (Bronze/Silver/Gold)"]
        Warehouse["Postgres Warehouse"]
        DataHub["DataHub GMS + Kafka + Elasticsearch + MySQL"]
        O11y["Prometheus + Loki + Tempo"]
    end

    Portal --> AirflowUI
    Portal --> DataHubUI
    Portal --> SupersetUI
    Portal --> GrafanaUI
    DAGs --> Scheduler
    Scheduler --> MinIO
    Scheduler --> Warehouse
    Scheduler --> DataHub
    Sources --> MinIO
    MinIO --> Warehouse
    Warehouse --> SupersetUI
    Scheduler --> O11y
    MinIO --> O11y
    Warehouse --> O11y
```
See ARCHITECTURE.md for deeper runtime and component details.
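The edges in the flowchart can also be treated as a dependency graph. The hypothetical helper below (not part of the repository) copies those edges and walks them breadth-first to list every component downstream of a given node, which is a quick way to reason about what is affected when, for example, MinIO is unavailable.

```python
from collections import deque

# Edges copied from the architecture flowchart (source -> target).
EDGES = [
    ("Portal", "AirflowUI"), ("Portal", "DataHubUI"),
    ("Portal", "SupersetUI"), ("Portal", "GrafanaUI"),
    ("DAGs", "Scheduler"),
    ("Scheduler", "MinIO"), ("Scheduler", "Warehouse"), ("Scheduler", "DataHub"),
    ("Sources", "MinIO"), ("MinIO", "Warehouse"), ("Warehouse", "SupersetUI"),
    ("Scheduler", "O11y"), ("MinIO", "O11y"), ("Warehouse", "O11y"),
]


def downstream(node, edges=EDGES):
    """Return every node reachable from `node` by following edges forward (BFS)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream("MinIO")))  # ['O11y', 'SupersetUI', 'Warehouse']
```

Only the edges drawn in the diagram are modeled; ARCHITECTURE.md may describe runtime dependencies that the diagram leaves out.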
Prerequisites:
- Python 3.9+
- Docker + Docker Compose
- Make
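Before bootstrapping, a short Python script can confirm the prerequisites listed above. This is a sketch, not a script shipped with the repository; it assumes Docker Compose is installed as the v2 CLI plugin, since the commands in this README use `docker compose` rather than `docker-compose`.

```python
import shutil
import subprocess
import sys


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    for tool in ("docker", "make"):
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if shutil.which("docker"):
        # Assumption: Compose is the v2 plugin, so `docker compose version`
        # exits non-zero when the plugin is missing.
        try:
            result = subprocess.run(
                ["docker", "compose", "version"],
                capture_output=True, text=True, timeout=15,
            )
            if result.returncode != 0:
                problems.append("Docker Compose plugin unavailable ('docker compose version' failed)")
        except subprocess.TimeoutExpired:
            problems.append("'docker compose version' timed out")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found")
```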
```bash
cp .env.template .env
python3 -m venv .venv
source .venv/bin/activate
make dev-install
./scripts/platform/bootstrap_all.sh --auto-fill-env
```

Notes:
- `bootstrap_all.sh` creates `.venv` if missing and repairs broken interpreter links.
- Bootstrap installs dependencies with `pip install -e ".[dev,pipeline]"`.
- Use `--skip-dev-install` only when you manage dependencies manually.
- dbt bootstrap uses `DBT_THREADS=1` by default to reduce Postgres deadlocks.
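The first note above mentions broken interpreter links: on POSIX systems a virtual environment's `bin/python` is normally a symlink to the base interpreter, so it dangles when that interpreter is moved or removed. The sketch below only detects that state. It is a hypothetical helper, not the logic inside `bootstrap_all.sh`, and it assumes a POSIX `bin/` layout (Windows venvs use `Scripts\python.exe`).

```python
from pathlib import Path


def venv_interpreter_status(venv_dir=".venv"):
    """Classify the venv interpreter as 'ok', 'missing', or 'broken-link'."""
    python = Path(venv_dir) / "bin" / "python"
    # Path.exists() follows symlinks, so a dangling link is a symlink that "does not exist".
    if python.is_symlink() and not python.exists():
        return "broken-link"
    if not python.exists():
        return "missing"
    return "ok"


if __name__ == "__main__":
    status = venv_interpreter_status()
    print(f".venv interpreter: {status}")
    if status != "ok":
        # The bootstrap script creates a missing .venv and repairs broken links.
        print("Re-run ./scripts/platform/bootstrap_all.sh to recreate or repair .venv")
```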
Full local stack:
```bash
docker compose up -d
```
Minimal local stack (no DataHub, no heavy observability, no Jupyter):
```bash
make compose-up-minimal
```
Minimal mode notes:
- Uses `docker-compose.minimal.yml`.
- Seeds the ODP Staffing Demand and Platform Metadata Operations dashboards.
- Runs `scripts/testing/verify_compose_minimal.sh` as a smoke check by default.
- Set `COMPOSE_MINIMAL_SMOKE_AFTER_UP=false` to skip the smoke checks.
Optional notebook workspace:
```bash
docker compose up -d jupyter
```
Canonical Postgres-only pipeline:
```bash
make run-odp-staffing-demand
make run-odp-staffing-demand-metadata
```
Run a specific entrypoint:
```bash
LOCAL_MOCK_PIPELINES=false make run PIPELINE=odp_staffing_demand.bronze_cbs_vacancy_rate
```
Run tests:
```bash
make test
make qa-test
```
Local endpoints:
- Frontend launchpad: http://localhost:3000
- Airflow: http://localhost:8080
- Superset: http://localhost:8088
- DataHub: http://localhost:9002
- dbt docs: http://localhost:8089
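Once the stack is up, the endpoints above can be probed with a small standard-library script. It is a hypothetical helper, not part of the repository. It treats any HTTP response, including error statuses such as 401 from a login-protected UI, as "the service is answering" and reports `None` only when the connection itself fails; that interpretation is an assumption you may want to tighten.

```python
import urllib.error
import urllib.request

# Local endpoints as listed in this README.
ENDPOINTS = {
    "Frontend launchpad": "http://localhost:3000",
    "Airflow": "http://localhost:8080",
    "Superset": "http://localhost:8088",
    "DataHub": "http://localhost:9002",
    "dbt docs": "http://localhost:8089",
}


def probe(url, timeout=3.0):
    """Return the HTTP status code (after redirects), or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server responded with an error status; it is reachable.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = probe(url)
        print(f"{name:20} {url:24} {'down' if status is None else status}")
```

`HTTPError` is a subclass of `URLError`, which is why it is caught first.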
Generate and host dbt docs:
```bash
make dbt-docs-refresh
```
Watch dbt docs and lineage updates during development:
```bash
make dbt-docs-watch
```
Initialize metadata tables:
```bash
make warehouse-metadata-init
```
For complete deployment guidance, use DEPLOYMENT.md. Common shortcuts are below.
AKS deploy:
```bash
make k8s-aks-up
```
AKS deploy with Key Vault as secret source:
```bash
AKS_KEY_VAULT_NAME=aitrialkv1234abcd make k8s-aks-up
```
AKS deploy with direct `.env` to Kubernetes secret sync:
```bash
AKS_USE_KEY_VAULT=false make k8s-aks-up
```
AKS image-only update:
```bash
make k8s-aks-update-images
```
Limit AKS image updates to selected services:
```bash
AKS_IMAGES=frontend,portal-api make k8s-aks-update-images
```
Scaleway redeploy (full/minimal):
```bash
make scaleway-redeploy-all
make scaleway-redeploy-all-minimal
```
Scaleway note:
- Deploy/destroy scripts can fall back to `.env` for `SCW_ACCESS_KEY`, `SCW_SECRET_KEY`, and `SCW_DEFAULT_PROJECT_ID` when these are not exported.
- Set `SKIP_IMAGE_BUILD=true` for config-only redeploy iterations.
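The Scaleway fallback rule (exported variables first, `.env` second) can be sketched in a few lines of Python. This only illustrates the precedence described in the note; the actual deploy/destroy scripts may parse `.env` differently, and treating an exported-but-empty variable as unset is an assumption of this sketch.

```python
import os
from pathlib import Path

SCW_KEYS = ("SCW_ACCESS_KEY", "SCW_SECRET_KEY", "SCW_DEFAULT_PROJECT_ID")


def parse_dotenv(text):
    """Parse KEY=VALUE lines; ignores blanks, comments, and an optional 'export ' prefix."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve(keys, environ, fallback):
    """Exported values win; .env values are used only for unset (or empty) keys."""
    resolved, missing = {}, []
    for key in keys:
        value = environ.get(key) or fallback.get(key)
        if value:
            resolved[key] = value
        else:
            missing.append(key)
    return resolved, missing


if __name__ == "__main__":
    dotenv = Path(".env")
    fallback = parse_dotenv(dotenv.read_text()) if dotenv.exists() else {}
    found, missing = resolve(SCW_KEYS, os.environ, fallback)
    print("resolved:", sorted(found))  # print names only, never secret values
    print("missing:", missing)
```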
Required for map dashboards:
```bash
echo 'MAPBOX_API_KEY=<your-mapbox-public-token>' >> .env
docker compose up -d --force-recreate superset
```
Verification:
```bash
docker exec open-data-platform-superset sh -lc 'python -c "import os; print(bool(os.getenv(\"MAPBOX_API_KEY\")))"'
```
The command prints `True` when the key is visible inside the Superset container.

Repository layout:
```text
airflow/    Airflow image and web auth config
dags/       Orchestration DAGs
src/        Ingestion framework and source modules
pipelines/  Domain pipeline logic
shared/     Shared runtime, config, connectors
scripts/    Bootstrap, QA, governance, and ops scripts
dbt/        dbt project, models, seeds, and templates
schema/     Contracts, DBML, glossary, and DQ rules
tests/      Unit, integration, governance, E2E, and SSO suites
frontend/   Operator launchpad
k8s/        kind and AKS manifests
deploy/     Kustomize deployment manifests
ops/        Keycloak and observability configurations
docs/       Supporting docs and diagrams
guides/     Topic-specific implementation guides
```
| Topic | Document |
|---|---|
| Development workflow and coding standards | DEVELOPMENT.md |
| Deployment modes, env, and secrets | DEPLOYMENT.md |
| Runtime architecture and component boundaries | ARCHITECTURE.md |
| Medallion entities and serving model details | DATA_MODEL.md |
| Security and secret handling | SECURITY.md, GIT_SECURITY_CHECKLIST.md |
| Ingestion onboarding guide | docs/INGESTION_GUIDE.md |
| Data quality framework | guides/data_quality_framework.md |
| End-to-end platform testing | docs/e2e_data_platform_testing.md |
| CI/CD runbooks | docs/cicd/RUNBOOKS.md |
Ingestion onboarding:
- Guide: docs/INGESTION_GUIDE.md
- Python templates: `src/ingestion/_template/`
- DAG template: `dags/_template_dag.py`
- dbt model templates: `dbt/_model_templates/`
Do not commit real credentials, tokens, or private keys. Use .env (ignored by git) and keep .env.template placeholder-only.
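A lightweight check can help keep `.env.template` placeholder-only, for example from a local pre-commit hook. The sketch below is hypothetical and heuristic: it assumes placeholders are empty or wrapped in angle brackets (as in the `<your-mapbox-public-token>` example in this README) and that secret-bearing variable names contain KEY, SECRET, TOKEN, or PASSWORD.

```python
import re

# Assumption: placeholders are empty or look like <your-value-here>.
PLACEHOLDER = re.compile(r"^<[^<>]+>$")
# Assumption: variables whose names contain these words carry secrets.
SECRET_NAME = re.compile(r"KEY|SECRET|TOKEN|PASSWORD", re.IGNORECASE)


def suspicious_template_entries(text):
    """Return (line_number, key) pairs whose secret-looking values are not placeholders."""
    flagged = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("\"'")
        if SECRET_NAME.search(key) and value and not PLACEHOLDER.match(value):
            flagged.append((lineno, key))
    return flagged


sample = "MAPBOX_API_KEY=<your-mapbox-public-token>\nSCW_SECRET_KEY=abc123\nAIRFLOW_PORT=8080\n"
print(suspicious_template_entries(sample))  # [(2, 'SCW_SECRET_KEY')]
```

It will miss secrets stored under other names and can flag harmless values, so treat its output as a prompt for review rather than a guarantee.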
See CONTRIBUTING.md for branch workflow, required checks, DCO sign-off, and third-party license guardrails.
This project is licensed under the MIT License. See LICENSE.
Third-party runtime components keep their own licenses. See THIRD_PARTY_LICENSES.md.
