Build a workflow orchestrator from scratch. Learn distributed systems, concurrency, scheduling, and database-backed state machines by building one yourself.
Inspired by Apache Airflow — not a fork, not a clone, but a ground-up rebuild to understand how orchestrators actually work. The difficulty is front-loaded: concurrency from day one, no training wheels.
Most "build your own X" projects stop at the data structure. This one doesn't. The goal is to go from a DAG adjacency list all the way to a distributed scheduler with a REST API, covering real CS concepts at every step:
- Graph algorithms (topological sort, cycle detection, DFS vs BFS)
- State machines (task lifecycle, valid transitions)
- Concurrency (threads, locks, race conditions, deadlocks)
- Persistence (SQL, optimistic locking, crash recovery)
- Scheduling (cron parsing, backfill, distributed coordination)
- Distributed systems (message queues, worker heartbeats, failure detection)
- API design (REST, auth, real-time updates)
Every phase includes a comparison with the real Airflow codebase to see how production systems solve the same problems at scale.
- DAG adjacency list with add_task / add_edge
- Topological sort (Kahn's algorithm)
- Cycle detection with error path
- Task state enum and transitions
- Concurrent executor with raw threading
- Shared TaskStateStore with locks
- Dependency-aware task queuing
- Concurrency limits (max parallel tasks)
- Failure propagation to downstream tasks
- Heartbeat and zombie detection
- Trigger rules (all_success, all_failed, one_success, one_failed, all_done)
- Retry with configurable count and delay
- ASCII DAG visualization with states
- SQLAlchemy schema (dag, dag_run, task_instance)
- Optimistic locking on state transitions
- Task instance history table
- Idempotent DAG run creation
- DAG run state derived from task states
- CLI: trigger, status, list-dags, list-runs, task-log
- Log capture to files
- Crash recovery on startup
- Simple DB migrator
- DAG file discovery from dags/ folder
- DAG file import with error handling
- Cron expression parser (hand-written)
- Scheduler loop (scan → create runs → queue tasks → sleep)
- Backfill command
- max_active_runs_per_dag
- max_active_tasks_per_dag
- Pool / slot limits
- DAG pause/unpause
- Catchup behavior
- start_date / end_date on DAGs
- Process-based LocalExecutor
- BaseExecutor interface
- SequentialExecutor for debugging
- CeleryExecutor with Redis broker
- Task command serialization
- Worker heartbeats
- Dead worker detection and task rescheduling
- Executor plugin loading by config
- Graceful shutdown on SIGTERM
- Task adoption on scheduler restart
- FastAPI REST API (CRUD for dags, runs, tasks)
- Trigger DAG run endpoint
- Pause/unpause endpoint
- Log streaming endpoint
- Health check endpoint
- OpenAPI spec auto-generation
- Basic web UI (DAG list, run detail, task logs)
- DAG graph visualization in UI
- Basic auth + API key auth
- RBAC (viewer, editor, admin)
- BaseOperator with lifecycle hooks
- PythonOperator
- BashOperator
- HttpOperator
- SqlOperator
- BaseHook with get_conn()
- Connection model with encryption
- Connection CLI (add/delete/list)
- XCom push/pull
- Variables (get/set with optional encryption)
- BaseSensor with poke and reschedule modes
- Jinja2 templating in operator params
- Task/DAG callbacks (on_success, on_failure)
- SLA monitoring and alerting
- Dynamic task mapping
- Task groups
- DAG versioning
- Plugin system
- Metrics (StatsD/Prometheus)
- Alerting (email/webhook)
- Kubernetes executor
- Asset-aware scheduling
- Deferrable operators / Triggerer
The docs/ folder is the course material — it documents what each task teaches,
not just what it does:
| File | What's in it |
|---|---|
docs/plan.md |
Full build plan with phase details |
docs/cs-learnings.md |
CS concepts learned (algorithms, concurrency, etc.) |
docs/python-learnings.md |
Python-specific patterns and syntax |
docs/thinking-frameworks.md |
Mental models for design, debugging, testing |
docs/project-decisions.md |
Technical decisions and design tradeoffs |
docs/phase-NN-*/README.md |
Per-phase curriculum, notes, Airflow comparisons |
uv sync --group devuv run pytest -v -xuv run ruff check src/ tests/
uv run mypy src/MIT