# MiniFlow

Build a workflow orchestrator from scratch. Learn distributed systems, concurrency, scheduling, and database-backed state machines by building one yourself.

Inspired by Apache Airflow — not a fork, not a clone, but a ground-up rebuild to understand how orchestrators actually work. The difficulty is front-loaded: concurrency from day one, no training wheels.

## Why

Most "build your own X" projects stop at the data structure. This one doesn't. The goal is to go from a DAG adjacency list all the way to a distributed scheduler with a REST API, covering real CS concepts at every step:

- Graph algorithms (topological sort, cycle detection, DFS vs BFS)
- State machines (task lifecycle, valid transitions)
- Concurrency (threads, locks, race conditions, deadlocks)
- Persistence (SQL, optimistic locking, crash recovery)
- Scheduling (cron parsing, backfill, distributed coordination)
- Distributed systems (message queues, worker heartbeats, failure detection)
- API design (REST, auth, real-time updates)
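
As a taste of the state-machine work: a task lifecycle pins down an enum plus an explicit table of legal transitions. A sketch (the state names here are illustrative, not necessarily MiniFlow's actual enum):

```python
from enum import Enum

class TaskState(Enum):
    NONE = "none"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"

# Whitelist of legal transitions; anything else is a bug, not a race to paper over.
VALID_TRANSITIONS = {
    TaskState.NONE: {TaskState.QUEUED},
    TaskState.QUEUED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.SUCCESS, TaskState.FAILED, TaskState.UP_FOR_RETRY},
    TaskState.UP_FOR_RETRY: {TaskState.QUEUED},
    TaskState.SUCCESS: set(),   # terminal
    TaskState.FAILED: set(),    # terminal
}

def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return the new state, or raise if the transition is illegal."""
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Making the table explicit turns "impossible" states into loud errors instead of silent corruption.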

Every phase includes a comparison with the real Airflow codebase to see how production systems solve the same problems at scale.

## Progress

### Phase 1: Concurrent DAG Engine

- DAG adjacency list with `add_task` / `add_edge`
- Topological sort (Kahn's algorithm)
- Cycle detection with error path
- Task state enum and transitions
- Concurrent executor with raw threading
- Shared `TaskStateStore` with locks
- Dependency-aware task queuing
- Concurrency limits (max parallel tasks)
- Failure propagation to downstream tasks
- Heartbeat and zombie detection
- Trigger rules (`all_success`, `all_failed`, `one_success`, `one_failed`, `all_done`)
- Retry with configurable count and delay
- ASCII DAG visualization with states
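
The backbone of this phase is the topological sort. Kahn's algorithm gives cycle detection for free: if the queue drains before every node is ordered, the leftovers sit on a cycle. A minimal sketch:

```python
from collections import deque

def topological_sort(graph: dict[str, list[str]]) -> list[str]:
    """Kahn's algorithm: repeatedly pop zero in-degree nodes.

    `graph` maps each task to its downstream tasks (adjacency list).
    Raises ValueError when a cycle prevents a full ordering.
    """
    indegree = {node: 0 for node in graph}
    for downstream in graph.values():
        for node in downstream:
            indegree[node] += 1

    ready = deque(n for n, d in indegree.items() if d == 0)
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(graph):  # leftover nodes all sit on a cycle
        raise ValueError("cycle detected in DAG")
    return order
```

The same "ready when in-degree hits zero" idea drives the concurrent executor: instead of appending to a list, completed tasks unlock their children onto the work queue.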

### Phase 2: Persistence & State Machine

- SQLAlchemy schema (`dag`, `dag_run`, `task_instance`)
- Optimistic locking on state transitions
- Task instance history table
- Idempotent DAG run creation
- DAG run state derived from task states
- CLI: `trigger`, `status`, `list-dags`, `list-runs`, `task-log`
- Log capture to files
- Crash recovery on startup
- Simple DB migrator
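
The optimistic-locking idea is small enough to show standalone (sketched here with stdlib `sqlite3` to stay dependency-free; MiniFlow itself uses SQLAlchemy). A version column guards every state write, and a lost race shows up as zero matched rows:

```python
import sqlite3

def try_set_state(conn: sqlite3.Connection, task_id: int,
                  expected_version: int, new_state: str) -> bool:
    """Optimistically update a task's state.

    The UPDATE only matches if nobody bumped `version` since we read the row.
    A concurrent writer wins the race and we return False, so the caller can
    re-read and decide whether the transition still makes sense.
    """
    cur = conn.execute(
        "UPDATE task_instance SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_state, task_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

No row locks are held between read and write; conflicts are detected after the fact, which suits a scheduler where contention on any single row is rare.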

### Phase 3: The Scheduler

- DAG file discovery from `dags/` folder
- DAG file import with error handling
- Cron expression parser (hand-written)
- Scheduler loop (scan → create runs → queue tasks → sleep)
- Backfill command
- `max_active_runs_per_dag`
- `max_active_tasks_per_dag`
- Pool / slot limits
- DAG pause/unpause
- Catchup behavior
- `start_date` / `end_date` on DAGs
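
A hand-written cron parser mostly reduces to expanding one field into the set of values it matches. A sketch handling `*`, steps, ranges, and comma lists (full cron syntax also has names like `MON`, which this skips):

```python
def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Parse one cron field (e.g. the minute field) into matching values.

    Supports "*", "*/step", "a-b" ranges, and comma-separated lists --
    a subset of crontab(5), enough for a toy scheduler.
    """
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_str = part.split("/", 1)
            step = int(step_str)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values
```

A full expression is five such fields (minute, hour, day-of-month, month, day-of-week); a timestamp matches when every field's set contains the corresponding component, and the scheduler loop just asks "is there a matching tick since the last run?".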

### Phase 4: Multi-Process Execution & Distribution

- Process-based `LocalExecutor`
- `BaseExecutor` interface
- `SequentialExecutor` for debugging
- `CeleryExecutor` with Redis broker
- Task command serialization
- Worker heartbeats
- Dead worker detection and task rescheduling
- Executor plugin loading by config
- Graceful shutdown on SIGTERM
- Task adoption on scheduler restart
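
Dead-worker detection is bookkeeping over heartbeat timestamps: workers write a timestamp on each beat, the scheduler declares anything stale dead and requeues its tasks. A minimal sketch (names and the 30-second timeout are illustrative):

```python
def find_dead_workers(heartbeats: dict[str, float], now: float,
                      timeout: float = 30.0) -> set[str]:
    """Worker ids whose last heartbeat is older than `timeout` seconds."""
    return {wid for wid, last_beat in heartbeats.items() if now - last_beat > timeout}

def reschedule_orphans(task_owner: dict[str, str], dead: set[str]) -> list[str]:
    """Tasks owned by dead workers; these go back on the queue."""
    return [task for task, wid in task_owner.items() if wid in dead]
```

The subtlety in the real thing is that "dead" is a guess: a slow worker may still be running its task, which is why requeued tasks need idempotent execution or fencing.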

### Phase 5: REST API & Web UI

- FastAPI REST API (CRUD for dags, runs, tasks)
- Trigger DAG run endpoint
- Pause/unpause endpoint
- Log streaming endpoint
- Health check endpoint
- OpenAPI spec auto-generation
- Basic web UI (DAG list, run detail, task logs)
- DAG graph visualization in UI
- Basic auth + API key auth
- RBAC (viewer, editor, admin)
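
Behind the trigger endpoint, the interesting part is idempotency: POSTing the same logical date twice must not create two runs. Setting the HTTP layer aside, the core can be sketched with an in-memory stand-in for the `dag_run` table (names illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class DagRunStore:
    """In-memory stand-in for dag_run, unique on (dag_id, logical_date)."""
    _runs: dict[tuple[str, str], str] = field(default_factory=dict)

    def trigger(self, dag_id: str, logical_date: str) -> tuple[str, bool]:
        """Return (run_id, created). Re-triggering returns the existing run.

        A real endpoint maps created=False to an HTTP 409 (or a 200 with
        the existing run) instead of inserting a duplicate row.
        """
        key = (dag_id, logical_date)
        if key in self._runs:
            return self._runs[key], False
        run_id = f"manual__{dag_id}__{logical_date}"
        self._runs[key] = run_id
        return run_id, True
```

With a database behind it, the same guarantee comes from a unique constraint on `(dag_id, logical_date)` rather than a dict lookup.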

### Phase 6: Operators, Hooks & Extensibility

- `BaseOperator` with lifecycle hooks
- `PythonOperator`
- `BashOperator`
- `HttpOperator`
- `SqlOperator`
- `BaseHook` with `get_conn()`
- Connection model with encryption
- Connection CLI (add/delete/list)
- XCom push/pull
- Variables (get/set with optional encryption)
- `BaseSensor` with poke and reschedule modes
- Jinja2 templating in operator params
- Task/DAG callbacks (`on_success`, `on_failure`)
- SLA monitoring and alerting
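
The operator hierarchy is the classic template-method pattern: the base class owns the lifecycle, subclasses fill in `execute`. A sketch of the shape (hook names are illustrative, modeled on Airflow's):

```python
from typing import Any, Callable

class BaseOperator:
    """Owns the run lifecycle; subclasses implement execute()."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id

    def pre_execute(self, context: dict[str, Any]) -> None:
        pass  # hook: runs before execute()

    def execute(self, context: dict[str, Any]) -> Any:
        raise NotImplementedError

    def post_execute(self, context: dict[str, Any], result: Any) -> None:
        pass  # hook: runs after execute(), sees the result

    def run(self, context: dict[str, Any]) -> Any:
        self.pre_execute(context)
        result = self.execute(context)
        self.post_execute(context, result)
        return result

class PythonOperator(BaseOperator):
    """Calls an arbitrary Python callable with the task context."""

    def __init__(self, task_id: str, python_callable: Callable[..., Any]) -> None:
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context: dict[str, Any]) -> Any:
        return self.python_callable(**context)
```

Every other operator in the list is the same shape with a different `execute`: spawn a subprocess, make an HTTP call, run SQL through a hook.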

### Phase 7: Advanced (Bonus)

- Dynamic task mapping
- Task groups
- DAG versioning
- Plugin system
- Metrics (StatsD/Prometheus)
- Alerting (email/webhook)
- Kubernetes executor
- Asset-aware scheduling
- Deferrable operators / Triggerer

## Learning Docs

The `docs/` folder is the course material. It documents what each task teaches, not just what it does:

| File | What's in it |
| --- | --- |
| `docs/plan.md` | Full build plan with phase details |
| `docs/cs-learnings.md` | CS concepts learned (algorithms, concurrency, etc.) |
| `docs/python-learnings.md` | Python-specific patterns and syntax |
| `docs/thinking-frameworks.md` | Mental models for design, debugging, testing |
| `docs/project-decisions.md` | Technical decisions and design tradeoffs |
| `docs/phase-NN-*/README.md` | Per-phase curriculum, notes, Airflow comparisons |

## Setup

```sh
uv sync --group dev
```

## Run Tests

```sh
uv run pytest -v -x
```

## Lint & Type Check

```sh
uv run ruff check src/ tests/
uv run mypy src/
```

## License

MIT
