Let's be honest: most data pipeline frameworks treat types as suggestions. Config files are strings. Schemas are "validated" at runtime. Data quality is an afterthought. So let's do it differently.

vim89/flowforge

flowforge

See the quick start and architecture overview.

🎯 The brutal truth about Data Engineering today

Data engineering is broken. And we're all pretending it's fine.

Let's be honest:

  1. Most data pipeline frameworks treat types as suggestions.
  2. Config files are strings.
  3. Schemas are "validated" at runtime.
  4. Data quality is an afterthought.

What's actually wrong:

  • Configuration Hell - YAML/JSON configs everywhere, runtime failures galore
  • Type Chaos - Strings everywhere, no compile-time guarantees
  • Effect Anarchy - Side effects scattered, no resource safety
  • Template Madness - Maven archetypes with 2000+ line Velocity templates
  • Cloud Lock-in - Write once, run nowhere else
  • Quality Afterthought - Manual data quality checks, always too late
  • Schema Evolution Hell - Break everything, rollback manually
  • Audit Nightmare - Scattered logging, incomplete traces
  • Runtime Roulette - Deploy and pray, discover errors in production

Here's what we do differently:

🛑 This won't even compile if your schema doesn't match

// This won't even compile if your schema doesn't match
val pipeline = DataPipelineFactory[IO]
  .source(blob"gs://raw-data/sales/*.parquet")
  .contract(SalesDataContract.strict)                            // Compile-time contract validation
  .transform(_.filter(_.amount >= 999))                          // Type-safe transformations
  .quality(nonNull("invoice_number") and unique("customer_id"))  // Built-in quality checks
  .sink(BigQuerySink("analytics.customers"))
  .build

// Run it with automatic retry, monitoring, and error handling
pipeline.run.unsafeRunSync()

That's it. Production-ready. Type-safe. Effect-safe. Audited.
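
To make the `.quality(nonNull(...) and unique(...))` line concrete, here is a minimal sketch of how such check combinators could be modelled in plain Scala. It is an illustration only: `QualitySketch`, `Check`, `Row`, and these implementations of `nonNull` and `unique` are hypothetical and are not taken from FlowForge's source. It also runs at runtime over an in-memory batch, which is presumably simpler than what the framework does.

```scala
// Hypothetical sketch: names mirror the README's DSL, not FlowForge's actual API.
object QualitySketch {
  type Row = Map[String, Any]

  /** A check inspects a whole batch and returns human-readable violations. */
  final case class Check(run: Seq[Row] => List[String]) {
    // `and` runs both checks and concatenates their violations.
    def and(other: Check): Check = Check(rows => run(rows) ++ other.run(rows))
  }

  /** Fails when the column is absent or null in any row. */
  def nonNull(column: String): Check = Check { rows =>
    val missing = rows.count(r => r.get(column).forall(_ == null))
    if (missing == 0) Nil else List(s"$column: $missing null or missing value(s)")
  }

  /** Fails when any non-null value of the column appears more than once. */
  def unique(column: String): Check = Check { rows =>
    val values = rows.flatMap(_.get(column)).filter(_ != null)
    val dupes  = values.groupBy(identity).collect { case (v, vs) if vs.size > 1 => v }
    if (dupes.isEmpty) Nil else List(s"$column: duplicate value(s) ${dupes.mkString(", ")}")
  }
}
```

An empty result from `check.run(rows)` means the batch passed. Returning a list of violations instead of throwing on the first failure lets combined checks report everything wrong with a batch in one pass.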

📊 Quantified revolution

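| Aspect             | Industry standard  | FlowForge    | Improvement       |
|--------------------|--------------------|--------------|-------------------|
| Setup time         | 2-3 days           | 30 seconds   | Over 99.9% faster |
| Runtime errors     | Constant           | Zero         | 100% eliminated   |
| Configuration bugs | Daily pain         | Impossible   | 100% eliminated   |
| Cloud portability  | Rewrite everything | Zero changes | ∞ better          |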
🔥 Get ready for the revolution!

30-Second Proof: See It Work

Drift Demo - Compile-Time Contract Enforcement

# 1. Edit any contract file and change a field type (e.g., id: Long → id: String)
vim modules/contracts/src/main/scala/Contract.scala

# 2. Try to compile - FAILS immediately with clear error
sbt compile
# Error: implicitNotFound - Contract drift detected!
# Out: String vs Contract: Long
# Fix types or relax policy (Backward/Forward)
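
The `implicitNotFound` error above suggests the drift check is driven by a missing implicit. The sketch below shows that general technique under stated assumptions: `DriftSketch`, `Conforms`, `UserV1`, `UserOut`, and `sink` are hypothetical names, and the evidence here only exists when the two types are identical. Field-by-field comparison and the Backward/Forward policies would need generic derivation, which is not shown.

```scala
import scala.annotation.implicitNotFound

// Hypothetical sketch of compile-time contract evidence; not FlowForge's actual types.
object DriftSketch {
  // The compiler prints this message when no Conforms[Out, Contract] instance is in scope.
  @implicitNotFound("Contract drift detected! Out: ${Out} vs Contract: ${Contract}")
  sealed trait Conforms[Out, Contract]

  object Conforms {
    // "Exact" policy: evidence exists only when both types are the same.
    implicit def exact[A]: Conforms[A, A] = new Conforms[A, A] {}
  }

  final case class UserV1(id: Long, name: String)    // the agreed contract
  final case class UserOut(id: String, name: String) // a producer type that has drifted

  // A sink that can only be called when the evidence can be found.
  def sink[Out, Contract](rows: Seq[Out])(implicit ev: Conforms[Out, Contract]): Int =
    rows.size

  // Compiles: the output type matches the contract.
  val accepted: Int = sink[UserV1, UserV1](Seq(UserV1(1L, "ada")))

  // Does not compile: no Conforms[UserOut, UserV1] exists, so the build stops
  // with the custom "Contract drift detected!" message.
  // val rejected = sink[UserOut, UserV1](Seq(UserOut("1", "ada")))
}
```

Uncommenting the last line turns the mismatch into a build failure rather than a runtime error, which is the behaviour the drift demo describes.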

Constraint Guard - Delta Lake Enforcement

# 1. After fixing types, try inserting invalid data
sbt "examples-spark/runMain com.flowforge.examples.spark.UsersPipeline"

# 2. Delta automatically rejects invalid data:
# ❌ NOT NULL constraint violated
# ❌ CHECK constraint failed: age must be between 13-120
# ✅ Only valid data persisted

Lineage Blink - See Everything Automatically

# 1. Start Marquez (OpenLineage backend)
docker compose -f ops/marquez/docker-compose.yml up -d

# 2. Run any pipeline
sbt "examples-spark/runMain com.flowforge.examples.spark.UsersPipeline"

# 3. Open Marquez UI - lineage lights up instantly
open http://localhost:3000
# → Jobs → Pipeline runs with START/COMPLETE/FAIL events
# → Complete execution timeline and lineage graph
# → Zero configuration required
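
For readers curious what a START/COMPLETE/FAIL event looks like on the wire, here is a rough sketch that builds a minimal OpenLineage run event and posts it using only the JDK HTTP client. The endpoint `http://localhost:5000/api/v1/lineage` is an assumption based on Marquez's default API port and may differ in this repo's compose file. The `producer` URL, the spec version in `schemaURL`, and the `LineageSketch` names are placeholders, not values taken from FlowForge.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.time.Instant
import java.util.UUID

// Hypothetical sketch: hand-built OpenLineage event, not FlowForge's emitter.
object LineageSketch {
  /** Builds a minimal run event; eventType is e.g. "START", "COMPLETE" or "FAIL". */
  def runEvent(eventType: String, runId: UUID, namespace: String, job: String, at: Instant): String =
    s"""{"eventType":"$eventType","eventTime":"$at",""" +
      s""""run":{"runId":"$runId"},""" +
      s""""job":{"namespace":"$namespace","name":"$job"},""" +
      """"inputs":[],"outputs":[],""" +
      """"producer":"https://example.com/flowforge-sketch",""" + // placeholder producer URI
      """"schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"}"""

  /** Posts the event and returns the HTTP status code (assumed default Marquez endpoint). */
  def send(json: String, endpoint: String = "http://localhost:5000/api/v1/lineage"): Int = {
    val request = HttpRequest.newBuilder(URI.create(endpoint))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).statusCode()
  }
}
```

Sending a START and a COMPLETE event that share the same `runId` is what allows a lineage backend to stitch them into a single run on the timeline.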

The Promise:

  1. Change the contract → it won't compile (the build fails fast).
  2. Fix the types → it compiles (type safety enforced).
  3. Run locally in seconds → data quality (DQ) checks and Delta constraints catch regressions.
  4. Open Marquez → lineage lights up automatically.
