Loom

Orchestration framework for AI pipelines with built-in evaluation


⚠️ Alpha Software: Early development stage. Use for evaluation and experimentation.


What is Loom?

Loom is the "dbt for AI(E)TL" - a declarative orchestration framework for AI pipelines.

Traditional ETL becomes AI(E)TL: Extract → Transform → Evaluate → Load

Declarative YAML pipelines with built-in quality gates ensure your AI outputs meet quality thresholds before reaching production.

Core Value: Production-grade AI pipeline orchestration without complexity, vendor lock-in, or hidden evaluation gaps.

Status: Alpha software (v0.1.0-alpha). Functional but early-stage. Best suited for evaluation, experimentation, and development.

Why Loom?

The Problem: Building production AI pipelines requires orchestration AND evaluation. Existing tools do one or the other, not both.

What Loom Provides:

  • Declarative Pipelines: Define AI workflows as version-controlled YAML
  • Built-in Evaluation: Quality gates using Arbiter prevent bad outputs from reaching production
  • Provider-Agnostic: Works with OpenAI, Anthropic, Google, Groq - no vendor lock-in
  • Production-Ready: Circuit breakers, retry logic, timeout enforcement (sketched below)
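
The circuit breaker is the classic resilience pattern: after repeated LLM-call failures it opens and fails fast instead of hammering the provider, then allows a trial call through after a cooldown. A minimal sketch of that pattern follows; the class and parameter names are illustrative, not Loom's internals:

# Minimal circuit breaker sketch for LLM calls. CircuitBreaker,
# failure_threshold, and reset_timeout are illustrative assumptions,
# not Loom's actual internals.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        # Open state: fail fast until the cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: skipping LLM call")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit
        return result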

Use Case Example: A sentiment analysis pipeline needs quality assurance. Loom provides:

  1. Declarative YAML pipeline definition (Extract → Transform → Evaluate → Load)
  2. Automatic evaluation with configurable quality gates
  3. Quarantine pattern for failed records (see the sketch after this list)
  4. Complete audit trail of transformations and evaluations
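
The quarantine pattern means failed records are set aside with their failure reason rather than silently dropped or loaded. A minimal sketch, assuming a JSONL quarantine file and a plain dict record (both illustrative, not Loom's actual storage format):

# Quarantine sketch: failed records are appended to a side file with
# the failure reason so they can be inspected and replayed later.
# The file path and record shape are illustrative assumptions.
import datetime
import json
import os

def quarantine(record: dict, reason: str, path: str = "quarantine/failed.jsonl") -> None:
    entry = {
        "record": record,
        "reason": reason,  # e.g. "semantic score 0.62 below threshold 0.8"
        "quarantined_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")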

Quick Example

# pipelines/customer_sentiment.yaml
name: customer_sentiment
version: 2.1.0

extract:
  source: postgres://customers/reviews

transform:
  prompt: prompts/classify_sentiment.txt
  model: gpt-4o-mini
  batch_size: 50

evaluate:
  evaluators:
    - type: semantic
      threshold: 0.8
    - type: custom_criteria
      criteria: "Accurate, no hallucination"
      threshold: 0.75
  quality_gate: all_pass

load:
  destination: postgres://analytics/sentiment_scores

Run it:

loom run customer_sentiment
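
You can also check a pipeline definition without executing it, assuming loom validate takes a pipeline name the same way loom run does:

loom validate customer_sentiment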

Key Features

  • ✅ Declarative Pipelines: YAML-based pipeline definitions (Extract, Transform, Evaluate, Load)
  • ✅ Built-in Evaluation: Arbiter integration with quality gates (all_pass, majority_pass, any_pass, weighted)
  • ✅ Provider-Agnostic LLMs: OpenAI, Anthropic, Google, Groq support
  • ✅ Multiple Data Formats: CSV, JSON, JSONL, Parquet support
  • ✅ Quality Gates: Four gate types with precise pass/fail semantics (see the sketch after this list)
  • ✅ Circuit Breaker Pattern: Production resilience for LLM calls
  • ✅ Quarantine Pattern: Failed records logged with failure reasons for investigation
  • ✅ CLI Interface: loom run, loom validate commands
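
To make the gate semantics concrete, here is a minimal sketch of how the four gate types could combine evaluator results. The EvaluatorResult shape, the weight field, and the overall weighted threshold are illustrative assumptions, not Loom's internals:

# Illustrative sketch of the four quality-gate semantics. The
# EvaluatorResult shape and gate signatures are assumptions for
# explanation, not Loom's actual internal API.
from dataclasses import dataclass

@dataclass
class EvaluatorResult:
    score: float       # evaluator output in [0, 1]
    threshold: float   # per-evaluator threshold from the YAML
    weight: float = 1.0

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold

def all_pass(results: list[EvaluatorResult]) -> bool:
    # Every evaluator must meet its threshold.
    return all(r.passed for r in results)

def any_pass(results: list[EvaluatorResult]) -> bool:
    # At least one evaluator must meet its threshold.
    return any(r.passed for r in results)

def majority_pass(results: list[EvaluatorResult]) -> bool:
    # Strictly more than half of the evaluators must pass.
    return sum(r.passed for r in results) > len(results) / 2

def weighted(results: list[EvaluatorResult], gate_threshold: float = 0.8) -> bool:
    # The weighted mean score must clear an overall threshold.
    total = sum(r.weight for r in results)
    return sum(r.score * r.weight for r in results) / total >= gate_threshold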

Getting Started

# Clone the repository
git clone https://github.com/evanvolgas/loom.git
cd loom

# Install dependencies
uv venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip install -e ".[dev]"

# Run tests
pytest

Relationship to Arbiter

Loom uses Arbiter as its evaluation engine:

  • Arbiter: Evaluates individual LLM outputs (what)
  • Loom: Orchestrates pipelines with evaluation gates (when/how)

Separate projects, complementary goals.
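
As a rough illustration of that split, here is a hypothetical sketch; the evaluator API (ev.score) and record shape are placeholders, not Arbiter's real interface:

# Hypothetical sketch of the division of labor between Arbiter and Loom.
# ev.score() and the record dict shape are placeholders, not real APIs.
def run_evaluate_stage(records, evaluators, gate):
    passed, failed = [], []
    for record in records:
        # Arbiter's job ("what"): score each individual LLM output.
        results = [ev.score(record["output"]) for ev in evaluators]
        # Loom's job ("when/how"): apply the quality gate and route
        # the record to the load stage or to quarantine.
        (passed if gate(results) else failed).append(record)
    return passed, failed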

Roadmap

Note: This is a personal project. Roadmap items are ideas and explorations, not commitments. Priorities and timelines may change based on what's useful.

Phase 1 - Foundation ✅ (Completed)

  • Core pipeline engine (Extract, Transform, Evaluate, Load)
  • YAML pipeline parser
  • Arbiter integration with quality gates
  • Circuit breaker and resilience patterns
  • Basic CLI (loom run, loom validate)

Future Ideas (No timeline, exploring as needed)

  • Database connectors (PostgreSQL, MySQL)
  • Cost tracking and monitoring
  • Semantic caching for duplicate inputs
  • Smart retry logic with failure-type awareness
  • Testing framework for pipelines
  • More advanced monitoring and alerting

Contributions welcome! This is a personal project, but if you find Loom useful, pull requests are appreciated.

License

MIT License - see LICENSE file for details.

Acknowledgments

Inspired by dbt's declarative approach to data pipelines and built on top of Arbiter for evaluation.
