Orchestration framework for AI pipelines with built-in evaluation
Loom is the "dbt for AI(E)TL" - a declarative orchestration framework for AI pipelines.
Traditional ETL becomes AI(E)TL: Extract → Transform → Evaluate → Load
Declarative YAML pipelines with built-in quality gates ensure your AI outputs meet quality thresholds before reaching production.
Core Value: Production-grade AI pipeline orchestration without complexity, vendor lock-in, or hidden evaluation gaps.
Status: Alpha software (v0.1.0-alpha). Functional but early-stage. Best suited for evaluation, experimentation, and development.
The Problem: Building production AI pipelines requires orchestration AND evaluation. Existing tools do one or the other, not both.
What Loom Provides:
- Declarative Pipelines: Define AI workflows as version-controlled YAML
- Built-in Evaluation: Quality gates using Arbiter prevent bad outputs from reaching production
- Provider-Agnostic: Works with OpenAI, Anthropic, Google, Groq - no vendor lock-in
- Production-Ready: Circuit breakers, retry logic, timeout enforcement
Use Case Example: A sentiment analysis pipeline needs quality assurance. Loom provides:
- Declarative YAML pipeline definition (Extract → Transform → Evaluate → Load)
- Automatic evaluation with configurable quality gates
- Quarantine pattern for failed records
- Complete audit trail of transformations and evaluations
```yaml
# pipelines/customer_sentiment.yaml
name: customer_sentiment
version: 2.1.0

extract:
  source: postgres://customers/reviews

transform:
  prompt: prompts/classify_sentiment.txt
  model: gpt-4o-mini
  batch_size: 50

evaluate:
  evaluators:
    - type: semantic
      threshold: 0.8
    - type: custom_criteria
      criteria: "Accurate, no hallucination"
      threshold: 0.75
  quality_gate: all_pass

load:
  destination: postgres://analytics/sentiment_scores
```

Run it:

```bash
loom run customer_sentiment
```

- ✅ Declarative Pipelines: YAML-based pipeline definitions (Extract, Transform, Evaluate, Load)
- ✅ Built-in Evaluation: Arbiter integration with quality gates (all_pass, majority_pass, any_pass, weighted)
- ✅ Provider-Agnostic LLMs: OpenAI, Anthropic, Google, Groq support
- ✅ Multiple Data Formats: CSV, JSON, JSONL, Parquet support
- ✅ Quality Gates: Four gate types with precise mathematical definitions (see the first sketch after this list)
- ✅ Circuit Breaker Pattern: Production resilience for LLM calls (see the second sketch after this list)
- ✅ Quarantine Pattern: Failed records logged with failure reasons for investigation
- ✅ CLI Interface: `loom run` and `loom validate` commands
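
The four gate types have simple set-based meanings. Below is a minimal sketch, assuming each evaluator produces a score that is compared against its own threshold; the `EvalResult` dataclass, the `gate_passes` function, and the weighted gate's 0.8 global threshold are illustrative assumptions, not Loom's actual API.

```python
# Illustrative only: one possible reading of the four quality gate types.
# Names, signatures, and thresholds are hypothetical, not Loom's API.
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float       # evaluator score in [0, 1]
    threshold: float   # per-evaluator threshold from the pipeline YAML
    weight: float = 1.0

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


def gate_passes(results: list[EvalResult], gate: str) -> bool:
    """Decide whether a record clears the configured quality gate."""
    if gate == "all_pass":       # every evaluator must pass
        return all(r.passed for r in results)
    if gate == "any_pass":       # at least one evaluator must pass
        return any(r.passed for r in results)
    if gate == "majority_pass":  # more than half of the evaluators must pass
        return sum(r.passed for r in results) > len(results) / 2
    if gate == "weighted":       # weighted mean score must clear a global bar
        total = sum(r.weight for r in results)
        mean = sum(r.score * r.weight for r in results) / total
        return mean >= 0.8       # hypothetical global threshold
    raise ValueError(f"unknown quality gate: {gate}")
```

Under this reading, the `all_pass` gate in the example pipeline above loads a record only when both the semantic and custom_criteria evaluators clear their thresholds.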
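
The README does not spell out how the circuit breaker is configured, so the sketch below shows only the generic pattern applied to an LLM call: a count-based failure threshold trips the breaker, and a reset timeout lets a trial call through later. The class, defaults, and names are hypothetical, not Loom's implementation.

```python
# Illustrative only: a generic circuit breaker wrapped around an LLM call.
# Thresholds, timeouts, and names are assumptions, not Loom's configuration.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None if closed

    def call(self, fn, *args, **kwargs):
        # While open, refuse calls until the reset timeout elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: skipping LLM call")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the breaker again
        return result
```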
```bash
# Clone the repository
git clone https://github.com/evanvolgas/loom.git
cd loom

# Install dependencies
uv venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip install -e ".[dev]"

# Run tests
pytest
```

Loom uses Arbiter as its evaluation engine:
- Arbiter: Evaluates individual LLM outputs (what)
- Loom: Orchestrates pipelines with evaluation gates (when/how)
Separate projects, complementary goals.
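
To make the split concrete, here is a conceptual sketch of an evaluate step: evaluator callables play the Arbiter role (scoring individual outputs), while the surrounding orchestration plays the Loom role (deciding when to evaluate and how to route records, including quarantining failures). Every name and signature below is hypothetical; neither project's real API is shown.

```python
# Conceptual sketch of the division of labor between evaluation and
# orchestration. All names are hypothetical, not either project's API.
from typing import Callable

Evaluator = Callable[[str], float]  # scores a single LLM output in [0, 1]


def run_evaluate_step(records: list[dict], evaluators: list[Evaluator],
                      threshold: float) -> tuple[list[dict], list[dict]]:
    """Score each record and split the batch into loadable vs. quarantined."""
    loadable, quarantined = [], []
    for record in records:
        scores = [ev(record["output"]) for ev in evaluators]
        if all(s >= threshold for s in scores):  # simple all_pass gate
            loadable.append(record)
        else:
            # Quarantine pattern: keep the record plus the reason it failed,
            # so it can be inspected later instead of silently dropped.
            quarantined.append({**record, "failure_reason": f"scores={scores}"})
    return loadable, quarantined
```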
Note: This is a personal project. Roadmap items are ideas and explorations, not commitments. Priorities and timelines may change based on what's useful.
Phase 1 - Foundation ✅ (Completed)
- Core pipeline engine (Extract, Transform, Evaluate, Load)
- YAML pipeline parser
- Arbiter integration with quality gates
- Circuit breaker and resilience patterns
- Basic CLI (`loom run`, `loom validate`)
Future Ideas (No timeline, exploring as needed)
- Database connectors (PostgreSQL, MySQL)
- Cost tracking and monitoring
- Semantic caching for duplicate inputs
- Smart retry logic with failure-type awareness
- Testing framework for pipelines
- More advanced monitoring and alerting
Contributions welcome! This is a personal project, but if you find it useful and want to contribute, pull requests are appreciated.
MIT License - see LICENSE file for details.
Inspired by dbt's declarative approach to data pipelines and built on top of Arbiter for evaluation.