This document provides a comprehensive catalog of ALL DSPy components that are missing or incomplete in DSPEx. It serves as the definitive reference for understanding the scope of work needed to achieve feature parity with the Python DSPy library.
- ✅ Program Selection Algorithm: Implemented with performance-based scoring
- ✅ Program Pool Management: Complete with `top_k_plus_baseline()` logic
- ✅ Score Calculation: Robust `calc_average_score()` with validation
- ⚠️ Main Loop Integration: Core algorithm complete, final testing in progress

Impact: SIMBA optimizer now functional with advanced validation
Priority: LOW - Final integration testing and optimization

- ✅ BootstrapFewShot (Complete)
- ⚠️ BEACON (Infrastructure only, missing Bayesian optimization)
- ✅ SIMBA (Complete with Elixact validation - final testing)
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/mipro_optimizer_v2.py
Features:
- Multi-step instruction optimization
- Automatic prompt proposal and refinement
- Bayesian optimization for hyperparameters
- Support for multiple signature optimization
- Meta-learning components
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/copro_optimizer.py
Features:
- Curriculum learning approach
- Progressive difficulty increase
- Adaptive example selection
- Multi-stage optimization
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/ensemble.py
Features:
- Multiple model combination
- Voting mechanisms
- Confidence-weighted averaging
- Diversity-based selection
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/bootstrap_finetune.py
Features:
- Model fine-tuning on generated data
- Bootstrap data generation
- Training pipeline integration
- Model adapter support
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/random_search.py
Features:
- Random search optimization
- Hyperparameter exploration
- Bootstrap with random sampling
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/teleprompt_optuna.py
Features:
- Optuna-based hyperparameter optimization
- Advanced search strategies
- Multi-objective optimization
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/bettertogether.py
Features:
- Multi-agent collaboration
- Program composition optimization
- Joint training strategies
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/avatar_optimizer.py
Features:
- Avatar-based optimization
- Persona-driven prompting
- Character consistency
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/infer_rules.py
Features:
- Automatic rule inference
- Pattern extraction from examples
- Rule-based program enhancement
Status: ❌ Missing entirely
DSPy File: dspy/teleprompt/vanilla.py
Features:
- Simple labeled few-shot learning
- No optimization, just demonstration
- Baseline comparison method
- ✅ Predict (Complete)
- ✅ PredictStructured (Enhanced with Elixact validation)
- ✅ ChainOfThought (Complete with Elixact step validation)
- ✅ ReAct (Complete with Elixact action validation)
Status: ❌ Missing entirely
DSPy File: dspy/predict/chain_of_thought_with_hint.py
Features:
- CoT with external hints
- Guided reasoning
- Hint integration
Status: ❌ Missing entirely
DSPy File: dspy/predict/program_of_thought.py
Features:
- Code generation capabilities
- Code execution environment
- Result integration
- Mathematical reasoning
Status: ❌ Missing entirely
DSPy File: dspy/predict/multi_chain_comparison.py
Features:
- Multiple reasoning chains
- Comparison mechanisms
- Best chain selection
- Confidence scoring
Status: ❌ Missing entirely
DSPy File: dspy/predict/best_of_n.py
Features:
- Multiple generation sampling
- Best response selection (see the Elixir sketch below)
- Quality-based filtering
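As a rough illustration of the best-of-N idea above - a minimal Elixir sketch with hypothetical names, not an existing DSPy or DSPEx API:

```elixir
defmodule BestOfNSketch do
  # Generate n candidate responses and keep the highest-scoring one.
  def best_of(n, generate_fun, score_fun) when n > 0 do
    1..n
    |> Enum.map(fn _ -> generate_fun.() end)
    |> Enum.max_by(score_fun)
  end
end

# Toy usage: prefer the longest of three random strings.
BestOfNSketch.best_of(3, fn -> Enum.random(["a", "bb", "ccc"]) end, &String.length/1)
```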
Status: ❌ Missing entirely
DSPy File: dspy/predict/refine.py
Features:
- Iterative refinement
- Draft-and-revise pattern
- Quality improvement loops
Status: ❌ Missing entirely
DSPy File: dspy/predict/retry.py
Features:
- Automatic retry on failure
- Backoff strategies
- Error-specific retry logic
- Max attempt limits (a minimal Elixir sketch of this pattern follows below)
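The retry-with-backoff behavior listed above can be expressed compactly in Elixir; this is a minimal sketch with hypothetical names, not part of DSPEx or DSPy:

```elixir
defmodule RetrySketch do
  # Retry `fun` up to `max_attempts` times with exponential backoff between attempts.
  def run(fun, max_attempts \\ 3, base_delay_ms \\ 200) do
    attempt(fun, 1, max_attempts, base_delay_ms)
  end

  defp attempt(fun, n, max, base) do
    case fun.() do
      {:ok, result} ->
        {:ok, result}

      {:error, _reason} when n < max ->
        # Sleep base, 2*base, 4*base, ... before the next attempt
        Process.sleep(base * Integer.pow(2, n - 1))
        attempt(fun, n + 1, max, base)

      {:error, reason} ->
        {:error, {:max_attempts_reached, reason}}
    end
  end
end
```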
Status: ❌ Missing entirely
DSPy File: dspy/predict/parallel.py
Features:
- Standardized parallel execution
- Result aggregation patterns
- Error handling across tasks
Status: ❌ Missing entirely
DSPy File: dspy/predict/aggregation.py
Features:
- Response aggregation strategies
- Majority voting (see the Elixir sketch below)
- Weighted combinations
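As a concrete illustration of majority voting over candidate responses - a minimal Elixir sketch, unrelated to any existing DSPy or DSPEx API:

```elixir
defmodule MajorityVoteSketch do
  # Return the most frequent answer among candidates (ties resolved arbitrarily).
  def pick(answers) when is_list(answers) and answers != [] do
    answers
    |> Enum.frequencies()
    |> Enum.max_by(fn {_answer, count} -> count end)
    |> elem(0)
  end
end

MajorityVoteSketch.pick(["Paris", "Paris", "Lyon"])
#=> "Paris"
```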
Status: ❌ Missing entirely
DSPy File: dspy/predict/code_act.py
Features:
- Code-based action execution
- Programming environment integration
- Interactive code generation
Status: ❌ Missing entirely
DSPy File: dspy/predict/knn.py
Features:
- K-nearest neighbor retrieval
- Example-based prediction
- Similarity-based reasoning
Status: ❌ Missing entirely
DSPy File: dspy/predict/parameter.py
Features:
- Parameter management
- Learnable parameters
- Optimization state tracking
Status: ❌ COMPLETELY MISSING - This is the largest functional gap
- ❌ DSPEx.Retrieve - Base retrieval behavior
- ❌ DSPEx.Retrieve.Embeddings - Basic embeddings retrieval
- ❌ ChromaDB (`dspy/retrieve/chromadb_rm.py`)
- ❌ Pinecone (`dspy/retrieve/pinecone_rm.py`)
- ❌ Weaviate (`dspy/retrieve/weaviate_rm.py`)
- ❌ Qdrant (`dspy/retrieve/qdrant_rm.py`)
- ❌ Milvus (`dspy/retrieve/milvus_rm.py`)
- ❌ FAISS (`dspy/retrieve/faiss_rm.py`)
- ❌ LanceDB (`dspy/retrieve/lancedb_rm.py`)
- ❌ Deeplake (`dspy/retrieve/deeplake_rm.py`)
- ❌ Epsilla (`dspy/retrieve/epsilla_rm.py`)
- ❌ MyScale (`dspy/retrieve/my_scale_rm.py`)
- ❌ MongoDB Atlas (`dspy/retrieve/mongodb_atlas_rm.py`)
- ❌ PGVector (`dspy/retrieve/pgvector_rm.py`)
- ❌ Neo4j (`dspy/retrieve/neo4j_rm.py`)
- ❌ FalkorDB (`dspy/retrieve/falkordb_rm.py`)

- ❌ ColBERTv2 (`dspy/dsp/colbertv2.py`) - Dense retrieval with late interaction
- ❌ Azure AI Search (`dspy/retrieve/azureaisearch_rm.py`)
- ❌ Databricks (`dspy/retrieve/databricks_rm.py`)
- ❌ Clarifai (`dspy/retrieve/clarifai_rm.py`)
- ❌ Marqo (`dspy/retrieve/marqo_rm.py`)
- ❌ Snowflake (`dspy/retrieve/snowflake_rm.py`)
- ❌ Vectara (`dspy/retrieve/vectara_rm.py`)
- ❌ Watson Discovery (`dspy/retrieve/watson_discovery_rm.py`)
- ❌ You.com (`dspy/retrieve/you_rm.py`)

- ❌ LlamaIndex Integration (`dspy/retrieve/llama_index_rm.py`)
- ❌ RAGatouille (`dspy/retrieve/ragatouille_rm.py`)
- ❌ Hybrid Search capabilities
- ❌ Reranking mechanisms
⚠️ Basic evaluation framework (limited)
DSPy File: dspy/evaluate/metrics.py
- ❌ Answer Exact Match improvements
- ❌ Answer Passage Match
- ❌ Semantic F1 Score
- ❌ BLEU Score
- ❌ ROUGE Score
- ❌ BERTScore
DSPy File: dspy/evaluate/evaluate.py
- ❌ Multi-threaded evaluation
- ❌ Progress display
- ❌ Result tables
- ❌ Statistical analysis
- ❌ Error breakdown
DSPy File: dspy/evaluate/auto_evaluation.py
- ❌ LLM-based evaluation
- ❌ Reference-free metrics
- ❌ Quality assessment
- ❌ CompleteAndGrounded
- ❌ Faithfulness metrics
- ❌ Relevance scoring
- ❌ Hallucination detection
Status: ❌ COMPLETELY MISSING - Core DSPy feature
DSPy File: dspy/primitives/assertions.py
- ❌ dspy.Assert() - Hard constraints with retry
- ❌ dspy.Suggest() - Soft hints for improvement
- ❌ Context management - Assertion integration
- ❌ Backtracking - Retry with constraints (see the Elixir sketch below)
- ❌ Constraint satisfaction - Runtime validation
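To make the assert-and-backtrack idea concrete, here is a minimal Elixir sketch; none of these names exist in DSPEx today, and the feedback mechanism is only illustrative:

```elixir
defmodule AssertionSketch do
  # Generate an output, check a hard constraint, and "backtrack" by retrying
  # with the violation passed back as feedback, up to max_retries times.
  def run_with_assert(generate_fun, constraint_fun, max_retries \\ 2) do
    attempt(generate_fun, constraint_fun, nil, max_retries)
  end

  defp attempt(generate_fun, constraint_fun, feedback, retries_left) do
    output = generate_fun.(feedback)

    case constraint_fun.(output) do
      :ok ->
        {:ok, output}

      {:error, reason} when retries_left > 0 ->
        attempt(generate_fun, constraint_fun, reason, retries_left - 1)

      {:error, reason} ->
        {:error, {:assertion_failed, reason}}
    end
  end
end
```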
- ✅ ChatAdapter (Basic)
- ✅ JSONAdapter (Basic)
- ❌ TwoStepAdapter (`dspy/adapters/two_step_adapter.py`)
- ❌ Adapter utilities (`dspy/adapters/utils.py`)

- ❌ Image types (`dspy/adapters/types/image.py`)
- ❌ Audio types (`dspy/adapters/types/audio.py`)
- ❌ Tool types (`dspy/adapters/types/tool.py`)
- ❌ History types (`dspy/adapters/types/history.py`)
- ✅ Example (Complete)
- ✅ Module (Complete)
- ✅ Program (Complete)
- ✅ Prediction (Complete)
DSPy File: dspy/primitives/python_interpreter.py
- ❌ Safe code execution
- ❌ Sandbox environment
- ❌ Result handling
- ❌ Module composition patterns
- ❌ Parameter management
- ❌ State serialization
- ❌ Module introspection
- ✅ Basic Client (Limited providers)
- ✅ ClientManager (Good)
- ✅ OpenAI integration (Basic)
DSPy integrates with 100+ models via LiteLLM
- ❌ Anthropic Claude
- ⚠️ Google Gemini (partial)
- ❌ Cohere
- ❌ Hugging Face
- ❌ Azure OpenAI
- ❌ AWS Bedrock
- ❌ Google Vertex AI
- ❌ Local model support
- ❌ Ollama integration
- ❌ vLLM integration
- ❌ Together AI
- ❌ Anyscale
- ❌ Groq
- ❌ Fireworks AI
- ❌ And 50+ more providers

- ❌ OpenAI embeddings
- ❌ Cohere embeddings
- ❌ Hugging Face embeddings
- ❌ Local embedding models
- ❌ Embedding caching

- ⚠️ Rate limiting (has stub)
- ⚠️ Circuit breakers (noted as bypassed)
- ❌ Advanced caching
- ❌ Request retries
- ❌ Model fallbacks
- ✅ Basic caching
- ✅ Basic logging
- ❌ Streaming support (`dspy/streaming/`)
- ❌ Asyncify utilities (`dspy/utils/asyncify.py`)
- ❌ Usage tracking (`dspy/utils/usage_tracker.py`)
- ❌ Saving/loading (`dspy/utils/saving.py`)
- ❌ History inspection (`dspy/utils/inspect_history.py`)
- ❌ Parallelizer (`dspy/utils/parallelizer.py`)
- ❌ Unbatchify (`dspy/utils/unbatchify.py`)
- ❌ Exception handling (`dspy/utils/exceptions.py`)

- ❌ LangChain tools (`dspy/utils/langchain_tool.py`)
- ❌ MCP (Model Context Protocol) (`dspy/utils/mcp.py`)
- ❌ Python interpreter (`dspy/primitives/python_interpreter.py`)
- ❌ Advanced telemetry
- ❌ Distributed tracing
- ❌ Performance monitoring
- ❌ Error analytics
Status: ❌ No dataset utilities
DSPy File: dspy/datasets/
- ❌ GSM8K (`dspy/datasets/gsm8k.py`)
- ❌ HotpotQA (`dspy/datasets/hotpotqa.py`)
- ❌ Math datasets (`dspy/datasets/math.py`)
- ❌ Colors dataset (`dspy/datasets/colors.py`)

- ❌ DataLoader (`dspy/datasets/dataloader.py`)
- ❌ Dataset utilities (`dspy/datasets/dataset.py`)
- ❌ Example loading patterns
- ❌ ALFWorld (`dspy/datasets/alfworld/`)
- ❌ Custom dataset loaders
Status: ❌ No experimental features
DSPy File: dspy/experimental/
- ❌ Module graph analysis (`module_graph.py`)
- ❌ Synthetic data generation (`synthetic_data.py`)
- ❌ Synthesizer framework (`synthesizer/`)
Category | Total Components | Implemented | Missing | Completion % |
---|---|---|---|---|
Teleprompters | 10 | 1 (partial) | 9 | 10% |
Predict Modules | 15 | 2 | 13 | 13% |
Retrieval System | 25 | 0 | 25 | 0% |
Evaluation | 10 | 1 (partial) | 9 | 10% |
Assertions | 5 | 0 | 5 | 0% |
Adapters/Types | 8 | 2 | 6 | 25% |
Primitives | 8 | 4 | 4 | 50% |
Client/Models | 20 | 3 | 17 | 15% |
Tools/Utilities | 15 | 2 | 13 | 13% |
Datasets | 10 | 0 | 10 | 0% |
Experimental | 5 | 0 | 5 | 0% |
TOTAL | 131 | 15 | 116 | 11% |
- ✅ Program selection algorithm
- ✅ Program pool management
- ✅ Score calculation logic
- ⚠️ Main loop integration

- ❌ ChainOfThought module
- ❌ Basic retrieval system
- ❌ Assertions framework
- ❌ ReAct module

- ❌ Vector database integrations (ChromaDB, Pinecone)
- ❌ Advanced evaluation metrics
- ❌ Additional teleprompters (MIPROv2)
- ❌ More predict modules
- ❌ Advanced caching and observability
- ❌ Streaming support
- ❌ Additional model providers
- ❌ Dataset utilities
- SIMBA Works: Fix blocking algorithmic issues
- RAG Capability: End-to-end retrieval-augmented generation
- Advanced Reasoning: ChainOfThought, ReAct, MultiChain
- Production Ready: Robust error handling, monitoring, caching
- Ecosystem Parity: 80%+ component coverage compared to DSPy
Current Status: 11% component parity
Target Status: 80%+ component parity
This master list shows DSPEx has a solid foundation (excellent infrastructure) but needs substantial work to match DSPy's comprehensive ecosystem. The most critical path is fixing SIMBA's algorithmic issues, then building core reasoning modules and the retrieval system.
A BEAM-Native AI Program Optimization Framework
DSPEx is a sophisticated Elixir port of DSPy (Declarative Self-improving Python), reimagined for the BEAM virtual machine. Rather than being a mere transliteration, DSPEx leverages Elixir's unique strengths in concurrency, fault tolerance, and distributed systems to create a more robust and scalable framework for programming language models.
DSPEx provides three distinct test modes to accommodate different development and integration scenarios:
mix test # Default behavior
mix test.mock # Explicit pure mock
mix test.mock test/unit/ # Run specific test directory
Behavior:
- No network requests made
- Fast, deterministic execution
- Uses contextual mock responses
- Perfect for unit testing and CI/CD
When to use: Daily development, unit tests, CI pipelines
mix test.fallback # All tests with fallback
mix test.fallback test/integration/ # Integration tests with fallback
DSPEX_TEST_MODE=fallback mix test # Environment variable approach
Behavior:
- Attempts real API calls when API keys available
- Seamlessly falls back to mock when no keys present
- Tests work regardless of API key availability
- Validates both integration and mock logic
When to use: Development with optional API access, integration testing
mix test.live # Requires API keys for all providers
mix test.live test/integration/ # Live integration testing only
DSPEX_TEST_MODE=live mix test # Environment variable approach
Behavior:
- Requires valid API keys
- Tests fail if API keys missing
- Real network requests to live APIs
- Validates actual API integration and error handling
When to use: Pre-deployment validation, debugging API issues, performance testing
Why `MIX_ENV=test`?
The test environment ensures proper isolation and test-specific configurations. Our mix tasks automatically set `MIX_ENV=test` via `preferred_cli_env` in `mix.exs`, so you don't need to set it manually.
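As an illustration, the task wiring looks roughly like the following - a minimal sketch rather than the project's actual `mix.exs`, with a hypothetical `run_tests/2` helper:

```elixir
def project do
  [
    app: :dspex,
    version: "0.1.0",
    aliases: aliases(),
    # Ensure the custom test tasks always run under MIX_ENV=test
    preferred_cli_env: ["test.mock": :test, "test.fallback": :test, "test.live": :test]
  ]
end

defp aliases do
  [
    "test.mock": &run_tests(&1, "mock"),
    "test.fallback": &run_tests(&1, "fallback"),
    "test.live": &run_tests(&1, "live")
  ]
end

defp run_tests(args, mode) do
  # Select the DSPEx test mode, then delegate to the built-in test task
  System.put_env("DSPEX_TEST_MODE", mode)
  Mix.Task.run("test", args)
end
```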
API Key Setup (Optional for fallback/live modes):
export GEMINI_API_KEY=your_gemini_key
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
Override Test Mode:
export DSPEX_TEST_MODE=mock # Force pure mock
export DSPEX_TEST_MODE=fallback # Force fallback mode
export DSPEX_TEST_MODE=live # Force live mode
Best Practices:
- Use pure mock for daily development and CI/CD
- Use fallback mode for integration development
- Use live mode before production deployments and for debugging real API issues
- Keep API keys in `.env` files or secure environment management
For detailed testing strategy and migration guidelines, see LIVE_DIVERGENCE.md, which covers the strategic approach to live API integration and test architecture patterns.
DSPEx uses a test mode system that defaults to pure mock mode for development safety. To use live APIs, you must explicitly enable live API mode using the `DSPEX_TEST_MODE` environment variable.
Test Mode Configuration:
# Default: Pure mock mode (no API calls)
mix run my_script.exs # Uses mocks only
# Enable live API with fallback to mocks
DSPEX_TEST_MODE=fallback mix run my_script.exs # Tries live API, falls back to mocks
# Require live API (fail if no keys)
DSPEX_TEST_MODE=live mix run my_script.exs # Live API only, fails without keys
Supported Providers:
export GEMINI_API_KEY=your_gemini_key # Google Gemini (recommended)
export OPENAI_API_KEY=your_openai_key # OpenAI GPT models
export ANTHROPIC_API_KEY=your_anthropic_key # Anthropic Claude (future)
API Key Detection Logic:
- In mock mode: Always uses mock responses (default)
- In fallback mode: Uses live API if keys available, otherwise falls back to mocks
- In live mode: Requires API keys, fails if missing (a minimal sketch of this selection logic follows below)
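The decision flow above can be sketched in a few lines; the module and function names here are hypothetical and shown only for illustration:

```elixir
defmodule ModeSelection do
  # DSPEX_TEST_MODE gates whether real API keys are even consulted.
  def effective_mode do
    case System.get_env("DSPEX_TEST_MODE", "mock") do
      "fallback" -> if api_key_present?(), do: :live, else: :mock
      "live" -> if api_key_present?(), do: :live, else: raise("live mode requires an API key")
      _mock_or_unknown -> :mock
    end
  end

  defp api_key_present? do
    ["GEMINI_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
    |> Enum.any?(fn var -> System.get_env(var) not in [nil, ""] end)
  end
end
```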
Basic Usage with Live API:
# Set your API key and enable live API mode
export GEMINI_API_KEY=your_actual_gemini_key
export DSPEX_TEST_MODE=fallback
# Create and run a program - now uses live API
program = DSPEx.Predict.new(MySignature, :gemini)
{:ok, result} = DSPEx.Program.forward(program, %{question: "What is Elixir?"})
# Returns real AI-generated response from Gemini
BEACON (Bayesian Exploration and Adaptive Compilation Of Narratives) Optimization with Live API:
# Set live API mode first
export GEMINI_API_KEY=your_key
export DSPEX_TEST_MODE=fallback
# Both student and teacher use live APIs
student = DSPEx.Predict.new(QASignature, :gemini)
teacher = DSPEx.Predict.new(QASignature, :gemini)
# BEACON optimization makes 100+ real API calls
{:ok, optimized} = DSPEx.Teleprompter.BEACON.compile(
student, teacher, training_examples, metric_fn
)
DSPEx includes a comprehensive demo application showcasing BEACON with live APIs:
# Navigate to the demo
cd examples/dspex_demo
# Install dependencies
mix deps.get
# Enable live API mode with your Gemini API key
export GEMINI_API_KEY=your_key
export DSPEX_TEST_MODE=fallback
# Run demos with live API
./demo qa # Question answering with BEACON
./demo sentiment # Sentiment analysis optimization
./demo cot # Chain-of-thought reasoning
./demo --interactive # Interactive Q&A session
# Run all demos with live API
./demo # Complete BEACON showcase
What the Demo Shows with Live API:
- Real API Request Logs: `[LIVE API REQUEST] gemini | predict-...` showing actual calls
- Authentic Responses: Real AI responses, not mock data
- BEACON Optimization: Dozens of concurrent API calls during optimization
- Performance: Real-world latency and response characteristics
Mode Requirements:
- Default behavior: DSPEx uses mock mode by default for safety
- Must set DSPEX_TEST_MODE: `fallback` or `live` to enable live APIs
- Cost awareness: Live mode makes many real API calls that cost money
Cost Considerations:
- Live API calls incur costs from your provider account
- BEACON optimization typically makes 50-200+ API calls during optimization
- Monitor your usage through your provider's dashboard
- Consider using `fallback` mode during development to limit costs
Test Mode Details:
- mock: Pure mock, no network (default, safe for development)
- fallback: Live API preferred, graceful mock fallback (recommended for testing)
- live: Live API required, fails without keys (for production validation)
Development Workflow:
# Daily development - fast and free
mix run my_app.exs # Mock mode
# Testing with real APIs - costs money but validates integration
DSPEX_TEST_MODE=fallback mix run my_app.exs # Live API with fallback
# Production validation - strict live API testing
DSPEX_TEST_MODE=live mix run my_app.exs # Live API only
API Key Security:
- Never commit API keys to version control
- Use `.env` files or secure environment management
- Rotate keys regularly and monitor usage for anomalies
DSPEx's test architecture has been optimized for maximum developer productivity:
Performance Results:
- Full test suite: < 7 seconds in mock mode
- 400x performance improvement: Tests now run consistently fast regardless of network conditions
- Zero flakiness: Deterministic mock responses ensure reliable CI/CD
Fault Tolerance Testing:
- Process supervision: Tests validate GenServer lifecycle and crash recovery
- Network resilience: Proper handling of dead processes and API failures
- Environment isolation: Prevention of test contamination between runs
Test Architecture Features:
- Three-mode system: Mock, Fallback, and Live modes for different scenarios
- Intelligent fallback: Live API attempts with seamless mock fallback
- Performance isolation: Timing tests use controlled mock conditions
- Process management: Proper GenServer lifecycle handling in supervision tests
DSPEx is not a general-purpose agent-building toolkit; it is a specialized compiler that uses data and metrics to systematically optimize Language Model (LLM) programs. While interacting with LLMs is becoming easier, achieving consistently high performance remains a manual, unscientific process of "prompt tweaking." DSPEx automates the discovery of optimal prompting strategies, treating prompts as optimizable artifacts rather than static strings.
The primary bottleneck in prompt optimization is evaluating programs against large validation sets. DSPEx leverages `Task.async_stream` to achieve I/O-bound concurrency that fundamentally outperforms thread-based solutions:
# Evaluate a program on 10,000 examples with true parallelism
scores = DSPEx.Evaluate.run(my_program, dev_set, &MyMetric.calculate/2,
max_concurrency: 1000)
Performance Advantage: Where Python DSPy is limited by thread overhead, DSPEx can spawn hundreds of thousands of lightweight BEAM processes, each handling an LLM API call independently.
Optimization jobs are long-running and vulnerable to transient network errors. DSPEx builds on OTP principles where a single failed evaluation crashes its own isolated process without halting the entire optimization job:
# If one API call fails, it doesn't crash the entire evaluation
# The supervisor handles retry strategies automatically
evaluation_results = DSPEx.Evaluate.run(program, large_dataset, metric,
restart: :temporary,
max_restarts: 3)
Every step of execution and optimization is instrumented using `:telemetry`, providing deep insights into performance, cost, and behavior patterns in production.
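A handler for these events can be as small as the following sketch; the event name matches the attach example later in this README, while the measurement and metadata keys shown here are assumptions for illustration:

```elixir
defmodule MyApp.Metrics do
  require Logger

  # Invoked for [:dspex, :program, :forward, :stop] events (see the attach
  # example in the observability section below). Keys are illustrative assumptions.
  def handle_event([:dspex, :program, :forward, :stop], measurements, metadata, _config) do
    duration_ms =
      System.convert_time_unit(measurements[:duration] || 0, :native, :millisecond)

    Logger.info(
      "DSPEx forward finished in #{duration_ms}ms " <>
        "(correlation: #{inspect(metadata[:correlation_id])})"
    )
  end
end
```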
DSPEx follows a layered dependency graph optimized for the BEAM:
DSPEx.Signature (Foundation - Compile-time contracts)
    ↓
DSPEx.Adapter (Translation Layer - Runtime formatting)
    ↓
DSPEx.Client (HTTP/LLM Interface - Resilient GenServer)
    ↓
DSPEx.Program/Predict (Execution Engine - Process orchestration)
    ↓
DSPEx.Evaluate & Teleprompter (Optimization Layer - Concurrent optimization)
Unlike Python's runtime signature validation, DSPEx uses Elixir macros for compile-time safety:
defmodule QASignature do
@moduledoc "Answer questions with detailed reasoning and confidence"
use DSPEx.Signature, "question, context -> answer, reasoning, confidence"
end
# Generates at compile time:
# - Input/output field validation
# - Struct definition with @type specs
# - Behaviour implementation for adapters
# - Introspection functions for optimization
BEAM Advantage: Compile-time expansion catches signature errors before deployment, while Python DSPy validates at runtime.
The HTTP client is implemented as a supervised GenServer with production-grade resilience:
defmodule DSPEx.Client do
use GenServer
# Features:
# - Circuit breaker pattern (planned)
# - Automatic caching (planned)
# - Rate limiting and exponential backoff (planned)
# - Connection pooling via Finch
# - Distributed state management (planned)
def request(prompt, opts \\ []) do
# Planned Phase 2B shape: route requests through the supervised GenServer process
# (the current implementation uses a plain functional HTTP call instead)
GenServer.call(__MODULE__, {:request, prompt, opts})
end
end
Current Status: HTTP client with error categorization and multi-provider support. GenServer architecture with supervision planned for Phase 2B.
Adapters handle the translation between high-level signatures and provider-specific formats:
defmodule DSPEx.Adapter.Chat do
@behaviour DSPEx.Adapter
@impl true
def format(signature, inputs, demos) do
# Convert signature + demos into OpenAI chat format
messages =
[%{role: "system", content: signature.instructions}] ++
# Format few-shot demonstrations
Enum.flat_map(demos, &format_demo/1) ++
# Format current input
[%{role: "user", content: format_input(signature, inputs)}]
{:ok, messages}
end
@impl true
def parse(signature, response) do
# Extract structured outputs from response
# Handle field validation and type coercion
end
end
Programs implement a behavior that enables composition and optimization:
defmodule DSPEx.Predict do
@behaviour DSPEx.Program
defstruct [:signature, :client, :adapter, demos: []]
@impl true
def forward(%__MODULE__{} = program, inputs, opts) do
with {:ok, messages} <- program.adapter.format(program.signature, inputs, program.demos),
{:ok, response} <- program.client.request(messages, opts),
{:ok, outputs} <- program.adapter.parse(program.signature, response) do
{:ok, %DSPEx.Prediction{inputs: inputs, outputs: outputs}}
end
end
end
Process Isolation: Each `forward/3` call can run in its own process, providing natural parallelism and fault isolation.
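For example, several independent predictions can each run in their own lightweight process; this is a minimal sketch that assumes `program` was built as in the Quick Start example later in this README:

```elixir
inputs_list = [%{question: "What is OTP?"}, %{question: "What is the BEAM?"}]

results =
  inputs_list
  |> Task.async_stream(
    fn inputs -> DSPEx.Program.forward(program, inputs) end,
    max_concurrency: 50,
    timeout: 30_000
  )
  # Each successful element has the shape {:ok, {:ok, prediction}}
  |> Enum.to_list()
```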
The evaluation engine leverages BEAM's process model for massive parallelism:
defmodule DSPEx.Evaluate do
def run(program, examples, metric_fn, opts \\ []) do
max_concurrency = Keyword.get(opts, :max_concurrency, 100)
examples
|> Task.async_stream(
fn example ->
with {:ok, prediction} <- DSPEx.Program.forward(program, example.inputs),
score when is_number(score) <- metric_fn.(example, prediction) do
{:ok, score}
end
end,
max_concurrency: max_concurrency,
timeout: :infinity
)
|> Enum.reduce({0, 0}, fn
{:ok, {:ok, score}}, {sum, count} -> {sum + score, count + 1}
_, acc -> acc
end)
|> then(fn {sum, count} -> sum / count end)
end
end
Concurrency Advantage: While Python DSPy uses thread pools limited by GIL and OS constraints, DSPEx can easily handle 10,000+ concurrent evaluations on a single machine.
Teleprompters (optimizers) implement sophisticated few-shot learning and program optimization:
defmodule DSPEx.Teleprompter.BootstrapFewShot do
@behaviour DSPEx.Teleprompter
@impl true
def compile(student, teacher, trainset, metric_fn, opts \\ []) do
# Bootstrap examples by running teacher on trainset
bootstrapped_demos =
trainset
|> Task.async_stream(fn example ->
with {:ok, prediction} <- DSPEx.Program.forward(teacher, example.inputs),
score when score > 0.7 <- metric_fn.(example, prediction) do
{:ok, %DSPEx.Example{inputs: example.inputs, outputs: prediction.outputs}}
else
_ -> {:skip}
end
end, max_concurrency: 50)
|> Stream.filter(fn {:ok, result} -> result != {:skip} end)
|> Stream.map(fn {:ok, {:ok, demo}} -> demo end)
|> Enum.take(Keyword.get(opts, :max_demos, 16))
# Create optimized student with bootstrapped demos
optimized_student = DSPEx.OptimizedProgram.new(student, bootstrapped_demos, %{
teleprompter: :bootstrap_fewshot,
optimization_time: DateTime.utc_now()
})
{:ok, optimized_student}
end
end
DSPEx leverages best-in-class Elixir libraries:
Component | Library | Status | Rationale |
---|---|---|---|
HTTP Client | Req + Finch | ✅ Complete | Modern, composable HTTP with connection pooling |
Schema Validation | Elixact | ✅ Complete | World-class validation with intelligent LLM output repair |
Circuit Breaker | Fuse | 🔄 Planned | Battle-tested circuit breaker implementation |
Caching | Cachex | 🔄 Planned | High-performance in-memory caching with TTL |
JSON | Jason | ✅ Complete | Fast JSON encoding/decoding |
Testing | Mox + PropCheck | ✅ Complete | Mocking and property-based testing |
Observability | :telemetry | ✅ Complete | Built-in instrumentation and metrics |
Phase 1 - Foundation (COMPLETE):
- ✅ DSPEx.Signature - Complete compile-time parsing with macro expansion and field validation
- ✅ DSPEx.Example - Immutable data structures with Protocol implementations
- ✅ DSPEx.Client - HTTP client with error categorization and multi-provider support
- ✅ DSPEx.Adapter - Message formatting and response parsing for multiple providers
- ✅ DSPEx.Program - Behavior interface with telemetry integration
- ✅ DSPEx.Predict - Core prediction orchestration with Foundation integration
- ✅ DSPEx.Evaluate - Concurrent evaluation engine using Task.async_stream
Phase 2A - Core Optimization (COMPLETE):
- ✅ DSPEx.Teleprompter - Behavior definition for optimization algorithms
- ✅ DSPEx.Teleprompter.BootstrapFewShot - Complete single-node optimization implementation
- ✅ DSPEx.Teleprompter.SIMBA - Advanced optimization with Elixact validation
- ✅ DSPEx.OptimizedProgram - Container for programs enhanced with demonstrations
Phase 2B - Elixact Integration (COMPLETE):
- ✅ DSPEx.Signature.TypedSignature - Enhanced signatures with Elixact validation
- ✅ DSPEx.Predict.ChainOfThought - Step-by-step reasoning with validation
- ✅ DSPEx.Predict.ReAct - Reasoning + Acting with action validation
- ✅ Intelligent Output Repair - Automatic LLM output correction and validation
Current Working Features:
- ✅ End-to-end pipeline: Create programs, execute predictions, evaluate performance
- ✅ Program optimization: BootstrapFewShot and SIMBA teleprompters with validation
- ✅ Advanced reasoning: ChainOfThought and ReAct with step-by-step validation
- ✅ Intelligent validation: Elixact-powered schema validation with output repair
- ✅ Concurrent evaluation: High-performance evaluation with fault isolation
- ✅ Foundation integration: Comprehensive telemetry, correlation tracking, and observability
- ✅ Multi-provider support: OpenAI, Anthropic, Gemini adapters working
- ✅ Production testing: Three-mode test architecture (mock/fallback/live)
Phase 2C - Enhanced Infrastructure:
- GenServer-based client architecture with supervision
- Circuit breakers and advanced error handling with Fuse
- Response caching with Cachex
- Rate limiting and connection pooling
Phase 2D - Advanced Programs:
- MultiChainComparison optimization
- Parallel execution patterns
- BestOfN sampling strategies
- Retry mechanisms with backoff
Phase 3 - Enterprise Features:
- Distributed optimization across BEAM clusters
- Phoenix LiveView optimization dashboard
- Advanced metrics and cost tracking
- Integration with vector databases for RAG
Every component runs in supervised processes. A malformed LLM response or network timeout affects only that specific evaluation, not the entire optimization run.
Update optimization algorithms or add new adapters without stopping running evaluations - a critical advantage for long-running optimization jobs.
Scale optimization across multiple BEAM nodes with minimal code changes:
# Future: Distribute evaluation across cluster nodes
DSPEx.Evaluate.run_distributed(program, large_dataset, metric,
nodes: [:node1@host, :node2@host])
BEAM's copying garbage collector and process isolation prevent memory leaks common in long-running Python optimization jobs.
`:telemetry` events provide deep insights without external monitoring infrastructure:
# Automatic metrics for every LLM call
:telemetry.attach("dspex-metrics", [:dspex, :program, :forward, :stop],
&MyApp.Metrics.handle_event/4)
Based on architectural analysis, BEAM characteristics, and recent optimizations:
Scenario | Python DSPy | DSPEx Current | Notes |
---|---|---|---|
10K evaluations | ~30 minutes (thread-limited) | ~5 minutes (process-limited by API) | Theoretical based on concurrency model |
Test suite execution | Variable (network dependent) | < 7 seconds (400x improvement) | Measured with mock mode |
Fault recovery | Manual restart required | Automatic supervision recovery | OTP supervision trees |
Memory usage | Grows with dataset size | Constant per process | BEAM process isolation |
Monitoring | External tools required | Built-in telemetry | Native :telemetry integration |
Distribution | Complex setup | Native BEAM clustering (planned) | Future distributed evaluation |
Recent Performance Optimizations:
- Testing architecture: 400x performance improvement through intelligent mock/live switching
- Process management: Robust supervision testing with proper GenServer lifecycle handling
- Zero contamination: Clean test environment management prevents state leakage
- Network isolation: Performance tests isolated from network conditions for consistent results
DSPEx excels in scenarios that leverage BEAM's strengths:
Building systems that make thousands of concurrent calls to LLM APIs, vector databases, and other web services.
Applications requiring 99.9% uptime where individual component failures shouldn't crash the entire system.
Systems that need to automatically discover optimal prompting strategies through data-driven optimization.
Systems requiring sub-second response times with automatic failover and circuit breaking.
Add `dspex` to your list of dependencies in `mix.exs`:
def deps do
[
{:dspex, "~> 0.1.0"},
# Required dependencies
{:req, "~> 0.4.0"},
{:jason, "~> 1.4"},
# Future dependencies
{:fuse, "~> 2.4"}, # For circuit breakers (Phase 2B)
{:cachex, "~> 3.6"}, # For caching (Phase 2B)
# Optional for testing
{:mox, "~> 1.0", only: :test}
]
end
# 1. Define a signature
defmodule QASignature do
use DSPEx.Signature, "question -> answer"
end
# 2. Create a program
program = DSPEx.Predict.new(QASignature, :gemini)
# 3. Run predictions
{:ok, outputs} = DSPEx.Program.forward(program, %{question: "What is Elixir?"})
# 4. Evaluate performance
examples = [
%DSPEx.Example{
data: %{question: "What is OTP?", answer: "Open Telecom Platform"},
input_keys: MapSet.new([:question])
}
]
metric_fn = fn example, prediction ->
if DSPEx.Example.get(example, :answer) == Map.get(prediction, :answer), do: 1.0, else: 0.0
end
{:ok, result} = DSPEx.Evaluate.run(program, examples, metric_fn)
# 5. Optimize with teleprompter
teacher = DSPEx.Predict.new(QASignature, :openai) # Use stronger model as teacher
{:ok, optimized} = DSPEx.Teleprompter.BootstrapFewShot.compile(
program, # student
teacher, # teacher
examples, # training set
metric_fn # metric function
)
- Implementation Status: `CLAUDE.md` - Current status and critical gap analysis
- Testing Strategy: `LIVE_DIVERGENCE.md` - Comprehensive test architecture
- Elixact Integration: `ELIXACT_DSPEX_INTEGRATION_GUIDE.md` - Complete integration guide
- Predict Modules: `ELIXACT_DSPEX_PREDICT_GUIDE.md` - ChainOfThought and ReAct implementation
- Signature System: `ELIXACT_DSPEX_SIGNATURES_GUIDE.md` - Advanced signature validation
- SIMBA Teleprompter: `ELIXACT_DSPEX_SIMBA_GUIDE.md` - Validated optimization strategies
- Architecture Deep Dive: `docs/001_initial/101_claude.md`
- Implementation Plan: `docs/005_optimizer/100_claude.md`
- Staged Development: `docs/005_optimizer/102_CLAUDE_STAGED_IMPL.md`
- Critical Assessment: `docs/001_initial/28_gemini_criticalValueAssessment.md`
- Foundation Integration: `docs/001_initial/104_claude_synthesizeGemini_foundationIntegrationGuide.md`
DSPEx follows a rigorous test-driven development approach with comprehensive coverage across unit, integration, property-based, and concurrent testing. The project prioritizes correctness, observability, and BEAM-native patterns.
Current Test Coverage: 85%+ across all core modules with zero Dialyzer warnings maintained.
Same as original DSPy project.
- Stanford DSPy Team: For the foundational concepts and research
- Elixir Community: For the excellent ecosystem packages
- BEAM Team: For the robust runtime platform that makes this vision possible
Current Status: DSPEx has achieved its core vision with a working end-to-end pipeline including automated program optimization through teleprompters. With Elixact integration complete, DSPEx now provides world-class validation, intelligent output repair, and advanced reasoning capabilities. The foundation is solid for enterprise features like distributed optimization and production tooling.