An agentic ML system that applies process mining to CI/CD build data, identifies bottlenecks and failure patterns, and generates actionable insights using LLM-powered natural language generation.
Live Demo: devflow-analyzer.streamlit.app
DevFlow Analyzer takes CI/CD event logs (build history), performs process mining analysis, and generates comprehensive reports explaining:
- Build Health - Overall success rates and trends
- Bottlenecks - Slow builds and performance issues
- Failure Patterns - Which projects fail most and why
- Recommendations - Actionable steps to improve CI/CD performance
```
┌─────────────────────┐
│  CI/CD Build Logs   │   (TravisTorrent, CSV)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Process Analyzer   │   (PM4Py)
│  - Load & validate  │
│  - Compute metrics  │
│  - Identify issues  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Structured Metrics  │   (BuildAnalysisResult)
│  - Success rates    │
│  - Duration stats   │
│  - Bottlenecks      │
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌────────────┐  ┌──────────────┐
│    Agent   │  │ LLM Reporter │
│ (LangGraph)│  │ (LangChain)  │
│  - Tools   │  │ - Templates  │
│  - ReAct   │  │ - Sections   │
└────┬───────┘  └──────┬───────┘
     │                 │
     ▼                 ▼
┌──────────┐  ┌─────────────┐
│ Dynamic  │  │    CI/CD    │
│ Analysis │  │   Report    │
└──────────┘  └─────────────┘
```
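For orientation, the structured-metrics layer is a plain dataclass that both the agent and the reporter consume. Below is a minimal sketch of what `BuildAnalysisResult` could look like; the field names are illustrative assumptions based on the metrics listed in the diagram, not the exact definition in `src/models.py`.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List
import json

@dataclass
class BuildAnalysisResult:
    """Hypothetical sketch; see src/models.py for the real definition."""
    total_builds: int
    success_rate: float                        # overall fraction of passed builds
    duration_stats: Dict[str, float]           # e.g. mean/median/p95 build duration in seconds
    bottlenecks: List[str]                     # projects or stages with unusually slow builds
    failure_rate_by_project: Dict[str, float]  # per-project failure fractions

    def to_json(self) -> str:
        # Dataclasses serialize cleanly to JSON, which the LLM layers consume.
        return json.dumps(asdict(self), indent=2)
```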
DevFlow Analyzer uses PM4Py to apply process mining techniques to CI/CD data. The system generates a Directly-Follows Graph (DFG) that visualizes build status transitions.
The following DFG was generated from the TravisTorrent dataset (10,000 CI/CD builds from 21 open-source Java projects):
In this graph:
- Nodes represent build statuses (passed, failed, errored, canceled)
- Edges show transitions between consecutive builds per project
- Edge labels indicate the frequency of each transition
This visualization helps identify patterns such as:
- Recovery rate from failures (failed → passed)
- Build stability (passed → passed chains)
- Error clustering and infrastructure issues (errored states)
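For reference, a DFG like the one above can be produced with PM4Py's simplified interface. This is a minimal sketch rather than the project's `generate_dfg` implementation, and it assumes the TravisTorrent-style columns described later in this README.

```python
import pandas as pd
import pm4py

# Treat each project as a case and each build status as an activity.
df = pd.read_csv("data/sample/travistorrent_10k.csv")
df["gh_build_started_at"] = pd.to_datetime(df["gh_build_started_at"])
log = pm4py.format_dataframe(
    df,
    case_id="gh_project_name",
    activity_key="tr_status",
    timestamp_key="gh_build_started_at",
)

# Discover the directly-follows graph: edges are status-to-status transitions with frequencies.
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# Render the graph to a PNG, similar to the figure above.
pm4py.save_vis_dfg(dfg, start_activities, end_activities, "outputs/figures/dfg.png")
```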
- Agentic analysis - ReAct-style agent that autonomously investigates CI/CD issues
- OpenAI-powered - Uses GPT-4o-mini (fast, affordable) or GPT-4o for advanced analysis
- Process mining integration - Uses PM4Py for DFG visualization and metrics
- A/B testing - Compare model configurations with labeled runs and quality ratings
- Structured analysis - Dataclasses for clean JSON serialization
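To make the agentic piece concrete, the sketch below shows how a single analysis tool might be exposed to a LangGraph ReAct agent. The tool body and wiring are illustrative assumptions, not the actual code in `src/agent.py`.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_bottlenecks() -> str:
    """Return the slowest projects from the current analysis."""
    # The real agent would read this from a BuildAnalysisResult; hard-coded here for brevity.
    return "project-a: median build 42 min; project-b: median build 35 min"

# GPT-4o-mini supports the tool calling that the ReAct loop relies on.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = create_react_agent(llm, tools=[get_bottlenecks])

result = agent.invoke({"messages": [("user", "Which projects are the biggest bottlenecks?")]})
print(result["messages"][-1].content)
```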
```bash
# Clone repository
git clone https://github.com/albertodiazdurana/devflow-analyzer.git
cd devflow-analyzer

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env with your API keys
```

Edit `.env` to configure the OpenAI provider:
```bash
# OpenAI (required for agent features)
OPENAI_API_KEY=sk-...
```

The app uses OpenAI's GPT-4o-mini by default (fast and affordable at $0.15 per 1M input tokens).
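If you use the modules outside the Streamlit app, the key can be loaded from `.env` with python-dotenv, for example (assuming python-dotenv is installed via the requirements):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY in .env before using agent features"
```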
```python
from pathlib import Path

from src.process_analyzer import ProcessAnalyzer
from src.agent import DevFlowAgent

# Load and analyze CI/CD data
analyzer = ProcessAnalyzer()
analyzer.load_data(Path("data/sample/travistorrent_10k.csv"))
result = analyzer.analyze()

# Create agent and investigate
agent = DevFlowAgent()  # defaults to gpt-4o-mini
response = agent.investigate(result, "Which project has the highest failure rate?")
print(response)

# Or run a comprehensive analysis
response = agent.analyze(result)
print(response)
```

```python
from src.llm_reporter import LLMReporter
# Generate structured report
reporter = LLMReporter(model_key="gpt-4o-mini")
report = reporter.generate_report(result)
print(report.to_markdown())
```

To export the DFG visualization shown above:

```python
analyzer.generate_dfg(Path("outputs/figures/dfg.png"))
```

| Key | Provider | Model | Cost (per 1M tokens) |
|---|---|---|---|
| `gpt-4o-mini` | OpenAI | GPT-4o Mini | $0.15 input / $0.60 output |
| `gpt-4o` | OpenAI | GPT-4o | $5.00 input / $15.00 output |
Note: The app uses OpenAI models, which support the tool calling required by the ReAct agent. See `docs/decisions/DEC-001-default-llm-provider.md` for details.
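The model keys above are resolved by the LLM factory in `src/llm_provider.py`. A minimal sketch of how such a factory might look (the `get_llm` name and signature are illustrative, not the module's actual API):

```python
from langchain_openai import ChatOpenAI

# Model keys from the table above.
_MODELS = {
    "gpt-4o-mini": "gpt-4o-mini",
    "gpt-4o": "gpt-4o",
}

def get_llm(model_key: str = "gpt-4o-mini", temperature: float = 0.0) -> ChatOpenAI:
    """Hypothetical factory: resolve a model key to a configured chat model."""
    if model_key not in _MODELS:
        raise ValueError(f"Unknown model key: {model_key}")
    return ChatOpenAI(model=_MODELS[model_key], temperature=temperature)
```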
```
devflow-analyzer/
├── src/
│   ├── models.py            # Data classes
│   ├── process_analyzer.py  # PM4Py analysis
│   ├── llm_provider.py      # LLM factory
│   ├── llm_reporter.py      # Report generation
│   ├── agent.py             # ReAct agent with tools
│   └── evaluation.py        # MLflow tracking & A/B testing
├── prompts/                 # Prompt templates
├── tests/                   # Unit tests (86 tests)
├── data/sample/             # Sample datasets
├── outputs/                 # Generated reports & figures
├── mlruns/                  # MLflow experiment logs
└── docs/                    # Documentation & decisions
```
DevFlow Analyzer works with TravisTorrent-style CSV data. Required columns:
| Column | Description |
|---|---|
| `tr_build_id` | Unique build identifier |
| `gh_project_name` | Project name |
| `tr_status` | Build status (passed/failed/errored) |
| `tr_duration` | Build duration in seconds |
| `gh_build_started_at` | Build timestamp |
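Before uploading a CSV, it can help to confirm these required columns are present. A quick pandas check, independent of the project's own `load_data` validation:

```python
import pandas as pd

REQUIRED_COLUMNS = {
    "tr_build_id",
    "gh_project_name",
    "tr_status",
    "tr_duration",
    "gh_build_started_at",
}

df = pd.read_csv("data/sample/travistorrent_10k.csv")
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
print(f"{len(df)} builds across {df['gh_project_name'].nunique()} projects")
```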
- Day 1: Foundation & Data Pipeline
  - Process analyzer with PM4Py
  - Data models with serialization
  - DFG visualization
- Day 2: Core Modules
  - Provider-agnostic LLM factory
  - Prompt templates
  - LLM-powered report generation
- Day 3: Agentic System
  - ReAct-style agent with LangGraph
  - Tools: summary stats, bottlenecks, failures, project comparison
  - Dynamic analysis and investigation
- Day 4: Evaluation Pipeline (see the sketch after this list)
  - MLflow experiment tracking
  - ROUGE scores for output quality
  - Cost tracking per model
  - A/B testing framework for model comparison
- Day 5: Application & Deployment
  - Streamlit UI with 4 tabs (Upload, Metrics, Agent, Evaluation)
  - Demo notebook
  - Deployed to Streamlit Community Cloud
  - Live at devflow-analyzer.streamlit.app
- Day 6: Enhanced Evaluation & A/B Testing
  - Auto-calculated response metrics (tokens/sec, length, sections, actionability)
  - User evaluation interface (quality, relevance, completeness, actionability ratings)
  - A/B testing with run labels and data fingerprints for valid comparisons
  - Quality vs. cost/latency scatter plots and model comparison dashboards
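As an illustration of the Day 4 evaluation pipeline, the sketch below logs a single report-generation run to MLflow with ROUGE scores and cost. The helper name and metric keys are assumptions, not the exact code in `src/evaluation.py`.

```python
import mlflow
from rouge_score import rouge_scorer

def log_report_run(model_key: str, report_text: str, reference_text: str, cost_usd: float) -> None:
    """Hypothetical helper: track one report generation as an MLflow run."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    scores = scorer.score(reference_text, report_text)

    with mlflow.start_run(run_name=f"report-{model_key}"):
        mlflow.log_param("model_key", model_key)
        mlflow.log_metric("rouge1_f", scores["rouge1"].fmeasure)
        mlflow.log_metric("rougeL_f", scores["rougeL"].fmeasure)
        mlflow.log_metric("cost_usd", cost_usd)
        mlflow.log_text(report_text, "report.md")
```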
```bash
# Run the Streamlit app
streamlit run app.py

# View MLflow experiments
mlflow ui --port 5000
```

```bash
# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_analyzer.py -v
```

License: MIT
