An agentic ML system that applies process mining to CI/CD build data, identifies bottlenecks and failure patterns, and generates actionable insights using LLM-powered natural language generation.
Live Demo: devflow-analyzer.streamlit.app
DevFlow Analyzer takes CI/CD event logs (build history), performs process mining analysis, and generates comprehensive reports explaining:
- Build Health - Overall success rates and trends
- Bottlenecks - Slow builds and performance issues
- Failure Patterns - Which projects fail most and why
- Recommendations - Actionable steps to improve CI/CD performance
```
┌─────────────────────┐
│  CI/CD Build Logs   │   (TravisTorrent, CSV)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Process Analyzer   │   (PM4Py)
│  - Load & validate  │
│  - Compute metrics  │
│  - Identify issues  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Structured Metrics  │   (BuildAnalysisResult)
│  - Success rates    │
│  - Duration stats   │
│  - Bottlenecks      │
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌────────────┐  ┌──────────────┐
│    Agent   │  │ LLM Reporter │
│ (LangGraph)│  │ (LangChain)  │
│  - Tools   │  │ - Templates  │
│  - ReAct   │  │ - Sections   │
└────┬───────┘  └──────┬───────┘
     │                 │
     ▼                 ▼
┌──────────┐  ┌─────────────┐
│ Dynamic  │  │    CI/CD    │
│ Analysis │  │   Report    │
└──────────┘  └─────────────┘
```
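For orientation, the structured-metrics layer is a plain dataclass that both the agent and the reporter consume. Below is a minimal sketch of what `BuildAnalysisResult` could look like; the field names are illustrative assumptions based on the metrics listed in the diagram, not the exact definition in `src/models.py`.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List
import json

@dataclass
class BuildAnalysisResult:
    """Hypothetical sketch; see src/models.py for the real definition."""
    total_builds: int
    success_rate: float                        # overall fraction of passed builds
    duration_stats: Dict[str, float]           # e.g. mean/median/p95 build duration in seconds
    bottlenecks: List[str]                     # projects or stages with unusually slow builds
    failure_rate_by_project: Dict[str, float]  # per-project failure fractions

    def to_json(self) -> str:
        # Dataclasses serialize cleanly to JSON, which the LLM layers consume.
        return json.dumps(asdict(self), indent=2)
```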
DevFlow Analyzer uses PM4Py to apply process mining techniques to CI/CD data. The system generates a Directly-Follows Graph (DFG) that visualizes build status transitions.
The following DFG was generated from the TravisTorrent dataset (10,000 CI/CD builds from 21 open-source Java projects):
In this graph:
- Nodes represent build statuses (passed, failed, errored, canceled)
- Edges show transitions between consecutive builds per project
- Edge labels indicate the frequency of each transition
This visualization helps identify patterns such as:
- Recovery rate from failures (failed → passed)
- Build stability (passed → passed chains)
- Error clustering and infrastructure issues (errored states)
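For reference, a DFG like the one above can be produced with PM4Py's simplified interface. This is a minimal sketch rather than the project's `generate_dfg` implementation, and it assumes the TravisTorrent-style columns described later in this README.

```python
import pandas as pd
import pm4py

# Treat each project as a case and each build status as an activity.
df = pd.read_csv("data/sample/travistorrent_10k.csv")
df["gh_build_started_at"] = pd.to_datetime(df["gh_build_started_at"])
log = pm4py.format_dataframe(
    df,
    case_id="gh_project_name",
    activity_key="tr_status",
    timestamp_key="gh_build_started_at",
)

# Discover the directly-follows graph: edges are status-to-status transitions with frequencies.
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# Render the graph to a PNG, similar to the figure above.
pm4py.save_vis_dfg(dfg, start_activities, end_activities, "outputs/figures/dfg.png")
```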
- Agentic analysis - ReAct-style agent that autonomously investigates CI/CD issues
- OpenAI-powered - Uses GPT-4o-mini (fast, affordable) or GPT-4o for advanced analysis
- Process mining integration - Uses PM4Py for DFG visualization and metrics
- A/B testing - Compare model configurations with labeled runs and quality ratings
- Structured analysis - Dataclasses for clean JSON serialization
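To make the agentic piece concrete, the sketch below shows how a single analysis tool might be exposed to a LangGraph ReAct agent. The tool body and wiring are illustrative assumptions, not the actual code in `src/agent.py`.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_bottlenecks() -> str:
    """Return the slowest projects from the current analysis."""
    # The real agent would read this from a BuildAnalysisResult; hard-coded here for brevity.
    return "project-a: median build 42 min; project-b: median build 35 min"

# GPT-4o-mini supports the tool calling that the ReAct loop relies on.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = create_react_agent(llm, tools=[get_bottlenecks])

result = agent.invoke({"messages": [("user", "Which projects are the biggest bottlenecks?")]})
print(result["messages"][-1].content)
```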
```bash
# Clone repository
git clone https://github.com/albertodiazdurana/devflow-analyzer.git
cd devflow-analyzer

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env with your API keys
```

Edit `.env` to configure the OpenAI provider:
```bash
# OpenAI (required for agent features)
OPENAI_API_KEY=sk-...
```

The app uses OpenAI's GPT-4o-mini by default (fast and affordable at $0.15 per 1M input tokens).
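If you use the modules outside the Streamlit app, the key can be loaded from `.env` with python-dotenv, for example (assuming python-dotenv is installed via the requirements):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY in .env before using agent features"
```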
```python
from pathlib import Path

from src.process_analyzer import ProcessAnalyzer
from src.agent import DevFlowAgent

# Load and analyze CI/CD data
analyzer = ProcessAnalyzer()
analyzer.load_data(Path("data/sample/travistorrent_10k.csv"))
result = analyzer.analyze()

# Create agent and investigate
agent = DevFlowAgent()  # defaults to gpt-4o-mini
response = agent.investigate(result, "Which project has the highest failure rate?")
print(response)

# Or run a comprehensive analysis
response = agent.analyze(result)
print(response)
```

```python
from src.llm_reporter import LLMReporter
# Generate structured report
reporter = LLMReporter(model_key="gpt-4o-mini")
report = reporter.generate_report(result)
print(report.to_markdown())
```

To export the DFG visualization shown above:

```python
analyzer.generate_dfg(Path("outputs/figures/dfg.png"))
```

| Key | Provider | Model | Cost (per 1M tokens) |
|---|---|---|---|
| `gpt-4o-mini` | OpenAI | GPT-4o Mini | $0.15 input / $0.60 output |
| `gpt-4o` | OpenAI | GPT-4o | $5.00 input / $15.00 output |
Note: The app uses OpenAI models, which support the tool calling required by the ReAct agent. See `docs/decisions/DEC-001-default-llm-provider.md` for details.
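The model keys above are resolved by the LLM factory in `src/llm_provider.py`. A minimal sketch of how such a factory might look (the `get_llm` name and signature are illustrative, not the module's actual API):

```python
from langchain_openai import ChatOpenAI

# Model keys from the table above.
_MODELS = {
    "gpt-4o-mini": "gpt-4o-mini",
    "gpt-4o": "gpt-4o",
}

def get_llm(model_key: str = "gpt-4o-mini", temperature: float = 0.0) -> ChatOpenAI:
    """Hypothetical factory: resolve a model key to a configured chat model."""
    if model_key not in _MODELS:
        raise ValueError(f"Unknown model key: {model_key}")
    return ChatOpenAI(model=_MODELS[model_key], temperature=temperature)
```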
```
devflow-analyzer/
├── src/
│   ├── models.py            # Data classes
│   ├── process_analyzer.py  # PM4Py analysis
│   ├── llm_provider.py      # LLM factory
│   ├── llm_reporter.py      # Report generation
│   ├── agent.py             # ReAct agent with tools
│   └── evaluation.py        # MLflow tracking & A/B testing
├── prompts/                 # Prompt templates
├── tests/                   # Unit tests (86 tests)
├── data/sample/             # Sample datasets
├── outputs/                 # Generated reports & figures
├── mlruns/                  # MLflow experiment logs
└── docs/                    # Documentation & decisions
```
DevFlow Analyzer works with TravisTorrent-style CSV data. Required columns:
| Column | Description |
|---|---|
| `tr_build_id` | Unique build identifier |
| `gh_project_name` | Project name |
| `tr_status` | Build status (passed/failed/errored) |
| `tr_duration` | Build duration in seconds |
| `gh_build_started_at` | Build timestamp |
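Before uploading a CSV, it can help to confirm these required columns are present. A quick pandas check, independent of the project's own `load_data` validation:

```python
import pandas as pd

REQUIRED_COLUMNS = {
    "tr_build_id",
    "gh_project_name",
    "tr_status",
    "tr_duration",
    "gh_build_started_at",
}

df = pd.read_csv("data/sample/travistorrent_10k.csv")
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
print(f"{len(df)} builds across {df['gh_project_name'].nunique()} projects")
```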
- Day 1: Foundation & Data Pipeline
  - Process analyzer with PM4Py
  - Data models with serialization
  - DFG visualization
- Day 2: Core Modules
  - Provider-agnostic LLM factory
  - Prompt templates
  - LLM-powered report generation
- Day 3: Agentic System
  - ReAct-style agent with LangGraph
  - Tools: summary stats, bottlenecks, failures, project comparison
  - Dynamic analysis and investigation
- Day 4: Evaluation Pipeline (see the sketch after this list)
  - MLflow experiment tracking
  - ROUGE scores for output quality
  - Cost tracking per model
  - A/B testing framework for model comparison
- Day 5: Application & Deployment
  - Streamlit UI with 4 tabs (Upload, Metrics, Agent, Evaluation)
  - Demo notebook
  - Deployed to Streamlit Community Cloud
  - Live at devflow-analyzer.streamlit.app
- Day 6: Enhanced Evaluation & A/B Testing
  - Auto-calculated response metrics (tokens/sec, length, sections, actionability)
  - User evaluation interface (quality, relevance, completeness, actionability ratings)
  - A/B testing with run labels and data fingerprints for valid comparisons
  - Quality vs. cost/latency scatter plots and model comparison dashboards
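As an illustration of the Day 4 evaluation pipeline, the sketch below logs a single report-generation run to MLflow with ROUGE scores and cost. The helper name and metric keys are assumptions, not the exact code in `src/evaluation.py`.

```python
import mlflow
from rouge_score import rouge_scorer

def log_report_run(model_key: str, report_text: str, reference_text: str, cost_usd: float) -> None:
    """Hypothetical helper: track one report generation as an MLflow run."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    scores = scorer.score(reference_text, report_text)

    with mlflow.start_run(run_name=f"report-{model_key}"):
        mlflow.log_param("model_key", model_key)
        mlflow.log_metric("rouge1_f", scores["rouge1"].fmeasure)
        mlflow.log_metric("rougeL_f", scores["rougeL"].fmeasure)
        mlflow.log_metric("cost_usd", cost_usd)
        mlflow.log_text(report_text, "report.md")
```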
```bash
# Run the Streamlit app
streamlit run app.py

# View MLflow experiments
mlflow ui --port 5000
```

```bash
# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_analyzer.py -v
```

License: MIT
