DevFlow Analyzer

An agentic ML system that applies process mining to CI/CD build data, identifies bottlenecks and failure patterns, and generates actionable insights using LLM-powered natural language generation.

Live Demo: devflow-analyzer.streamlit.app

Overview

DevFlow Analyzer takes CI/CD event logs (build history), performs process mining analysis, and generates comprehensive reports explaining:

  • Build Health - Overall success rates and trends
  • Bottlenecks - Slow builds and performance issues
  • Failure Patterns - Which projects fail most and why
  • Recommendations - Actionable steps to improve CI/CD performance

Architecture

┌─────────────────────┐
│   CI/CD Build Logs  │  (TravisTorrent, CSV)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Process Analyzer  │  (PM4Py)
│  - Load & validate  │
│  - Compute metrics  │
│  - Identify issues  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Structured Metrics │  (BuildAnalysisResult)
│  - Success rates    │
│  - Duration stats   │
│  - Bottlenecks      │
└──────────┬──────────┘
           │
     ┌─────┴──────┐
     │            │
     ▼            ▼
┌───────────┐ ┌──────────────┐
│   Agent   │ │ LLM Reporter │
│(LangGraph)│ │ (LangChain)  │
│  - Tools  │ │ - Templates  │
│  - ReAct  │ │ - Sections   │
└─────┬─────┘ └──────┬───────┘
      │              │
      ▼              ▼
┌───────────┐ ┌──────────────┐
│  Dynamic  │ │    CI/CD     │
│ Analysis  │ │    Report    │
└───────────┘ └──────────────┘

Process Mining

DevFlow Analyzer uses PM4Py to apply process mining techniques to CI/CD data. The system generates a Directly-Follows Graph (DFG) that visualizes build status transitions.
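
Under the hood, this amounts to a handful of PM4Py calls. A minimal sketch (illustrative, not the analyzer's exact code), treating each project as a case and each build status as an activity, using the column names from the Data Format section below:

# Illustrative sketch of the underlying PM4Py calls (not the analyzer's exact code)
import pandas as pd
import pm4py

df = pd.read_csv("data/sample/travistorrent_10k.csv")
df["gh_build_started_at"] = pd.to_datetime(df["gh_build_started_at"])

# Map TravisTorrent columns onto PM4Py's event-log schema:
# each project is a case, each build status is an activity
df = pm4py.format_dataframe(
    df,
    case_id="gh_project_name",
    activity_key="tr_status",
    timestamp_key="gh_build_started_at",
)

# Discover the Directly-Follows Graph and render it as a PNG
dfg, start_activities, end_activities = pm4py.discover_dfg(df)
pm4py.save_vis_dfg(dfg, start_activities, end_activities, "outputs/figures/dfg.png")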

Example: TravisTorrent Dataset

The following DFG was generated from the TravisTorrent dataset (10,000 CI/CD builds from 21 open-source Java projects):

Figure: DFG of build status transitions

In this graph:

  • Nodes represent build statuses (passed, failed, errored, canceled)
  • Edges show transitions between consecutive builds per project
  • Edge labels indicate the frequency of each transition

This visualization helps identify patterns such as:

  • Recovery rate from failures (failed → passed; estimated in the sketch after this list)
  • Build stability (passed → passed chains)
  • Error clustering and infrastructure issues (errored states)
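
For example, the recovery rate can be read straight off the DFG edge frequencies. A minimal sketch, assuming dfg is the (source, target) → count dict returned by pm4py.discover_dfg in the sketch above:

# Estimate failure recovery from DFG edge frequencies
# (dfg maps (source, target) status pairs to transition counts)
failed_out = {target: count for (source, target), count in dfg.items() if source == "failed"}
total = sum(failed_out.values())
if total:
    recovery_rate = failed_out.get("passed", 0) / total
    print(f"Recovery rate (failed -> passed): {recovery_rate:.1%}")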

Features

  • Agentic analysis - ReAct-style agent that autonomously investigates CI/CD issues
  • OpenAI-powered - Uses GPT-4o-mini (fast, affordable) or GPT-4o for advanced analysis
  • Process mining integration - Uses PM4Py for DFG visualization and metrics
  • A/B testing - Compare model configurations with labeled runs and quality ratings
  • Structured analysis - Dataclasses for clean JSON serialization
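
To illustrate the last point, a dataclass like BuildAnalysisResult serializes to JSON with no custom glue. The fields below are hypothetical stand-ins, not the repo's actual schema:

# Hypothetical sketch: field names are stand-ins, not the actual schema
from dataclasses import dataclass, field, asdict
import json

@dataclass
class BuildAnalysisResult:
    total_builds: int
    success_rate: float
    median_duration_s: float
    bottlenecks: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))  # asdict handles nested dataclasses recursively

print(BuildAnalysisResult(10_000, 0.85, 312.0, ["org/example-project"]).to_json())  # placeholder values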

Installation

# Clone repository
git clone https://github.com/albertodiazdurana/devflow-analyzer.git
cd devflow-analyzer

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env with your API keys

Configuration

Edit .env to configure the OpenAI provider:

# OpenAI (required for agent features)
OPENAI_API_KEY=sk-...

The app uses OpenAI's GPT-4o-mini by default (fast and affordable at $0.15 per 1M input tokens / $0.60 per 1M output tokens).

Usage

Agent-Based Analysis (Recommended)

from pathlib import Path
from src.process_analyzer import ProcessAnalyzer
from src.agent import DevFlowAgent

# Load and analyze CI/CD data
analyzer = ProcessAnalyzer()
analyzer.load_data(Path("data/sample/travistorrent_10k.csv"))
result = analyzer.analyze()

# Create agent and investigate
agent = DevFlowAgent()  # defaults to gpt-4o-mini
response = agent.investigate(result, "Which project has the highest failure rate?")
print(response)

# Or run a comprehensive analysis
response = agent.analyze(result)
print(response)

Report Generation

from src.llm_reporter import LLMReporter

# Generate structured report
reporter = LLMReporter(model_key="gpt-4o-mini")
report = reporter.generate_report(result)
print(report.to_markdown())

Generate DFG Visualization

analyzer.generate_dfg(Path("outputs/figures/dfg.png"))

Available Models

| Key | Provider | Model | Cost (per 1M tokens) |
|-------------|----------|-------------|------------------------------|
| gpt-4o-mini | OpenAI | GPT-4o Mini | $0.15 input / $0.60 output |
| gpt-4o | OpenAI | GPT-4o | $5.00 input / $15.00 output |

Note: The app uses OpenAI models, which support the tool calling the ReAct agent requires. See docs/decisions/DEC-001-default-llm-provider.md for details.
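
For reference, tool-calling ReAct agents are a one-liner in LangGraph. A sketch with a single hypothetical tool (not the repo's actual tools, which read from BuildAnalysisResult):

# Sketch: a LangGraph ReAct agent with one hypothetical tool
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def project_failure_rates(project: str) -> str:
    """Look up the failure rate for a project (stub for illustration)."""
    return f"failure rate lookup for {project} would read from BuildAnalysisResult"

agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[project_failure_rates])
result = agent.invoke({"messages": [("user", "Which project fails most often?")]})
print(result["messages"][-1].content)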

Project Structure

devflow-analyzer/
├── src/
│   ├── models.py           # Data classes
│   ├── process_analyzer.py # PM4Py analysis
│   ├── llm_provider.py     # LLM factory
│   ├── llm_reporter.py     # Report generation
│   ├── agent.py            # ReAct agent with tools
│   └── evaluation.py       # MLflow tracking & A/B testing
├── prompts/                # Prompt templates
├── tests/                  # Unit tests (86 tests)
├── data/sample/            # Sample datasets
├── outputs/                # Generated reports & figures
├── mlruns/                 # MLflow experiment logs
└── docs/                   # Documentation & decisions

Data Format

DevFlow Analyzer works with TravisTorrent-style CSV data. Required columns:

| Column | Description |
|---------------------|----------------------------------------|
| tr_build_id | Unique build identifier |
| gh_project_name | Project name |
| tr_status | Build status (passed/failed/errored) |
| tr_duration | Build duration in seconds |
| gh_build_started_at | Build start timestamp |
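
Before uploading your own data, a quick sanity check for the required columns (a standalone sketch, not part of the package):

# Sanity-check a CSV for the required TravisTorrent-style columns
import pandas as pd

REQUIRED = {"tr_build_id", "gh_project_name", "tr_status", "tr_duration", "gh_build_started_at"}

df = pd.read_csv("data/sample/travistorrent_10k.csv")
missing = REQUIRED - set(df.columns)
if missing:
    raise ValueError(f"Missing required columns: {sorted(missing)}")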

Development Progress

  • Day 1: Foundation & Data Pipeline

    • Process analyzer with PM4Py
    • Data models with serialization
    • DFG visualization
  • Day 2: Core Modules

    • Provider-agnostic LLM factory
    • Prompt templates
    • LLM-powered report generation
  • Day 3: Agentic System

    • ReAct-style agent with LangGraph
    • Tools: summary stats, bottlenecks, failures, project comparison
    • Dynamic analysis and investigation
  • Day 4: Evaluation Pipeline

    • MLflow experiment tracking
    • ROUGE scores for output quality
    • Cost tracking per model
    • A/B testing framework for model comparison
  • Day 5: Application & Deployment

    • Streamlit UI with 4 tabs (Upload, Metrics, Agent, Evaluation)
    • Demo notebook
    • Deployed to Streamlit Community Cloud
    • Live at devflow-analyzer.streamlit.app
  • Day 6: Enhanced Evaluation & A/B Testing

    • Auto-calculated response metrics (tokens/sec, length, sections, actionability)
    • User evaluation interface (quality, relevance, completeness, actionability ratings)
    • A/B testing with run labels and data fingerprints for valid comparisons
    • Quality vs cost/latency scatter plots and model comparison dashboards

Quick Start

# Run the Streamlit app
streamlit run app.py

# View MLflow experiments
mlflow ui --port 5000
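
For context, logging an evaluation run with MLflow and ROUGE takes only a few calls. A sketch with assumed metric and parameter names (not the exact ones in src/evaluation.py):

# Sketch: logging one evaluation run (metric/param names are assumptions)
import mlflow
from rouge_score import rouge_scorer

reference = "Success rate is high; the slowest builds cluster in one project."  # reference text
generated = "Builds mostly pass; one project accounts for the slowest builds."  # LLM output

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

mlflow.set_experiment("devflow-eval")
with mlflow.start_run(run_name="gpt-4o-mini-baseline"):
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_metric("rougeL_f1", scorer.score(reference, generated)["rougeL"].fmeasure)
    # cost at GPT-4o-mini rates: $0.15 / 1M input tokens, $0.60 / 1M output tokens
    input_tokens, output_tokens = 1200, 450  # placeholder usage counts
    mlflow.log_metric("cost_usd", input_tokens * 0.15e-6 + output_tokens * 0.60e-6)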

Testing

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_analyzer.py -v

License

MIT

Author

Alberto Diaz Durana GitHub | LinkedIn
