git-calculator

Calculate DORA metrics and related statistics from a Git repository on the local file system. It does not require integration with GitHub or any other Git hosting provider.

Getting Started

First, clone this repository and set it up:

# Clone the repository
git clone https://github.com/yourusername/git-calculator.git
cd git-calculator

# Set up Python environment
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
pip install -r requirements.txt

# Set Python path
export PYTHONPATH=$(pwd)  # On Windows, use: set PYTHONPATH=%cd%

Navigate to the Git repository you want to analyze:

cd /path/to/your/repository

Run Python and calculate your metrics:

# Launch Python
python

# Import required modules
from src import git_ir as gir
from src.calculators import cycle_time_by_commits_calculator as commit_calc
from src.calculators import change_failure_calculator as cfc
from src.calculators import chart_generator as cg
from src.calculators import commit_analyzer as ca

# Get the data
logs = gir.git_log()

# Calculate cycle time
tds = commit_calc.calculate_time_deltas(logs)
cycle_time_data = commit_calc.commit_statistics_normalized_by_month(tds)

# Calculate change failure rate
data_by_month = cfc.extract_commit_data(logs)
failure_rate_data = [(month, rate) for month, rate in cfc.calculate_change_failure_rate(data_by_month).items()]

# Analyze commit trends by author
ca.analyze_commits()

# Generate charts and save data
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data,
                  save_data=True)

Check your results:
- A new metrics directory will be created in your repository
- You'll find several files with your repository name as prefix:
  - metrics/{repo_name}_cycle_time_data.csv - Raw cycle time data
  - metrics/{repo_name}_change_failure_data.csv - Raw change failure rate data
  - metrics/{repo_name}_cycle_time_chart.png - Cycle time chart
  - metrics/{repo_name}_change_failure_rate_chart.png - Change failure rate chart
  - metrics/commit_trends.png - Commit trends by author
  - metrics/commit_{author}_commits.csv - Individual author commit data
  - metrics/commit_percentiles.csv - Author commit percentiles

To generate new charts later without recalculating:

from src.calculators import chart_generator as cg

# Load the saved data
cycle_time_data, failure_rate_data = cg.load_metrics_data()

# Generate new charts
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data)

Project Outline

git-calculator/
│
├── src/
│   ├── git_ir.py        # In-memory representation of Git metadata
│   ├── calculators/
│   │   ├── cycle_time_calculator_by_branches.py  # Cycle time stats by branch
│   │   ├── cycle_time_by_commits_calculator.py  # Cycle time stats by commit
│   │   ├── change_failure_calculator.py         # Change failure rate stats
│   │   ├── commit_analyzer.py                   # Commit trends by author
│   │   └── chart_generator.py                   # Chart generation utilities
│   ├── util/
│   │   ├── git_util.py  # Helpers for interacting with a Git repo
│   │   └── toy_repo.py  # Temporary toy repo on the filesystem for testing
│
├── tests/
│   └── test_*.py        # Unit tests
│
├── README.md             # Documentation
├── requirements.txt      # Dependencies
└── setup.py              # Setup

Project Setup

cd git-calculator
export PYTHONPATH=$(pwd)

Set up virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Project Testing

Run the unit tests:

pytest -v

For debugging: export PYTEST_ADDOPTS="--log-cli-level=DEBUG"

Project Playing Around

To play around with the interpreter:

python
from src.util.toy_repo import ToyRepoCreator
trc = ToyRepoCreator("/Users/denalilumma/doubling-code/scratch")
even_intervals = [7 * i for i in range(12)]  # Weekly intervals
trc.create_custom_commits(even_intervals)

(Replace the path passed to ToyRepoCreator with a directory on your own machine.)

from src.calculators.cycle_time_by_commits_calculator import cycle_time_between_commits_by_author
result = cycle_time_between_commits_by_author(None, bucket_size=4, window_size=2)
print(result)

Project Usage

To calculate statistics for a given repository, proceed with the following sequence.

First, go to this repository in the terminal and set the Python path:

cd git-calculator
export PYTHONPATH=$(pwd)

Set up virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Finally, go to the git repo you want to analyze:

cd tensorflow

Analyze:

# Launch Python
python
# Paste:
from src import git_ir as gir
from src.calculators import cycle_time_by_commits_calculator as commit_calc
logs = gir.git_log()
tds = commit_calc.calculate_time_deltas(logs)
result = commit_calc.commit_statistics_normalized_by_month(tds)
commit_calc.write_commit_statistics_to_file(result, "scratch.csv") # Default file name is "a.csv"

Example output:

INTERVAL START, SUM, AVERAGE, p75 CYCLE TIME (minutes), std CYCLE TIME
2023-10,161280.0,40320.0,40320,0
2023-11,120960.0,40320.0,40320,0
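
To make the columns above concrete, here is a minimal sketch of how monthly cycle time statistics could be derived from commit timestamps. It is not the implementation in cycle_time_by_commits_calculator: the function name monthly_cycle_time_stats, the choice to assign each interval to the month of the later commit, the nearest-rank p75, and the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from math import ceil
from statistics import mean, pstdev


def monthly_cycle_time_stats(commit_times):
    """Hypothetical helper: per-month stats of minutes between consecutive commits."""
    ordered = sorted(commit_times)
    deltas = defaultdict(list)
    for earlier, later in zip(ordered, ordered[1:]):
        minutes = (later - earlier).total_seconds() / 60
        # Assumption: an interval belongs to the month of the later commit.
        deltas[later.strftime("%Y-%m")].append(minutes)

    rows = []
    for month in sorted(deltas):
        values = sorted(deltas[month])
        p75 = values[ceil(0.75 * len(values)) - 1]  # nearest-rank percentile
        rows.append((month, sum(values), mean(values), p75, pstdev(values)))
    return rows


# Five commits, one week apart, all in October 2023.
start = datetime(2023, 10, 1)
weekly = [start + timedelta(days=7 * i) for i in range(5)]
rows = monthly_cycle_time_stats(weekly)
print(rows)
```

With evenly spaced commits every interval is identical, so the average and p75 are equal and the standard deviation is zero, which is the same pattern the example output shows.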

To calculate change failure rate:

# Launch Python
python
# Paste:
from src import git_ir as gir
from src.calculators import change_failure_calculator as cfc
logs = gir.git_log()
data_by_month = cfc.extract_commit_data(logs)
change_failure_rates = cfc.calculate_change_failure_rate(data_by_month)
cfc.write_change_failure_rate_to_file(change_failure_rates, "change_failure_rate.csv") # Default file name is "change_failure_rate_by_month.csv"

Example output:

Month,Change Failure Rate (%)
2023-10,25.0
2023-11,33.3

The change failure rate is calculated by identifying commits that contain keywords like "revert", "hotfix", "bugfix", "bug", "fix", "problem", or "issue" in their commit messages. The rate is expressed as a percentage of total commits that required fixes.
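
The keyword rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code in change_failure_calculator: the function name change_failure_rate_by_month and the (month, message) input shape are hypothetical, and matching is done as a case-insensitive substring test, which also flags words such as "prefix".

```python
from collections import defaultdict

# Keywords come from the description above.
FAILURE_KEYWORDS = ("revert", "hotfix", "bugfix", "bug", "fix", "problem", "issue")


def is_failure_commit(message):
    """Assumption: a case-insensitive substring match counts as a failure commit."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in FAILURE_KEYWORDS)


def change_failure_rate_by_month(commits):
    """commits: iterable of (month, message) pairs, e.g. ("2023-10", "Fix login")."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for month, message in commits:
        totals[month] += 1
        if is_failure_commit(message):
            failures[month] += 1
    # Percentage of commits in each month whose message matched a keyword.
    return {m: round(100.0 * failures[m] / totals[m], 1) for m in sorted(totals)}


sample = [
    ("2023-10", "Add login page"),
    ("2023-10", "Fix crash on empty password"),
    ("2023-10", "Refactor session store"),
    ("2023-10", "Update docs"),
    ("2023-11", "Revert session store refactor"),
    ("2023-11", "Add signup page"),
    ("2023-11", "Improve logging"),
]
rates = change_failure_rate_by_month(sample)
print(rates)
```

One of four October commits and one of three November commits match a keyword, giving 25.0 and 33.3 percent for this made-up sample.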

To analyze commit trends by author:

# Launch Python
python
# Paste:
from src.calculators import commit_analyzer as ca
ca.analyze_commits()

This will generate:

- A commit trends chart showing commits over time for each author
- CSV files with individual author commit data
- A CSV file with commit percentiles for all authors

Generating Charts

To generate modern-looking charts with trendlines for both metrics:

# First time: Calculate and save the data
from src import git_ir as gir
from src.calculators import cycle_time_by_commits_calculator as commit_calc
from src.calculators import change_failure_calculator as cfc
from src.calculators import chart_generator as cg

# Get the data
logs = gir.git_log()

# Calculate cycle time
tds = commit_calc.calculate_time_deltas(logs)
cycle_time_data = commit_calc.commit_statistics_normalized_by_month(tds)

# Calculate change failure rate
data_by_month = cfc.extract_commit_data(logs)
failure_rate_data = [(month, rate) for month, rate in cfc.calculate_change_failure_rate(data_by_month).items()]

# Save data and generate charts
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data,
                  save_data=True)

# Later: Load saved data and generate new charts
from src.calculators import chart_generator as cg

# Load the saved data
cycle_time_data, failure_rate_data = cg.load_metrics_data()

# Generate new charts
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data)

This will create a metrics directory in your repository and save four files with the repository name as prefix (e.g., tensorflow_cycle_time_data.csv):

metrics/{repo_name}_cycle_time_data.csv - Raw cycle time data
metrics/{repo_name}_change_failure_data.csv - Raw change failure rate data
metrics/{repo_name}_cycle_time_chart.png - Cycle time chart
metrics/{repo_name}_change_failure_rate_chart.png - Change failure rate chart

The repository name is detected automatically, in this order:

- The git remote URL (e.g., git@github.com:user/tensorflow.git → tensorflow)
- If no remote is found, the current directory name
- If neither is available, the literal name repo as a fallback
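
The fallback order above could look roughly like the sketch below. The helper names name_from_remote_url and detect_repo_name are hypothetical, and the use of the origin remote is an assumption; chart_generator may resolve the name differently.

```python
import os
import re
import subprocess


def name_from_remote_url(url):
    """Return the last path component of a Git remote URL, without '.git'."""
    cleaned = url.strip().rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[:-4]
    # Split on '/' or ':' so HTTPS and SCP-style SSH URLs are both handled.
    name = re.split(r"[/:]", cleaned)[-1]
    return name or None


def detect_repo_name(repo_dir="."):
    """Remote URL first, then the directory name, then the literal 'repo'."""
    try:
        url = subprocess.run(
            ["git", "-C", repo_dir, "remote", "get-url", "origin"],
            capture_output=True, text=True, check=True,
        ).stdout
        name = name_from_remote_url(url)
        if name:
            return name
    except (subprocess.CalledProcessError, OSError):
        pass  # no remote configured, not a repository, or git is not installed
    return os.path.basename(os.path.abspath(repo_dir)) or "repo"


print(name_from_remote_url("git@github.com:user/tensorflow.git"))
```

Keeping the URL parsing in its own function makes the first step easy to check without needing a real repository.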

You can also use a custom prefix instead of the repository name:

# Save with custom prefix
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data,
                  save_data=True,
                  prefix='team_a_')

# Load with custom prefix
cycle_time_data, failure_rate_data = cg.load_metrics_data(prefix='team_a_')
cg.generate_charts(cycle_time_data=cycle_time_data, 
                  failure_rate_data=failure_rate_data)

This is useful when you want to compare metrics across different teams or time periods.

Multi-Repository Analysis

git-calculator can analyze multiple repositories in a single run, so you can compare DORA metrics across projects, teams, or time periods.

Command Line Interface

The easiest way to use multi-repository analysis is through the command-line interface:

Single Repository Analysis

# Analyze a single repository
python -m src.cli single /path/to/repo

# Specify custom output directory
python -m src.cli single /path/to/repo --output my_analysis

Multiple Repository Analysis

Create a repository configuration file:

# Create a sample configuration file
python -m src.cli config --create-sample

This creates a repo_config.json file with the following structure:

{
  "repositories": [
    {
      "name": "frontend-app",
      "path_or_url": "/path/to/local/frontend",
      "branch": "main",
      "description": "Frontend application repository"
    },
    {
      "name": "backend-api", 
      "path_or_url": "https://github.com/company/backend-api.git",
      "branch": "develop",
      "description": "Backend API repository"
    },
    {
      "name": "mobile-app",
      "path_or_url": "git@github.com:company/mobile-app.git",
      "description": "Mobile application repository"
    }
  ]
}

Analyze multiple repositories:

# Analyze repositories from config file
python -m src.cli multi --config repo_config.json

# Update repositories before analysis
python -m src.cli multi --config repo_config.json --update

# Specify custom output directory
python -m src.cli multi --config repo_config.json --output team_analysis

Configuration Options

The repository configuration supports:

- Local paths: /path/to/local/repo
- HTTPS URLs: https://github.com/user/repo.git
- SSH URLs: git@github.com:user/repo.git
- Specific branches: "branch": "develop"
- Descriptions: "description": "Project description"
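
As an illustration of how these options fit together, the sketch below parses a configuration in the format shown earlier and labels each entry as local, HTTPS, or SSH. The functions classify_source and load_repositories are hypothetical and not part of src.cli; only the JSON keys are taken from the sample file. Treating branch and description as optional is an assumption based on the sample entry that omits branch.

```python
import json


def classify_source(path_or_url):
    """Label a repository source by the shape of its path_or_url value."""
    if path_or_url.startswith("https://"):
        return "https"
    if path_or_url.startswith(("git@", "ssh://")):
        return "ssh"
    return "local"


def load_repositories(config_text):
    """Parse the config JSON and normalise each repository entry."""
    config = json.loads(config_text)
    repos = []
    for entry in config["repositories"]:
        repos.append({
            "name": entry["name"],
            "path_or_url": entry["path_or_url"],
            "kind": classify_source(entry["path_or_url"]),
            "branch": entry.get("branch"),  # None when no branch is given
            "description": entry.get("description", ""),
        })
    return repos


sample_config = """
{
  "repositories": [
    {"name": "frontend-app", "path_or_url": "/path/to/local/frontend", "branch": "main"},
    {"name": "backend-api", "path_or_url": "https://github.com/company/backend-api.git"},
    {"name": "mobile-app", "path_or_url": "git@github.com:company/mobile-app.git"}
  ]
}
"""
repos = load_repositories(sample_config)
for repo in repos:
    print(repo["name"], repo["kind"], repo["branch"])
```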

Programmatic Multi-Repository Analysis

You can also use the multi-repository functionality programmatically:

from src.multi_repo_manager import MultiRepoManager
from src.calculators.multi_repo_calculator import MultiRepoCalculator
from src.calculators.multi_repo_chart_generator import MultiRepoChartGenerator

# Initialize repository manager
with MultiRepoManager() as repo_manager:
    # Add repositories
    repo_manager.add_repository("frontend", "/path/to/frontend")
    repo_manager.add_repository("backend", "https://github.com/user/backend.git")
    repo_manager.add_repository("mobile", "git@github.com:user/mobile.git")
    
    # Clone remote repositories
    clone_results = repo_manager.clone_repositories()
    print(f"Clone results: {clone_results}")
    
    # Calculate metrics for all repositories
    calculator = MultiRepoCalculator(repo_manager)
    all_metrics = calculator.calculate_all_metrics()
    
    # Save aggregated metrics
    calculator.save_aggregated_metrics(all_metrics, "multi_repo_metrics")
    
    # Generate comparison charts
    chart_generator = MultiRepoChartGenerator("multi_repo_charts")
    generated_charts = chart_generator.generate_all_comparison_charts(all_metrics)
    
    print(f"Generated {len(generated_charts)} comparison charts")

Multi-Repository Output

When analyzing multiple repositories, the tool creates:

Directory Structure

multi_repo_analysis/
├── metrics/
│   ├── frontend_metrics.json
│   ├── backend_metrics.json
│   ├── mobile_metrics.json
│   ├── aggregated_cycle_time.csv
│   ├── aggregated_failure_rate.csv
│   ├── aggregated_active_developers.csv
│   ├── aggregated_throughput.csv
│   └── summary_report.json
└── charts/
    ├── cycle_time_comparison.png
    ├── failure_rate_comparison.png
    ├── active_developers_comparison.png
    ├── throughput_comparison.png
    └── repository_summary.png

Individual Repository Metrics

Each repository gets its own JSON file with detailed metrics:

- Cycle time data (monthly)
- Change failure rate data (monthly)
- Active developers data (monthly)
- Throughput data (monthly)
- Commit percentiles
- Total commits and authors
- Date range of analysis

Aggregated Metrics

- aggregated_cycle_time.csv: Average cycle time across all repositories
- aggregated_failure_rate.csv: Average change failure rate across all repositories
- aggregated_active_developers.csv: Total unique active developers across all repositories
- aggregated_throughput.csv: Total commits across all repositories
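
A minimal sketch of this kind of aggregation, assuming each repository's metric is available as a month-to-value mapping. The function aggregate_by_month and the sample numbers are made up for illustration, and MultiRepoCalculator may aggregate differently (for example, by weighting repositories).

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_month(per_repo, combine):
    """Combine {repo: {month: value}} into {month: combine(values)}."""
    by_month = defaultdict(list)
    for series in per_repo.values():
        for month, value in series.items():
            by_month[month].append(value)
    # Months missing from a repository simply contribute no value.
    return {month: combine(values) for month, values in sorted(by_month.items())}


cycle_time = {
    "frontend": {"2023-10": 100.0, "2023-11": 200.0},
    "backend": {"2023-10": 300.0},
}
throughput = {
    "frontend": {"2023-10": 12, "2023-11": 8},
    "backend": {"2023-10": 5},
}

average_cycle_time = aggregate_by_month(cycle_time, mean)
total_throughput = aggregate_by_month(throughput, sum)
print(average_cycle_time)
print(total_throughput)
```

Averages and sums work for cycle time, failure rate, and throughput. Unique active developers cannot be summed the same way, because one person may commit to several repositories; that metric needs a set union of author identities per month.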

Comparison Charts

- cycle_time_comparison.png: Line chart comparing cycle time trends across repositories
- failure_rate_comparison.png: Line chart comparing change failure rates across repositories
- active_developers_comparison.png: Line chart comparing active developer counts across repositories
- throughput_comparison.png: Line chart comparing commit throughput across repositories
- repository_summary.png: Bar charts comparing key metrics across repositories

Summary Report

The summary_report.json contains:

- Total number of repositories analyzed
- Total commits across all repositories
- Total unique authors across all repositories
- Date ranges for each repository
- Individual repository summaries with key metrics

Advanced Usage

Custom Workspace Directory

from src.multi_repo_manager import MultiRepoManager

# Use a custom workspace for cloned repositories
with MultiRepoManager(workspace_dir="/tmp/my_analysis") as repo_manager:
    ...  # analysis code goes here

Repository Context Management

from src.git_ir import git_log

# Temporarily work with a specific repository
with repo_manager.repository_context("frontend"):
    logs = git_log()
    # ... perform analysis ...

Selective Analysis

# Calculate metrics for specific repositories only
calculator = MultiRepoCalculator(repo_manager)
frontend_metrics = calculator.calculate_repo_metrics("frontend")
backend_metrics = calculator.calculate_repo_metrics("backend")

# Generate charts for specific repositories
selected_metrics = {
    "frontend": frontend_metrics,
    "backend": backend_metrics
}
chart_generator.generate_all_comparison_charts(selected_metrics)

Use Cases

Multi-repository analysis is particularly useful for:

- Team Comparison: Compare DORA metrics across different development teams
- Project Comparison: Analyze performance across different projects or products
- Technology Stack Analysis: Compare metrics between different technology stacks
- Time Period Analysis: Compare metrics across different time periods
- Merger/Acquisition Analysis: Analyze metrics before and after organizational changes
- Benchmarking: Establish baseline metrics across multiple repositories

Performance Considerations

- Large Repositories: Analysis time scales with repository size and commit history
- Network Operations: Cloning remote repositories requires network access
- Memory Usage: Multiple repositories may require significant memory for analysis
- Caching: Metrics are cached to avoid recalculation during the same session

Troubleshooting

Common Issues

- Repository Access: Ensure you have access to all repositories (SSH keys, authentication)
- Branch Availability: Verify that specified branches exist in remote repositories
- Disk Space: Ensure sufficient disk space for cloning repositories
- Permissions: Check file system permissions for output directories

Debug Mode

# Enable verbose logging
python -m src.cli multi --config repo_config.json --verbose

Error Handling

The tool provides detailed error messages and continues processing other repositories even if some fail. Check the logs for specific error details.
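
The continue-on-failure behavior described above follows a common pattern, sketched here in isolation. The function analyze_all, the logger name, and the stand-in analysis callback are hypothetical; this is not the tool's own error handling code.

```python
import logging

logger = logging.getLogger("multi_repo_sketch")


def analyze_all(repo_names, analyze_one):
    """Run analyze_one for each repository, recording failures instead of stopping."""
    results, errors = {}, {}
    for name in repo_names:
        try:
            results[name] = analyze_one(name)
        except Exception as exc:  # keep going so one bad repository does not block the rest
            logger.error("Analysis failed for %s: %s", name, exc)
            errors[name] = str(exc)
    return results, errors


def fake_analysis(name):
    """Stand-in for a real per-repository calculation."""
    if name == "backend":
        raise RuntimeError("branch 'develop' not found")
    return {"total_commits": 10}


results, errors = analyze_all(["frontend", "backend", "mobile"], fake_analysis)
print(sorted(results), errors)
```

Returning the errors alongside the results lets a caller report which repositories were skipped and why, in addition to whatever is written to the log.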
