AI-powered codebase analysis tool providing comprehensive technical assessment using large language models.
```bash
# Local repository analysis
repo-analyzer analyze --repo /path/to/repo

# Remote repository analysis
repo-analyzer analyze --repo https://github.com/user/repo.git

# Developer perspective analysis with context
repo-analyzer analyze --repo /path/to/repo --mode developer --human-context "Fintech payment processing API with strict compliance requirements"

# Interactive conversation mode
repo-analyzer analyze --repo /path/to/repo --conversation-mode
```
- Python 3.8+
- Git
- Anthropic API key
```bash
git clone https://github.com/yuyudhan/repo_analyzer.git
cd repo_analyzer
pip install -e .
```

Create a `.env` file:

```bash
cp .env.example .env
```
Required environment variables:

```bash
# Required
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Optional
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR
```
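If you script around the tool, the same variables can be checked up front in Python. A minimal sketch, assuming `python-dotenv` is installed and the variable names from `.env.example` above:

```python
# Minimal sketch: load .env and fail early if the required key is missing.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current working directory

if not os.getenv("ANTHROPIC_API_KEY"):
    raise SystemExit("Please set ANTHROPIC_API_KEY in your .env file")

log_level = os.getenv("LOG_LEVEL", "INFO")  # optional, defaults to INFO
print(f"Environment looks good; log level is {log_level}")
```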
Verify the setup:

```bash
repo-analyzer check --check-all
```
Primary analysis command with full parameter set:

```bash
repo-analyzer analyze [OPTIONS]
```
| Parameter | Type | Description |
|---|---|---|
| `--repo` | string | Repository path (local directory) or URL (remote repository) |

| Parameter | Type | Default | Description |
|---|---|---|---|
| `--branch` | string | current | Git branch to checkout and analyze |
| `--mode` | choice | analysis | Analysis perspective: `analysis` or `developer` |
| `--llm` | choice | claude | LLM provider: `claude` |
| `--model` | string | claude-3-5-sonnet-20241022 | Specific model identifier |
| `--output-dir` | string | ./repo_analysis | Output directory for analysis results |
| `--files-per-chunk` | integer | 8 | Files processed per LLM request |
| `--use-compression` | flag | enabled | Enable smart code compression |
| `--no-compression` | flag | disabled | Disable smart code compression |
| `--max-indentation` | integer | 3 | Maximum indentation level preserved during compression |
| `--processing-delay` | float | 2.0 | Delay between API calls (seconds) |
| `--human-context` | string | none | Additional context for enhanced analysis quality |
| `--conversation-mode` | flag | disabled | Enable interactive analysis mode |
| `--verbose` | flag | disabled | Enable debug-level logging |
- analysis: Third-party technical assessment perspective
- developer: Internal team perspective explaining design decisions
```bash
# Full parameter analysis
repo-analyzer analyze \
  --repo https://github.com/microsoft/vscode.git \
  --branch main \
  --mode developer \
  --model claude-3-5-sonnet-20241022 \
  --files-per-chunk 10 \
  --use-compression \
  --max-indentation 4 \
  --processing-delay 1.5 \
  --human-context "Enterprise IDE with performance and extensibility requirements" \
  --output-dir ./analysis_results \
  --verbose

# Conversation mode analysis
repo-analyzer analyze \
  --repo /path/to/local/repo \
  --conversation-mode \
  --human-context "Legacy system migration with performance constraints"
```
System verification command:

```bash
repo-analyzer check [OPTIONS]
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--check-all` | flag | disabled | Perform comprehensive system and API connectivity tests |
Display version information:

```bash
repo-analyzer version
```
| Setting | Type | Default | Description |
|---|---|---|---|
| `CHUNK_LINES` | int | 150 | Lines per code chunk for processing |
| `FILES_PER_CHUNK` | int | 8 | Files processed per LLM request |
| `USE_ENTIRE_FILES` | bool | true | Process complete files vs. chunked processing |
| `USE_SMART_COMPRESSION` | bool | true | Enable intelligent code compression |
| `MAX_FILE_SIZE` | int | 15000 | Maximum file size for processing (lines) |
| `MAX_INDENTATION_LEVEL` | int | 3 | Indentation depth preserved during compression |
| `INDENTATION_SPACES` | int | 4 | Spaces per indentation level |
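The compression settings above suggest a simple strategy: lines nested deeper than `MAX_INDENTATION_LEVEL` are elided so that signatures and top-level logic survive while deeply nested bodies are summarized. A rough illustration of that idea, not the tool's actual `file_processor.py` implementation:

```python
# Hedged sketch of indentation-based compression using the documented defaults.
MAX_INDENTATION_LEVEL = 3
INDENTATION_SPACES = 4

def compress(source: str) -> str:
    kept, elided = [], 0
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // INDENTATION_SPACES
        if stripped and depth > MAX_INDENTATION_LEVEL:
            elided += 1  # drop deeply nested lines
            continue
        if elided:
            indent = " " * (MAX_INDENTATION_LEVEL * INDENTATION_SPACES)
            kept.append(f"{indent}# ... {elided} nested lines elided ...")
            elided = 0
        kept.append(line)
    if elided:
        kept.append(f"# ... {elided} nested lines elided ...")
    return "\n".join(kept)
```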
| Setting | Type | Default | Description |
|---|---|---|---|
| `DEFAULT_LLM` | string | claude | Primary LLM provider |
| `DEFAULT_MODEL` | string | claude-3-5-sonnet-20241022 | Default model identifier |
| `MAX_TOKENS` | int | 8000 | Maximum tokens per request |
| `TEMPERATURE` | float | 0.1 | LLM temperature for deterministic output |
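These defaults map directly onto a single Anthropic Messages API request. A minimal sketch of what one analysis call could look like with the documented defaults; the prompt text is illustrative only:

```python
# Sketch only: one Claude request using DEFAULT_MODEL, MAX_TOKENS, and TEMPERATURE.
# Requires the `anthropic` package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # DEFAULT_MODEL
    max_tokens=8000,                     # MAX_TOKENS
    temperature=0.1,                     # TEMPERATURE, kept low for repeatable output
    messages=[{"role": "user", "content": "Assess the architecture of the following files:\n..."}],
)
print(response.content[0].text)
```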
| Provider | Requests/Min | Burst Limit | Retry After | Max Retries |
|---|---|---|---|---|
| claude | 50 | 5 | 2.0s | 3 |
| openai | 60 | 10 | 1.0s | 3 |
| default | 30 | 3 | 3.0s | 2 |
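In practice these limits amount to a retry-with-backoff policy: stay under the per-minute ceiling, and on a rate-limit error wait roughly the listed retry interval, up to the listed number of retries. A simplified sketch of that loop (illustrative, not the shipped `rate_limits.py`):

```python
import time

RETRY_AFTER = 2.0   # seconds, from the claude row above
MAX_RETRIES = 3

def call_with_retries(make_request):
    """Call make_request(), retrying on rate-limit errors with a growing delay."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return make_request()
        except RuntimeError:                    # stand-in for a provider rate-limit error
            if attempt == MAX_RETRIES:
                raise
            wait = RETRY_AFTER * (attempt + 1)  # 2s, 4s, 6s ...
            print(f"Rate limit exceeded, waiting {wait:.1f}s ...")
            time.sleep(wait)
```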
| Setting | Type | Default | Description |
|---|---|---|---|
| `PROCESSING_DELAY` | float | 2.0 | Inter-request delay (seconds) |
| `MAX_CONCURRENT_REQUESTS` | int | 3 | Concurrent LLM requests |
| `CLONE_TIMEOUT` | int | 300 | Git clone timeout (seconds) |
| `GIT_COMMAND_TIMEOUT` | int | 30 | Git command timeout (seconds) |
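The Git timeouts translate naturally into `subprocess` calls. A hedged sketch of how cloning and branch inspection could respect them; the real `git_handler.py` may differ:

```python
import subprocess

CLONE_TIMEOUT = 300        # seconds, from the settings table
GIT_COMMAND_TIMEOUT = 30

def clone(repo_url: str, dest: str) -> None:
    # Shallow clone keeps large repositories fast; raises on failure or timeout.
    subprocess.run(["git", "clone", "--depth", "1", repo_url, dest],
                   check=True, timeout=CLONE_TIMEOUT)

def current_branch(repo_dir: str) -> str:
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True, timeout=GIT_COMMAND_TIMEOUT,
    )
    return result.stdout.strip()
```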
- Programming Languages: `.py`, `.js`, `.ts`, `.jsx`, `.tsx`, `.go`, `.rs`, `.java`, `.kt`, `.scala`, `.cpp`, `.c`, `.cs`, `.php`, `.rb`, `.swift`, `.dart`, `.lua`, `.sql`
- Web Technologies: `.html`, `.css`, `.scss`, `.sass`, `.less`, `.vue`, `.svelte`
- Configuration: `.json`, `.yaml`, `.yml`, `.toml`, `.xml`, `.ini`, `.conf`, `.env`
- Documentation: `.md`, `.rst`, `.txt`, `.adoc`, `.tex`
- Build/Deploy: `Dockerfile`, `docker-compose.yml`, `Makefile`, `.tf`, `.hcl`, `CMakeLists.txt`
- Scripts: `.sh`, `.bash`, `.ps1`, `.bat`, `.fish`
High-priority files automatically receive enhanced analysis:

- Entry Points: `main.py`, `app.py`, `server.py`, `index.js`, `main.go`, `lib.rs`
- Configuration: `package.json`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `pom.xml`, `build.gradle`
- Documentation: `README.md`, `CHANGELOG.md`, `CONTRIBUTING.md`
- Infrastructure: `Dockerfile`, `docker-compose.yml`, `Makefile`, `webpack.config.js`
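Taken together, the extension list and the priority list describe a two-stage selection: keep only supported files, then move known entry points and manifests to the front of the processing queue. A simplified sketch of that ordering, with deliberately abbreviated sets (illustrative only):

```python
from pathlib import Path
from typing import List

# Abbreviated versions of the lists above; illustrative only.
SUPPORTED_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".md", ".yaml", ".toml", ".json"}
PRIORITY_NAMES = {"main.py", "app.py", "index.js", "package.json", "pyproject.toml",
                  "README.md", "Dockerfile", "Makefile"}

def select_files(repo_root: str) -> List[Path]:
    files = [
        p for p in Path(repo_root).rglob("*")
        if p.is_file() and (p.suffix in SUPPORTED_EXTENSIONS or p.name in PRIORITY_NAMES)
    ]
    # Priority files first, then everything else in path order.
    return sorted(files, key=lambda p: (p.name not in PRIORITY_NAMES, str(p)))
```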
Analysis results are written to the output directory in the following layout:

```
repo_analysis/
└── {repository_name}/
    ├── {timestamp}_{repo_name}_analysis.md   # Complete technical analysis
    ├── {repo_name}_latest.md                 # Latest analysis (symlink)
    ├── {timestamp}_{repo_name}_progress.md   # Processing log
    └── {timestamp}_{repo_name}_summary.json  # Structured analysis data
```
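Report names are timestamp-prefixed, with a stable `*_latest.md` symlink pointing at the newest analysis. A small sketch of how those paths could be derived; this is a hypothetical helper, and the timestamp format and symlink handling are assumptions:

```python
from datetime import datetime
from pathlib import Path

def report_paths(output_dir: str, repo_name: str) -> dict:
    base = Path(output_dir) / repo_name
    base.mkdir(parents=True, exist_ok=True)
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    analysis = base / f"{ts}_{repo_name}_analysis.md"
    latest = base / f"{repo_name}_latest.md"
    if latest.is_symlink() or latest.exists():
        latest.unlink()
    latest.symlink_to(analysis.name)  # relative symlink to the newest report
    return {
        "analysis": analysis,
        "latest": latest,
        "progress": base / f"{ts}_{repo_name}_progress.md",
        "summary": base / f"{ts}_{repo_name}_summary.json",
    }
```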
Each report contains an 11-section technical analysis:
- Repository Purpose - Technical goals, problem statement, system requirements
- Overview & Metrics - Quantitative assessment, code organization, health indicators
- Technology Stack - Language analysis, framework evaluation, dependency assessment
- Architecture - Design patterns, component interactions, scalability analysis
- Business Domain - Functional capabilities, domain logic, workflow implementation
- Implementation - Code quality, algorithms, integration patterns, testing strategy
- Infrastructure - Deployment strategy, environment management, operational considerations
- Development Workflow - Process analysis, tooling, collaboration patterns
- Security & Compliance - Security implementation, access control, vulnerability assessment
- Performance & Optimization - Performance characteristics, bottleneck analysis, scaling strategy
- Maintenance & Evolution - Technical debt, maintainability, future roadmap
```
repo_analyzer/
├── config/
│   ├── settings.py                    # Core configuration management
│   ├── rate_limits.py                 # LLM provider rate limiting
│   └── languages.py                   # Language detection and prioritization
├── src/repo_analyzer/
│   ├── cli.py                         # Command-line interface
│   ├── core/
│   │   ├── analyzer.py                # Main orchestration logic
│   │   ├── conversation_analyzer.py   # Technical analysis mode
│   │   ├── developer_explanation.py   # Developer perspective mode
│   │   ├── file_processor.py          # Code processing and compression
│   │   ├── git_handler.py             # Repository management
│   │   └── env_extractor.py           # Environment configuration analysis
│   ├── llm/
│   │   ├── factory.py                 # LLM provider abstraction
│   │   └── claude.py                  # Anthropic Claude integration
│   ├── utils/
│   │   └── logging_utils.py           # Logging configuration
│   └── output/
│       └── report_generator.py        # Report formatting and output
└── tests/                             # Test suite
```
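The `llm/factory.py` plus `llm/claude.py` split suggests a provider-abstraction pattern: a common interface with a single Claude backend today and room for more. A hedged sketch of what that abstraction might look like; the class and function names here are illustrative, not the actual ones:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface that each provider backend implements."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

class ClaudeProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        # A real backend would call the Anthropic Messages API here.
        raise NotImplementedError

def create_provider(name: str = "claude") -> LLMProvider:
    providers = {"claude": ClaudeProvider}
    if name not in providers:
        raise ValueError(f"Unsupported LLM provider: {name}")
    return providers[name]()
```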
```yaml
# GitHub Actions example
- name: Repository Analysis
  run: |
    repo-analyzer analyze \
      --repo ${{ github.workspace }} \
      --branch ${{ github.ref_name }} \
      --output-dir ./analysis \
      --human-context "CI/CD analysis for ${{ github.repository }}"
```
```python
from repo_analyzer.core.analyzer import RepositoryAnalyzer

analyzer = RepositoryAnalyzer(llm_provider="claude")
results = analyzer.analyze_repository(
    repo_path="/path/to/repo",
    analysis_mode="developer",
    human_context="API service with microservices architecture",
)
```
⚠️ Rate limit exceeded, waiting...
Solution: The tool handles rate limiting automatically; check your API plan limits if you need faster processing.
❌ Please set ANTHROPIC_API_KEY environment variable
Solution: Configure the API key in the `.env` file.
❌ Repository path does not exist: /path/to/repo
Solution: Verify repository path exists and is accessible.
❌ Analysis failed: Memory allocation error
Solution: Reduce the `--files-per-chunk` parameter or enable `--use-compression`.
MIT License - see LICENSE file for details.