feat: Community Leaderboard for AgentReady Scores #146
Complete cold-start implementation guide for the community leaderboard:

- CLI submit command (`agentready submit`)
- GitHub Action validation workflow
- Leaderboard aggregation script
- Jekyll pages integrated with the existing docs

Enables self-service repository submissions with anti-gaming measures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
🤖 AgentReady Assessment Report

Repository: agentready

📊 Summary

Languages Detected
Repository Stats

🎖️ Certification Ladder

📋 Detailed Findings

API Documentation
Build & Development
Code Organization

Code Quality

❌ Type Annotations. Measured: 32.8% (Threshold: ≥80%)
📝 Remediation Steps: add type annotations to function signatures.
Commands:

```shell
# Python
pip install mypy
mypy --strict src/

# TypeScript
npm install --save-dev typescript
echo '{"compilerOptions": {"strict": true}}' > tsconfig.json
```

❌ Structured Logging. Measured: not configured (Threshold: structured logging library)
📝 Remediation Steps: add a structured logging library for machine-parseable logs.
Commands:

```shell
# Install structlog
pip install structlog
# Configure structlog (see examples for configuration)
```

Context Window Optimization

❌ File Size Limits. Measured: 2 huge, 8 large out of 137 files (Threshold: <5% of files >500 lines, 0 files >1000 lines)
📝 Remediation Steps: refactor large files into smaller, focused modules.

Dependency Management

Documentation

❌ Concise Documentation. Measured: 276 lines, 40 headings, 38 bullets (Threshold: <500 lines, structured format)
📝 Remediation Steps: make documentation more concise and structured.
Commands:

```shell
# Check README length
wc -l README.md
# Count headings
grep -c '^#' README.md
```

Examples. Good (concise, structured): "Features / Documentation / See docs/ for detailed guides." Bad (verbose prose): "This project is a tool that helps you assess your repository [many more paragraphs of prose...]"

Performance
Repository Structure
Security
Testing & CI/CD

🎯 Next Steps: Priority Improvements (highest impact first)

📝 Assessment Metadata

🤖 Generated with Claude Code
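The type-annotation remediation above can be illustrated with a small before/after pair. These are hypothetical functions, not code from the repository; `mypy --strict` flags the first and accepts the second:

```python
# Untyped: mypy --strict reports a missing-annotation error here.
def mean(values):
    return sum(values) / len(values)


# Annotated: passes mypy --strict and tells agents and humans
# exactly what the function expects and returns.
def mean_typed(values: list[float]) -> float:
    """Average of a non-empty list of floats."""
    return sum(values) / len(values)
```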
- Add PyGithub>=2.1.1 dependency for GitHub API integration
- Create submit.py with the full PR creation workflow
- Validates GitHub token, assessment file, and repository access
- Generates unique ISO 8601 timestamp filenames
- Verifies the submitter has commit access to the repository
- Creates a fork and branch, commits the assessment, and opens a PR
- Includes a dry-run mode for testing
- Comprehensive error handling and user guidance

Submission workflow:
1. User runs `agentready submit`
2. Command validates the token and the assessment
3. Verifies the user has commit access to the repo
4. Creates a PR to agentready/agentready automatically
5. Validation workflow runs (next phase)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
**Phase 2: Validation GitHub Action**
- validate-leaderboard-submission.yml workflow
- Validates JSON schema, repo access, and score accuracy (±2 tolerance)
- Re-runs the assessment for verification
- Posts validation results as PR comments
- Secure: all user input passed via environment variables

**Phase 3: Aggregation Script & Workflow**
- scripts/generate-leaderboard-data.py for data generation
- Scans the submissions/ directory and groups by repository
- Generates docs/_data/leaderboard.json for Jekyll
- Calculates overall, by-language, by-size, and most-improved rankings
- update-leaderboard.yml workflow triggers when submissions are merged

**Phase 4: Jekyll Leaderboard Pages**
- docs/leaderboard/index.md with top-10 cards plus a full table
- Tier-based color coding (Platinum/Gold/Silver/Bronze)
- Responsive CSS styling in docs/assets/css/leaderboard.css
- Added to navigation in _config.yml
- Empty leaderboard.json for the initial build

**Code Quality**
- All files formatted with black
- Imports sorted with isort
- Ruff linting passed
- Security: no command-injection vulnerabilities in workflows

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
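The Phase 3 aggregation step can be sketched in Python. This is a minimal approximation of generate-leaderboard-data.py, assuming the submissions/{org}/{repo}/{timestamp}-assessment.json layout from the PR description and a top-level "score" field in each JSON file (the real schema may differ):

```python
import json
from collections import defaultdict
from pathlib import Path


def aggregate(submissions_dir: str) -> list[dict]:
    """Group assessment files by repository, keep each repo's latest
    score plus its full history, and rank highest score first.
    Sorting the glob results lets the ISO-timestamp filenames order
    each repository's history chronologically."""
    by_repo: dict[str, list[dict]] = defaultdict(list)
    for path in sorted(Path(submissions_dir).glob("*/*/*-assessment.json")):
        org, repo = path.parts[-3], path.parts[-2]
        by_repo[f"{org}/{repo}"].append(json.loads(path.read_text()))
    board = [
        {"repository": name, "score": entries[-1]["score"], "history": entries}
        for name, entries in by_repo.items()
    ]
    return sorted(board, key=lambda e: e["score"], reverse=True)
```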
- Replace template literals with string concatenation in github-script
- Add missing newlines at the end of workflow files
- Fix ruff E402 error in regenerate_heatmap.py (import after sys.path)
- Ensures the pre-commit check-yaml hook passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add regex pattern to ignore {{ variable }} syntax in markdown-link-check.
This prevents false positives on Jekyll/Liquid template variables like
{{ entry.url }} in leaderboard pages.
Fixes docs-lint workflow failure on PR #146.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
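The ignore rule works because Liquid placeholders have a recognizable shape. A hedged Python approximation of the matching follows; the actual fix lives in markdown-link-check's JSON `ignorePatterns` config, not in Python:

```python
import re

# Approximation of the ignore rule: any "link" containing Liquid's
# {{ ... }} delimiters is a template placeholder, not a real URL.
LIQUID_VAR = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def is_template_link(url: str) -> bool:
    """True for Jekyll/Liquid placeholders like {{ entry.url }},
    which the link checker should skip rather than fetch."""
    return bool(LIQUID_VAR.search(url))
```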
- Validates links in docs/ markdown files locally before push
- Uses the same .markdown-link-check.json config as the CI workflow
- Prevents broken-link issues from reaching CI

Now runs on every commit that modifies docs/*.md files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Track which ruleset version was used for each assessment to ensure fair comparisons. Scores are only directly comparable when assessed with the same research version, as attributes and weights may change.

**Changes**:

1. **Assessment Model**:
   - Add `research_version` field to AssessmentMetadata
   - Scanner loads the version from ResearchLoader during assessment
   - Captured in every assessment JSON file

2. **Leaderboard Data**:
   - Aggregation script extracts `research_version` from submissions
   - Includes the version in history for tracking changes over time
   - Added to leaderboard.json for Jekyll display

3. **Leaderboard Display**:
   - New "Ruleset" column in the leaderboard table
   - Shows the research version for each submission
   - Helps users understand the scoring context

4. **Validation Workflow**:
   - Extracts the research version from the submitted assessment
   - Compares the claimed vs. actual research version
   - Warns if versions differ (scores not directly comparable)
   - PR comment includes version info and mismatch warnings

**Why This Matters**:
- Research versions may add, remove, or reweight attributes
- Comparing scores across versions can be misleading
- Users can now see exactly which ruleset was used
- Historical tracking shows how repositories improve

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Both the HTML and Markdown reporters were displaying a hardcoded 'v1.0.0' instead of the actual AgentReady and research versions from metadata.

**Changes**:
- Markdown footer: use metadata.agentready_version and metadata.research_version
- HTML footer: same, plus add assessed_by and assessment_timestamp_human
- Now shows accurate version info for reproducibility

**Before**:
- Tool Version: AgentReady v1.0.0
- Research Report: Bundled version

**After**:
- AgentReady Version: v2.8.1
- Research Version: v1.2.0
- Assessed By: jeder@hostname
- Assessment Date: December 3, 2025 at 2:30 PM

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
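A metadata-driven footer of the kind this commit describes might look like the sketch below, including the graceful fallback a later commit adds for missing metadata. The attribute names follow the commit message; the real AssessmentMetadata class may differ:

```python
from types import SimpleNamespace


def render_footer(metadata) -> str:
    """Build the report footer from metadata instead of hardcoded
    strings, falling back gracefully when metadata is absent."""
    if metadata is None:
        return "AgentReady Version: unknown"
    return (
        f"AgentReady Version: {metadata.agentready_version}\n"
        f"Research Version: {metadata.research_version}\n"
        f"Assessed By: {metadata.assessed_by}"
    )


# Stand-in for a real AssessmentMetadata instance:
meta = SimpleNamespace(
    agentready_version="v2.8.1",
    research_version="v1.2.0",
    assessed_by="jeder@hostname",
)
```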
- Add research_version parameter to all AssessmentMetadata.create() calls in tests
- Add a graceful fallback for None metadata in the Markdown reporter footer
- Add a conditional check for None metadata in the HTML template footer
- Fixes test failures from the metadata signature change
# [2.9.0](v2.8.1...v2.9.0) (2025-12-03)

### Features

* Community Leaderboard for AgentReady Scores ([#146](#146)) ([fea0b3e](fea0b3e))
🎉 This PR is included in version 2.9.0 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀
Summary
Implements a community-driven leaderboard where users can submit their AgentReady assessment results via CLI. Submissions are validated through GitHub Actions and displayed on the existing GitHub Pages site.
Features
✅ Self-Service Submission: the `agentready submit` command creates a PR automatically
✅ Anti-Gaming Validation: ownership verification plus re-assessment
✅ Multiple Views: overall, by-language, by-size, most-improved
✅ Historical Tracking: multiple submissions per repo show improvement over time
✅ Integrated with Docs: leaderboard pages added to the existing Jekyll site
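The anti-gaming re-assessment reduces to a tolerance check. A minimal sketch; the function name and signature are assumptions, and only the ±2 tolerance comes from this PR:

```python
def score_within_tolerance(claimed: float, recomputed: float,
                           tolerance: float = 2.0) -> bool:
    """Accept a submission only when the claimed score is within
    +/- tolerance points of the validator's re-computed score."""
    return abs(claimed - recomputed) <= tolerance
```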
Components

- CLI Submit Command (src/agentready/cli/submit.py)
- Validation Workflow (.github/workflows/validate-leaderboard-submission.yml)
- Aggregation Script (scripts/generate-leaderboard-data.py): scans the submissions/ directory and generates docs/_data/leaderboard.json
- Leaderboard Pages (docs/leaderboard/)

Implementation Plan
See the complete specification: specs/leaderboard-feature-spec.md

- Phase 1: CLI Submit Command (foundation)
- Phase 2: Validation Workflow (anti-gaming)
- Phase 3: Aggregation Script (data pipeline)
- Phase 4: Leaderboard Pages (UI)
Example Usage
Design Decisions

- Filenames: 2025-12-03T14-30-45-assessment.json (unique, sortable)
- Layout: submissions/{org}/{repo}/{timestamp}-assessment.json

Security & Anti-Gaming

- Re-assessment runs in an isolated /tmp directory

Future Enhancements (Out of Scope)

Ready for Review: the spec is complete and implementation-ready. This PR will be updated with actual implementation commits following the 4-phase plan.
🤖 Generated with Claude Code