Agentic Workflow Lock File Statistics - December 4, 2024 #5484

2025-12-04T03:38:40Z

github-actions[bot]
bot Dec 4, 2025

This comprehensive analysis examines all 104 .lock.yml files in the repository, revealing insights into workflow structure, trigger patterns, safe outputs, permissions, and configuration preferences across the gh-aw agentic workflow ecosystem.

Key Highlights:

104 lock files totaling 31 MB with an average size of 306 KB
Copilot engine dominates with 42 workflows (40%), followed by Claude (26) and Codex (8)
Most workflows are scheduled or manual - 81 use workflow_dispatch, 65 use schedule
Discussion creation is the most popular safe output (39 workflows), followed by comments (18) and issues (17)
Security-conscious permissions - 88 workflows use read-only permissions, only 7 require write access
37 workflows (36%) use strict mode for enhanced security

Full Statistical Report

Executive Summary

Total Lock Files: 104
Total Size: 31 MB
Average File Size: 306 KB
Analysis Date: 2024-12-04
Repository: githubnext/gh-aw

File Size Distribution

Size Range	Count	Percentage
50-100 KB	3	2.9%
100-200 KB	6	5.8%
200-300 KB	27	26.0%
300-400 KB	64	61.5%
> 400 KB	4	3.8%

Size Statistics:

Smallest: arxiv.lock.yml (81 KB) - MCP server configuration
Largest: poem-bot.lock.yml (605 KB) - Most complex workflow with 115 steps
Average: 306 KB
Total: 31 MB

The majority of lock files (61.5%) fall in the 300-400 KB range, indicating a consistent level of complexity across workflows.

Engine Distribution

Engine	Count	Percentage	Use Cases
Copilot	42	40.4%	Dominant engine, used across various workflow types
Claude	26	25.0%	Second most popular, often for complex analysis
Codex	8	7.7%	Used for specialized code generation tasks
Not specified	28	26.9%	Engine inherited or not explicitly set

Key Finding: Copilot is the most widely adopted engine, suggesting strong integration with GitHub's AI capabilities.

Trigger Analysis

Most Popular Triggers

Trigger Type	Count	Percentage	Description
workflow_dispatch	81	77.9%	Manual trigger - most workflows can be run on demand
schedule	65	62.5%	Automated scheduling - primarily for recurring tasks
pull_request	9	8.7%	PR-triggered workflows for code review and analysis
issues	4	3.8%	Issue-triggered workflows for triage and management
workflow_run	2	1.9%	Chained workflows triggered by other workflow completion
push	2	1.9%	Push-triggered workflows for continuous integration

Common Trigger Combinations

schedule + workflow_dispatch (58 workflows, 55.8%) - Most common pattern: automated daily/weekly runs with manual override capability
workflow_dispatch only (14 workflows, 13.5%) - Purely manual workflows for on-demand analysis
pull_request + schedule + workflow_dispatch (6 workflows, 5.8%) - Comprehensive coverage: automated, PR-based, and manual triggers
pull_request + workflow_dispatch (2 workflows) - PR review workflows with manual trigger option

Insight: The dominance of workflow_dispatch indicates workflows are designed for flexibility, allowing both automated and manual execution.

Schedule Patterns

Top 10 most common cron schedules:

Schedule (Cron)	Count	Description
`0 9 * * *`	4	Daily at 9 AM UTC
`0 0,6,12,18 * * *`	4	Every 6 hours (high-frequency monitoring)
`0 8 * * *`	3	Daily at 8 AM UTC
`0 10 * * 1-5`	3	Weekdays at 10 AM UTC (business hours)
`17 3 * * *`	2	Daily at 3:17 AM UTC
`0 9 * * 1-5`	2	Weekdays at 9 AM UTC
`0 9 * * 1`	2	Monday at 9 AM (weekly reports)
`0 6 * * 0`	2	Sunday at 6 AM UTC (weekly maintenance)
`0 15 * * 1`	2	Monday at 3 PM UTC (weekly summary)

Pattern: Most scheduled workflows run daily in the morning (UTC timezone), with some high-frequency monitoring every 6 hours and weekly reports on Mondays.

Safe Outputs Analysis

Safe Output Types Distribution

Safe Output Type	Workflows	Percentage	Primary Use Case
create-discussion	39	44.3%	Publishing analysis reports and findings
add-comment	18	20.5%	Adding feedback to issues/PRs
create-issue	17	19.3%	Creating actionable items from findings
create-pull-request	14	15.9%	Automated code improvements and documentation updates
update-issue	3	small	Updating existing issues

Total workflows with safe outputs: 88 (84.6% of all workflows)

Example Workflows by Safe Output Type

create-discussion (reporting and analysis):

artifacts-summary.lock.yml
audit-workflows.lock.yml
blog-auditor.lock.yml
commit-changes-analyzer.lock.yml
copilot-agent-analysis.lock.yml

create-issue (actionable findings):

breaking-change-checker.lock.yml
ci-doctor.lock.yml
cli-consistency-checker.lock.yml
cli-version-checker.lock.yml
craft.lock.yml

add-comment (feedback and suggestions):

archie.lock.yml
brave.lock.yml
ci-doctor.lock.yml
cloclo.lock.yml
craft.lock.yml

create-pull-request (automated improvements):

cloclo.lock.yml
daily-doc-updater.lock.yml
developer-docs-consolidator.lock.yml
dictation-prompt.lock.yml
github-mcp-tools-report.lock.yml

Safe Output Limits Configuration

Output Type	Max Limit	Count	Use Case
discussion	max=1	32	Single consolidated report per run
comment	max=1	11	Single comment per run
issue	max=1	4	Single issue per run
comment	max=3	3	Multiple related comments
issue	max=10	2	Batch issue creation
issue	max=5-6	2	Medium batch issue creation

Pattern: Most workflows use max=1 to create a single, consolidated output rather than multiple separate items. This prevents spam and keeps repository clean.

Discussion Categories

Category	Count	Purpose
audits / Audits	18	Security and compliance audits
General / general	9	General discussions and reports
reports	2	Structured reporting
dev	2	Development-related discussions
artifacts	2	Artifact analysis
security	1	Security-specific discussions
research	1	Research findings
daily-news	1	Daily updates
announcements	1	Announcements

Observation: "audits" is the most popular category (46% of discussion-creating workflows), indicating strong focus on monitoring and analysis.

Close-Older Pattern

35 workflows (89.7% of discussion-creating workflows) use close-older-discussions: true
0 workflows use close-older-issues: true

This pattern keeps the discussion list clean by automatically closing superseded reports, ensuring only the latest analysis is visible.

Structural Characteristics

Job and Step Complexity

Average Steps per Workflow: 62.2
Maximum Steps: 115 (poem-bot.lock.yml)
Minimum Steps: 28 (dev.lock.yml)

Distribution:

Small workflows (<40 steps): ~10%
Medium workflows (40-70 steps): ~70%
Large workflows (70+ steps): ~20%

Typical Lock File Structure:
A standard .lock.yml file in this repository has:

Size: ~306 KB
Steps: ~62 steps across multiple jobs
Permissions: Read-only contents, issues, pull-requests
Triggers: schedule + workflow_dispatch
Timeout: 10-20 minutes
Safe Output: create-discussion with max=1

Timeout Configuration

Timeout (minutes)	Count	Use Case
10	277	Standard timeout for most workflows
20	104	Extended timeout for complex analysis
15	18	Medium-complexity workflows
5	14	Quick smoke tests and checks
30	11	Long-running analysis
45	4	Very complex workflows
60	2	Maximum timeout for intensive tasks

Average Timeout: 11 minutes

Pattern: Most workflows (277 uses) default to 10-minute timeout, with a secondary group using 20 minutes for more complex operations.

Permission Patterns

Most Common Permissions

Permission	Count	Typical Access Level
contents	99	read (repository code access)
issues	86	write (for creating/updating issues)
pull-requests	85	write (for creating/updating PRs)
actions	47	read (workflow run access)
discussions	14	write (for creating discussions)
security-events	6	read (security scanning access)
repository-projects	3	read (project board access)

Permission Scope Analysis

Read-only workflows: 88 (84.6%)
With write permissions: 7 (6.7%)
Mixed (read + write to specific resources): 9 (8.7%)

Security Posture: The overwhelming majority (84.6%) of workflows use read-only base permissions, only requesting write access for specific resources (issues, pull-requests) through safe-output mechanisms. This demonstrates strong security practices.

Tool & Configuration Patterns

Tool Usage

Tool	Count	Percentage	Use Case
bash	62	59.6%	Command execution and scripting
cache-memory	42	40.4%	Persistent state across runs
web-fetch	7	6.7%	External data retrieval
web-search	2	1.9%	Web search capabilities

Observation: Bash is the most essential tool (59.6%), while cache-memory (40.4%) indicates significant use of persistent state for tracking trends and history.

Strict Mode Usage

Workflows with strict mode: 37 (35.6%)
Workflows without strict mode: 67 (64.4%)

Definition: Strict mode enforces stricter validation and security checks in workflow execution.

Pattern: About one-third of workflows use strict mode, typically for security-sensitive operations like malicious code scanning, file access, and token management.

Concurrency Controls

Workflows with concurrency settings: 2 (1.9%)
- cloclo.lock.yml
- tidy.lock.yml

Finding: Very few workflows implement concurrency controls, suggesting most workflows are designed to run independently without conflicts.

Common Imports

Top 10 shared imports:

Import	Count	Purpose
shared/reporting.md	40	Common reporting formats and structures
shared/jqschema.md	17	JSON schema processing utilities
shared/mcp/ghaw.md	9	GitHub Agentic Workflows MCP integration
shared/trendingchartssimple.md	6	Simple trending chart generation
shared/pythondataviz.md	6	Python data visualization helpers
shared/trends.md	5	Trend analysis utilities
shared/mcp/tavily.md	4	Tavily search MCP integration
shared/copilotprdatafetch.md	4	Copilot PR data fetching utilities
shared/safeoutputapp.md	3	Safe output helpers
shared/mcp/brave.md	2	Brave search MCP integration

Key Finding: shared/reporting.md is imported by 38.5% of workflows, indicating strong standardization around reporting formats. This promotes consistency across analyses.

Workflow Categories

By Naming Patterns

Category	Count	Examples
Daily workflows	14	daily-code-metrics, daily-team-status, daily-news
Smoke test workflows	8	smoke-claude, smoke-copilot, smoke-codex
Test workflows	7	test-app-token, test-firewall-default
Copilot-related	9	copilot-agent-analysis, copilot-session-insights

By Purpose (inferred from names and safe outputs)

Monitoring & Auditing (30+ workflows): Continuous monitoring of repository health, security, and quality
- Examples: audit-workflows, safe-output-health, daily-firewall-report
Code Review & PR Analysis (15+ workflows): Automated code review, PR feedback, and analysis
- Examples: grumpy-reviewer, pr-nitpick-reviewer, copilot-pr-nlp-analysis
Documentation (10+ workflows): Documentation generation, updates, and validation
- Examples: technical-doc-writer, daily-doc-updater, unbloat-docs
Issue Management (10+ workflows): Issue triage, classification, and assignment
- Examples: issue-triage-agent, issue-classifier, issue-arborist
Testing & Smoke Tests (8+ workflows): Automated testing and health checks
- Examples: smoke-claude, smoke-copilot, smoke-detector
Metrics & Analytics (10+ workflows): Data collection, analysis, and visualization
- Examples: daily-code-metrics, copilot-session-insights, python-data-charts
Maintenance & Cleanup (5+ workflows): Repository maintenance and cleanup tasks
- Examples: tidy, close-old-discussions, stale-repo-identifier

Interesting Findings

Copilot Dominance: 40.4% of workflows use the Copilot engine, significantly more than Claude (25%) or Codex (7.7%). This suggests strong integration with GitHub's native AI capabilities and possibly better performance or cost characteristics for this use case.
Manual Override Pattern: 77.9% of workflows support workflow_dispatch, even when scheduled. This design pattern enables developers to manually trigger workflows for debugging, testing, or ad-hoc analysis without waiting for scheduled runs.
Discussion-First Reporting: 39 workflows (44.3% of those with safe outputs) use create-discussion as their primary output mechanism, with most (89.7%) automatically closing older discussions. This creates a "single source of truth" pattern where the latest analysis is always the most visible.
Security-Conscious Design: 84.6% of workflows use read-only base permissions, only granting write access through controlled safe-output mechanisms. This demonstrates defense-in-depth security practices.
Standardized Reporting: The shared/reporting.md import is used by 40 workflows (38.5%), indicating strong standardization around reporting formats and structures across the repository.
Timeout Standardization: 277 workflow steps use the 10-minute timeout, with 104 using 20 minutes. This bimodal distribution suggests two classes of operations: quick checks and deeper analysis.
Morning UTC Scheduling: Most scheduled workflows run in the morning UTC hours (8-10 AM), optimizing for European working hours while overnight for US timezones.
Size Consistency: 87.5% of lock files fall in the 200-400 KB range, indicating consistent complexity across workflows despite diverse purposes.
Strict Mode Adoption: Only 35.6% of workflows use strict mode, suggesting it's reserved for security-sensitive operations rather than being a default setting.
Low Concurrency Needs: Only 2 workflows (1.9%) implement concurrency controls, indicating workflows are designed to be independent and non-conflicting.

Recommendations

Based on this statistical analysis, here are recommendations for improving agentic workflow practices:

For New Workflow Authors

Follow the 300 KB Standard: Target ~300 KB for lock file size with ~60 steps for consistency
Use schedule + workflow_dispatch: Enable both automated and manual execution for flexibility
Prefer create-discussion with max=1: For reporting workflows, consolidate findings into a single discussion
Import shared/reporting.md: Use standardized reporting formats for consistency
Start with 10-minute timeout: Use 10 minutes for quick checks, 20 minutes for deeper analysis
Use read-only base permissions: Only request write permissions through safe outputs
Set close-older-discussions: true: Keep the discussion list clean by auto-closing superseded reports

For Repository Maintainers

Monitor Engine Distribution: Track which engines perform best for different workflow types
Consider Strict Mode Defaults: Evaluate whether strict mode should be default for new workflows
Standardize Discussion Categories: The "audits" vs "Audits" inconsistency should be resolved (case sensitivity)
Document Schedule Patterns: Create guidelines for choosing appropriate cron schedules
Review Large Workflows: The poem-bot.lock.yml (605 KB, 115 steps) may benefit from decomposition
Track Historical Trends: Use the cache-memory analysis data to monitor growth and complexity over time

For Platform Development

Default Templates: Create workflow templates based on these common patterns
Validation Rules: Warn when workflows deviate significantly from the 300 KB / 60-step norm
Permission Presets: Provide "read-only with safe outputs" as a recommended permission preset
Schedule Optimizer: Help users choose appropriate cron schedules based on workflow type
Import Discovery: Make shared imports more discoverable to promote code reuse

Methodology

Data Collection

Lock Files Analyzed: 104
Analysis Tool: Bash scripts with YAML parsing and text processing
Cache Memory: Used for script persistence and historical data tracking
Data Sources: All .lock.yml files in .github/workflows/ directory

Analysis Scripts

Analysis performed using multiple specialized scripts:

analyze_lockfiles.sh - Primary data extraction
extract_detailed.sh - Engine, tool, and configuration extraction
safe_output_analysis.sh - Safe output pattern analysis
final_stats.sh - Schedule patterns and trigger combinations

Data Quality

All 104 lock files successfully parsed
No corrupted or malformed files encountered
Frontmatter YAML sections extracted from comment blocks
Counts verified through multiple independent queries

Limitations

MCP server configuration extraction incomplete (0 results) - requires deeper YAML parsing
Job count extraction needs refinement (returned 0 for average jobs)
Some edge cases in schedule pattern extraction
Historical comparison not available (first analysis run)

Cache Persistence

Analysis scripts and results saved to /tmp/gh-aw/cache-memory/ for:

Future reuse and faster subsequent analyses
Historical trend tracking over time
Pattern library development
Continuous improvement of analysis methods

Generated by Lockfile Statistics Analysis Agent on 2024-12-04T00:00:00Z

AI generated by Lockfile Statistics Analysis Agent

2025-12-07T13:32:31Z

github-actions[bot]
bot Dec 7, 2025
Author

This discussion was automatically closed because it was created by an agentic workflow more than 3 days ago.

0 replies