[repository-quality] Repository Quality: Oversized File Decomposition #25307

2026-04-08T13:20:53Z

github-actions[bot] · Apr 8, 2026

🎯 Repository Quality Improvement Report — Oversized File Decomposition

Analysis Date: 2026-04-08
Focus Area: Oversized File Decomposition (Custom)
Strategy Type: Custom
Custom Area: Yes. Selected because the codebase contains 76 source files exceeding 500 lines and 6 files exceeding 1,000 lines. These sizes are well beyond the file-size guidance in AGENTS.md (100–200 lines per validator, hard limit 300).

Executive Summary

A file-size audit across the 672 Go source files (excluding tests) in this repository reveals significant structural debt. The average file size of ~233 lines looks healthy. However, 76 files exceed 500 lines and 6 exceed 1,000. The largest, gateway_logs.go, reaches 1,332 lines and combines five distinct responsibilities. AGENTS.md sets a hard limit of 300 lines for validators, and three validator files breach it.

The most acute case is logs_orchestrator.go, which contains a single function — DownloadWorkflowLogs — that spans ~562 lines by itself. Oversized files increase cognitive load for contributors, make targeted testing harder, and obscure single-responsibility boundaries. The good news: the natural seams for decomposition are already visible in the code — types are grouped, functions have clear domains, and comments enumerate responsibilities.

The five tasks below target the highest-leverage files, ordered by size and structural complexity.

Full Analysis Report

Current State Assessment

Metrics Collected:

| Metric | Value | Status |
|---|---|---|
| Total source files (non-test `.go`) | 672 | ✅ |
| Average file size | 233 lines | ✅ |
| Files > 1,000 lines | 6 | ❌ |
| Files 500–999 lines | 70 | ⚠️ |
| Files 300–499 lines | 99 | ⚠️ |
| Validator files > 300 lines (hard limit) | 3 | ❌ |
| Largest single function (approx.) | ~562 lines (`DownloadWorkflowLogs`) | ❌ |
| TODO/FIXME comments | 11 | ✅ |
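The size buckets in the metrics table can be reproduced from per-file line counts. The sketch below assumes the counts have already been collected into a map, for example from `wc -l`. The `summarizeSizes` function and `sizeSummary` type are hypothetical, not part of the repository. A file of exactly 1,000 lines is placed in the top bucket because the table's ranges leave that value unassigned.

```go
package main

import "fmt"

// sizeSummary mirrors the size buckets used in the metrics table.
type sizeSummary struct {
	Total    int
	AvgLines float64
	Over1000 int // 1,000 lines or more (the "> 1,000" row)
	From500  int // 500 to 999 lines
	From300  int // 300 to 499 lines
	Under300 int // fewer than 300 lines
}

// summarizeSizes buckets per-file line counts (path -> number of lines).
func summarizeSizes(lines map[string]int) sizeSummary {
	var s sizeSummary
	sum := 0
	for _, n := range lines {
		s.Total++
		sum += n
		switch {
		case n >= 1000:
			s.Over1000++
		case n >= 500:
			s.From500++
		case n >= 300:
			s.From300++
		default:
			s.Under300++
		}
	}
	if s.Total > 0 {
		s.AvgLines = float64(sum) / float64(s.Total)
	}
	return s
}

func main() {
	counts := map[string]int{
		"pkg/cli/gateway_logs.go":               1332,
		"pkg/workflow/cache.go":                 973,
		"pkg/workflow/mcp_config_validation.go": 377,
		"pkg/cli/small_example.go":              120, // hypothetical file
	}
	fmt.Printf("%+v\n", summarizeSizes(counts))
}
```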

Top 10 Largest Files:

| Lines | File |
|---:|---|
| 1,332 | `pkg/cli/gateway_logs.go` |
| 1,139 | `pkg/cli/audit_report_render.go` |
| 1,074 | `pkg/cli/logs_orchestrator.go` |
| 1,065 | `pkg/cli/logs_report.go` |
| 1,057 | `pkg/workflow/compiler_orchestrator_workflow.go` |
| 1,040 | `pkg/workflow/compiler_safe_outputs_config.go` |
| 973 | `pkg/workflow/cache.go` |
| 957 | `pkg/workflow/frontmatter_types.go` |
| 928 | `pkg/parser/remote_fetch.go` |
| 882 | `cmd/gh-aw/main.go` |

Validator Files Over Hard Limit (300 lines):

| Lines | File |
|---:|---|
| 519 | `pkg/workflow/safe_outputs_validation.go` |
| 415 | `pkg/workflow/safe_outputs_validation_config.go` |
| 377 | `pkg/workflow/mcp_config_validation.go` |

Findings

Strengths

Average file size (233 lines) is reasonable, and the median file sits under 300 lines
The codebase already uses a domain-split file naming convention (compiler_yaml_main_job.go, compiler_activation_job.go, etc.)
Validator files show good naming discipline; only 3 exceed the hard limit
497 of 672 files (74%) are under 300 lines — solid foundation

Areas for Improvement

[Critical] gateway_logs.go mixes 5 distinct concerns (types, parsing, metrics, aggregation, rendering) in 1,332 lines
[Critical] DownloadWorkflowLogs in logs_orchestrator.go is a ~562-line single function
[High] frontmatter_types.go mixes type definitions (230 lines) with parse/marshal methods (727 lines)
[High] audit_report_render.go packs 28 render functions across all data domains into one file
[Medium] safe_outputs_validation.go (519 lines) and safe_outputs_validation_config.go (415 lines) both exceed the 300-line hard limit

Detailed Analysis

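gateway_logs.go is the most extreme example of responsibility mixing. Its own file comment lists five responsibilities:

- parsing gateway.jsonl
- parsing rpc-messages.jsonl
- extracting metrics
- aggregating statistics
- rendering tables

The file contains 13 type definitions and 15 functions. A natural four-file split follows from three groupings:

- the two parsing concerns go into one file;
- metric extraction and aggregation go into a second file;
- the types get a dedicated file of their own.

The resulting files are: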

gateway_types.go — the 13 struct definitions (~150 lines)
gateway_parser.go — parseGatewayLogs, parseRPCMessages, processGatewayLogEntry (~500 lines)
gateway_metrics.go — calculateGatewayAggregates, buildGuardPolicySummary, extractMCPToolUsageData (~350 lines)
gateway_renderer.go — renderGatewayMetricsTable, displayAggregatedGatewayMetrics (~330 lines)

logs_orchestrator.go contains DownloadWorkflowLogs. Its signature takes 20+ parameters and its body runs ~562 lines. It handles run discovery, filtering, concurrent download orchestration and result aggregation. These are independent concerns, and each could become a separate function called from a lean orchestrator.

frontmatter_types.go already has a natural seam. The first ~230 lines are pure struct definitions, and the remaining ~727 lines are parsing and serialization methods. A file split with no logic change would leave a ~230-line types file and move the ~727 lines of methods into a separate file.

🤖 Tasks for Copilot Agent

NOTE TO PLANNER AGENT: The following 5 tasks are designed for GitHub Copilot coding agent execution. Please split these into individual work items for Claude to process sequentially (each task depends on a stable, compilable state).

Task 1: Split `gateway_logs.go` into Four Focused Files

Priority: High
Estimated Effort: Medium
Focus Area: Oversized File Decomposition

Description:
pkg/cli/gateway_logs.go (1,332 lines) mixes types, parsing, metrics aggregation, and rendering. The file comment itself lists 5 distinct responsibilities. Split it into four focused files following the existing naming convention.

Acceptance Criteria:

pkg/cli/gateway_types.go contains only the 13 struct type definitions (≈150 lines)
pkg/cli/gateway_parser.go contains log parsing functions (parseGatewayLogs, parseRPCMessages, findRPCMessagesPath, processGatewayLogEntry, helpers) (≈500 lines)
pkg/cli/gateway_metrics.go contains aggregation and build functions (calculateGatewayAggregates, buildGuardPolicySummary, extractMCPToolUsageData, helpers) (≈350 lines)
pkg/cli/gateway_renderer.go contains rendering functions (renderGatewayMetricsTable, displayAggregatedGatewayMetrics, helpers) (≈330 lines)
Original gateway_logs.go is deleted
make build passes with no errors
make test-unit passes

Code Region: pkg/cli/gateway_logs.go

Split `pkg/cli/gateway_logs.go` (1,332 lines) into four focused files by responsibility. The file's own comment lists 5 concerns — use them as the split guide.

Steps:
1. Create `pkg/cli/gateway_types.go` and move all `type` definitions (lines ~38–232) into it. Give it the `package cli` header. Also move package-level constants and variables such as `maxScannerBufferSize` and `gatewayLogsLog`.
2. Create `pkg/cli/gateway_parser.go` — move log-parsing functions: `parseGatewayLogs`, `parseRPCMessages`, `findRPCMessagesPath`, `processGatewayLogEntry`, `getOrCreateServer`, `getOrCreateTool`, `buildToolCallsFromRPCMessages`, and helper functions they use.
3. Create `pkg/cli/gateway_metrics.go` — move metric/aggregation functions: `calculateGatewayAggregates`, `buildGuardPolicySummary`, `extractMCPToolUsageData`, `isGuardPolicyErrorCode`, `guardPolicyReasonFromCode`.
4. Create `pkg/cli/gateway_renderer.go` — move rendering functions: `renderGatewayMetricsTable`, `getSortedServerNames`, `displayAggregatedGatewayMetrics`.
5. Delete the original `gateway_logs.go`.
6. Give each new file its own import block that lists only the packages that file uses, because Go rejects unused imports. All four files stay in `package cli`.
7. Run `make build && make test-unit` to verify everything compiles and tests pass.
8. Run `make fmt` to ensure formatting is correct.

Do NOT change any function signatures, logic, or behavior — this is a pure structural refactor.
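Task 1 must not change any signatures. One way to check this mechanically is to compare the inventory of top-level declarations before and after the split. The sketch below uses the standard `go/parser`, `go/ast` and `go/types` packages, and it parses source strings so the example stays self-contained. In practice you would read the real files. The helpers `topLevelDecls`, `funcName` and `sameDecls` are hypothetical names, and the tiny sources in `main` are invented.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
	"strings"
)

// funcName renders a function or method name, e.g. "parseGatewayLogs" or "(*T).ToMap".
func funcName(d *ast.FuncDecl) string {
	if d.Recv != nil && len(d.Recv.List) > 0 {
		return "(" + types.ExprString(d.Recv.List[0].Type) + ")." + d.Name.Name
	}
	return d.Name.Name
}

// topLevelDecls returns a sorted inventory of top-level funcs (with signatures),
// types, consts, and vars across the given files (file name -> source).
func topLevelDecls(sources map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	var names []string
	for name, src := range sources {
		f, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return nil, err
		}
		for _, decl := range f.Decls {
			switch d := decl.(type) {
			case *ast.FuncDecl:
				sig := strings.TrimPrefix(types.ExprString(d.Type), "func")
				names = append(names, "func "+funcName(d)+sig)
			case *ast.GenDecl:
				for _, spec := range d.Specs {
					switch s := spec.(type) {
					case *ast.TypeSpec:
						names = append(names, "type "+s.Name.Name)
					case *ast.ValueSpec:
						for _, id := range s.Names {
							names = append(names, d.Tok.String()+" "+id.Name)
						}
					}
				}
			}
		}
	}
	sort.Strings(names)
	return names, nil
}

// sameDecls reports whether two sorted inventories are identical.
func sameDecls(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical miniature of the split: one file before, two files after.
	before := map[string]string{
		"gateway_logs.go": "package cli\n\ntype GatewayServerMetrics struct{ Calls int }\n\nfunc parseGatewayLogs(path string) error { return nil }\n",
	}
	after := map[string]string{
		"gateway_types.go":  "package cli\n\ntype GatewayServerMetrics struct{ Calls int }\n",
		"gateway_parser.go": "package cli\n\nfunc parseGatewayLogs(path string) error { return nil }\n",
	}
	a, err := topLevelDecls(before)
	if err != nil {
		panic(err)
	}
	b, err := topLevelDecls(after)
	if err != nil {
		panic(err)
	}
	fmt.Println("declarations preserved:", sameDecls(a, b))
}
```

Because each function entry includes its parameter and result list, a change such as `n int` becoming `n int64` shows up as a mismatch. A purely structural move does not.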

Task 2: Decompose the `DownloadWorkflowLogs` Megafunction

Priority: High
Estimated Effort: Large
Focus Area: Oversized File Decomposition

Description:
DownloadWorkflowLogs in pkg/cli/logs_orchestrator.go is a single function spanning ~562 lines with 20+ parameters. It handles run discovery and filtering, artifact download orchestration, filtering and processing of results, and report generation. Extract cohesive sub-functions to reduce it to a lean orchestrator of ≤150 lines.

Acceptance Criteria:

DownloadWorkflowLogs body is reduced to ≤150 lines (orchestration only)
At least 4 focused helper functions are extracted with clear names (e.g., filterWorkflowRuns, orchestrateDownloads, buildDownloadReport)
Each extracted function has a doc comment explaining its responsibility
make build passes with no errors
make test-unit passes

Code Region: pkg/cli/logs_orchestrator.go (lines 45–607)

Refactor `DownloadWorkflowLogs` in `pkg/cli/logs_orchestrator.go`. This single function is ~562 lines — extract cohesive blocks into well-named private functions.

Steps:
1. Read through `DownloadWorkflowLogs` (lines 45–607) and identify logical phases (e.g., "resolve run list", "apply date/ID filters", "download artifacts concurrently", "filter results by safeOutputType", "generate report").
2. For each phase, extract a private helper function in the same file with a descriptive name. Example extractions:
   - `resolveWorkflowRuns(...)` — fetches the list of runs to process
   - `applyRunFilters(runs []WorkflowRun, ...) []WorkflowRun` — filters by date, ID bounds, etc.
   - `buildLogsReport(results []DownloadResult, ...) error` — assembles and writes the report
3. Keep `DownloadWorkflowLogs` as the public entry point that calls these helpers in sequence; its body should shrink to ≤150 lines.
4. Add a one-line doc comment to each extracted function.
5. Run `make build && make test-unit` to verify correctness.
6. Run `make fmt`.

Do NOT change any function's external behavior or alter public signatures.

Task 3: Separate Type Definitions from Methods in `frontmatter_types.go`

Priority: Medium
Estimated Effort: Small
Focus Area: Oversized File Decomposition

Description:
pkg/workflow/frontmatter_types.go (957 lines) has a clear natural seam: the first ~230 lines are pure struct/type definitions, and the remaining ~727 lines are parsing and serialization methods (ParseFrontmatterConfig, parseRuntimesConfig, parsePermissionsConfig, ToMap, etc.). Splitting along this seam requires zero logic changes.

Acceptance Criteria:

pkg/workflow/frontmatter_types.go contains only type/struct definitions (≤250 lines)
pkg/workflow/frontmatter_config_parse.go contains ParseFrontmatterConfig, parseRuntimesConfig, parsePermissionsConfig, countRuntimes, ExtractMapField, ToMap, runtimesConfigToMap, permissionsConfigToMap, and their helpers
Both files are in package workflow
make build and make test-unit pass

Code Region: pkg/workflow/frontmatter_types.go

Split `pkg/workflow/frontmatter_types.go` (957 lines) into two files at its natural seam (line ~230).

Steps:
1. Create `pkg/workflow/frontmatter_config_parse.go` with `package workflow` header.
2. Move everything from line 230 onward to the new file: `ParseFrontmatterConfig`, `parseRuntimesConfig`, `parsePermissionsConfig`, `countRuntimes`, `ExtractMapField`, the `ToMap` method and its helpers (`runtimesConfigToMap`, `permissionsConfigToMap`).
3. Give each file only the imports it uses. The types file may need few or none, while the parse file keeps the YAML, `fmt` and similar imports.
4. Keep `frontmatter_types.go` with only the struct/type definitions and any type-level constants/vars.
5. Run `make build && make test-unit`.
6. Run `make fmt`.

This is a pure structural move — no logic changes whatsoever.

Task 4: Group `audit_report_render.go` Rendering Functions by Domain

Priority: Medium
Estimated Effort: Medium
Focus Area: Oversized File Decomposition

Description:
pkg/cli/audit_report_render.go (1,139 lines) contains 28 rendering functions for very different data domains: security/firewall, performance, MCP/tool usage, session analysis, and general overview. Group them into 3–4 domain-specific files.

Acceptance Criteria:

pkg/cli/audit_report_render.go is split into at least 3 domain-specific files, each ≤350 lines
Suggested split: audit_render_overview.go (overview, metrics, jobs, recommendations), audit_render_security.go (firewall, guard policy, redacted domains, policy analysis, safe output), audit_render_tools.go (MCP tools, tool usage, engine config, token usage, GitHub rate limit), audit_render_analysis.go (session, prompt, behavior fingerprint, performance, agentic assessments)
renderJSON and renderConsole (the top-level entry points) remain in audit_report_render.go (now a thin dispatcher, ≤100 lines)
make build and make test-unit pass

Code Region: pkg/cli/audit_report_render.go

Split `pkg/cli/audit_report_render.go` (1,139 lines, 28 functions) by rendering domain.

Steps:
1. Create `pkg/cli/audit_render_overview.go` — move: `renderOverview`, `renderMetrics`, `renderJobsTable`, `renderTaskDomain`, `renderKeyFindings`, `renderRecommendations`, `renderAuditComparison`.
2. Create `pkg/cli/audit_render_security.go` — move: `renderFirewallAnalysis`, `renderRedactedDomainsAnalysis`, `renderGuardPolicySummary`, `renderPolicyAnalysis`, `renderSafeOutputSummary`.
3. Create `pkg/cli/audit_render_tools.go` — move: `renderMCPToolUsageTable`, `renderToolUsageTable`, `renderEngineConfig`, `renderTokenUsage`, `renderGitHubRateLimitUsage`, `renderMCPServerHealth`.
4. Create `pkg/cli/audit_render_analysis.go` — move: `renderSessionAnalysis`, `renderPromptAnalysis`, `renderBehaviorFingerprint`, `renderAgenticAssessments`, `renderPerformanceMetrics`, `renderCreatedItemsTable`, `formatUnixTimestamp`.
5. Keep `audit_report_render.go` with only `renderJSON`, `renderConsole`, and their direct helpers.
6. Move imports appropriately to each new file (all in `package cli`).
7. Run `make build && make test-unit`.
8. Run `make fmt`.

No logic changes — pure structural reorganization.

Task 5: Split `safe_outputs_validation.go` to Comply with Hard Limit

Priority: Low
Estimated Effort: Small
Focus Area: Oversized File Decomposition / Validation Complexity

Description:
pkg/workflow/safe_outputs_validation.go (519 lines) and pkg/workflow/safe_outputs_validation_config.go (415 lines) both exceed the documented hard limit of 300 lines. Per AGENTS.md guidelines, validators with 2+ unrelated validation domains should be split. Review and apply the decision tree to bring both files under 300 lines.

Acceptance Criteria:

safe_outputs_validation.go is ≤300 lines
safe_outputs_validation_config.go is ≤300 lines
Any new files follow the {domain}_{subdomain}_validation.go naming convention
Minimum 30% comment coverage maintained in new files
make build, make test-unit, and make lint all pass

Code Region: pkg/workflow/safe_outputs_validation.go, pkg/workflow/safe_outputs_validation_config.go

Bring `pkg/workflow/safe_outputs_validation.go` (519 lines) and `pkg/workflow/safe_outputs_validation_config.go` (415 lines) under the 300-line hard limit documented in AGENTS.md.

Steps:
1. Read both files and identify the distinct validation domains within each.
2. For `safe_outputs_validation.go`: find natural splits (e.g., handler-level validation vs config-level validation vs output-channel validation) and extract into `safe_outputs_handler_validation.go` or similar.
3. For `safe_outputs_validation_config.go`: identify 2+ unrelated domains and extract the secondary domain into a new file with the `{domain}_{subdomain}_validation.go` naming convention.
4. Ensure minimum 30% comment coverage per AGENTS.md guidelines.
5. Run `make build && make test-unit && make lint`.
6. Run `make fmt`.

Follow the decision tree in AGENTS.md: "File > 300 lines? → Should split" and "Contains 2+ distinct domains? → Should split".

📊 Historical Context

Previous Focus Areas

| Date | Focus Area | Type | Custom | Key Outcomes |
|---|---|---|---|---|
| 2026-04-08 | Oversized File Decomposition | Custom | Yes | 5 tasks generated targeting 76 files >500 lines |

🎯 Recommendations

Immediate Actions (This Week)

Task 1 (Split gateway_logs.go), Priority: High. This is a pure move with no logic change. It replaces the largest file (1,332 lines) with four files of roughly 150–500 lines each.
Task 3 (Split frontmatter_types.go) — Priority: Medium. Trivial seam split, zero risk.

Short-term Actions (This Month)

Task 4 (Split audit_report_render.go by domain) — Priority: Medium.
Task 5 (Bring validators under hard limit) — Priority: Low. Enforces existing AGENTS.md policy.

Long-term Actions (This Quarter)

Task 2 (Decompose DownloadWorkflowLogs megafunction) — Priority: High. Most impactful for maintainability but requires careful testing.
Add a CI lint step (wc -l threshold check) to prevent future files from exceeding 600 lines without a deliberate exception comment.

📈 Success Metrics

Track these metrics to measure improvement in Oversized File Decomposition:

Files > 500 lines: 76 → target ≤ 20
Files > 1,000 lines: 6 → target 0
Largest single function (lines): ~562 → target ≤ 100
Validator files over hard limit: 3 → target 0
Average file size: ~233 lines → maintain ≤ 250 lines

Next Steps

Review and prioritize the 5 tasks above
Assign tasks to Copilot coding agent via planner agent (Tasks 1 and 3 first — lowest risk, immediate wins)
Track per-file size improvements after each task
Re-evaluate in 30 days — focus area should shift to a different quality dimension


Generated by Repository Quality Improvement Agent · ● 514.5K · ◷

expires on Apr 9, 2026, 1:20 PM UTC
