🔧 Semantic Function Clustering Analysis
Analysis of repository: githubnext/gh-aw
Analysis date: 2025-10-25
Executive Summary
This analysis examined 154 non-test Go source files across the pkg/ directory to identify refactoring opportunities through semantic function clustering, outlier detection, and code organization patterns. The analysis revealed several areas for improvement:
- 3 major files are extremely large and could benefit from decomposition
- 3 validation functions are in the wrong file (compiler.go instead of validation.go)
- Strong semantic clusters identified across parse*, validate*, generate*, and build* functions
- Multiple helper/utility files with potential for consolidation
- Engine pattern duplication across 3 engine implementations
Analysis Metadata
- Total Go Files Analyzed: 154
- Total Functions Cataloged: 300+
- Packages Analyzed: 6 (workflow, cli, parser, console, constants, logger)
- Primary Focus: pkg/workflow (91 files) and pkg/cli (52 files)
- Detection Method: Serena semantic code analysis + naming pattern analysis
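The naming-pattern half of the detection method can be approximated with the standard go/ast tooling. The sketch below is not the analysis tool used for this report; it is a minimal illustration that groups function names by their leading verb, and the sample source and the helpers verbPrefix and clusterByPrefix are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
	"unicode"
)

// sample is a stand-in source file; the function names come from the report,
// but the empty bodies and signatures are placeholders.
const sample = `package workflow

type Compiler struct{}

func parseIssuesConfig() {}
func parseDiscussionsConfig() {}
func (c *Compiler) validateMaxTurnsSupport() {}
func (c *Compiler) generateYAML() {}
func (c *Compiler) buildJobs() {}
func (c *Compiler) buildMainJob() {}
`

// verbPrefix returns the lowercased run of characters before the first
// interior uppercase letter, e.g. "parseIssuesConfig" -> "parse".
func verbPrefix(name string) string {
	for i, r := range []rune(name) {
		if i > 0 && unicode.IsUpper(r) {
			return strings.ToLower(string([]rune(name)[:i]))
		}
	}
	return strings.ToLower(name)
}

// clusterByPrefix parses one Go source file and groups its function and
// method names by verb prefix.
func clusterByPrefix(filename, src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	clusters := make(map[string][]string)
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			p := verbPrefix(fn.Name.Name)
			clusters[p] = append(clusters[p], fn.Name.Name)
		}
	}
	return clusters, nil
}

func main() {
	clusters, err := clusterByPrefix("sample.go", sample)
	if err != nil {
		panic(err)
	}
	prefixes := make([]string, 0, len(clusters))
	for p := range clusters {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%s*: %d function(s) %v\n", p, len(clusters[p]), clusters[p])
	}
}
```

Names that start with an acronym (for example a hypothetical HTTPGet) would need a smarter splitter; the simple rule here is enough for the camelCase verbs listed in this report.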
Package Structure
By Package
| Package | File Count | Primary Purpose |
|---|---|---|
| pkg/workflow | 91 | Workflow compilation, execution, and management |
| pkg/cli | 52 | Command-line interface and CLI commands |
| pkg/parser | 6 | Parsing frontmatter, YAML, and GitHub content |
| pkg/console | 3 | Console rendering and output |
| pkg/constants | 1 | Application constants |
| pkg/logger | 1 | Logging utilities |
Identified Issues
1. 🔴 Oversized Files Needing Decomposition
Issue: Several files are extremely large (>1000 lines) and violate the Single Responsibility Principle
Critical: compiler.go - 3030 Lines, 56 Functions
File: pkg/workflow/compiler.go
Size: 3,030 lines with 56 functions
Issue: This file is a "god object" that handles multiple responsibilities
Responsibilities Mixed in This File:
- Workflow compilation
- YAML generation
- Job building
- Step generation
- Validation (should be in validation.go!)
- Config parsing (should be in config.go or dedicated files!)
- Frontmatter extraction
- Safe outputs handling
Validation Functions That Should Be in validation.go:
Line 2949: func (c *Compiler) validateHTTPTransportSupport(...)
Line 2968: func (c *Compiler) validateMaxTurnsSupport(...)
Line 2992: func (c *Compiler) validateWebSearchSupport(...)
Recommendation: Break down compiler.go into focused modules:
- compiler_core.go - Core compilation logic
- compiler_yaml.go - YAML generation (already has generateYAML, etc.)
- compiler_jobs.go - Job building (buildJobs, buildMainJob, etc.)
- compiler_steps.go - Step generation (generateMainJobSteps, etc.)
- Move validation functions to validation.go
- Move config parsing to dedicated config files
Estimated Impact: Major improvement in maintainability, testing, and code navigation
High Priority: claude_engine.go - 1312 Lines
File: pkg/workflow/claude_engine.go
Size: 1,312 lines
Issue: Large engine implementation file
Recommendation: Consider extracting:
- Tool parsing logic to claude_tools.go
- MCP config rendering to claude_mcp.go
- Log parsing to claude_logs.go
Estimated Impact: Improved organization of engine-specific logic
High Priority: logs.go - 2785 Lines
File: pkg/cli/logs.go
Size: 2,785 lines
Issue: Handles multiple log formats and parsing strategies
Recommendation: Split into:
- logs_core.go - Main log command logic
- logs_parsing.go - Log parsing functions (parseAgentLog, parseFirewallLogs, etc.)
- logs_formatting.go - Formatting utilities (formatDuration, formatNumber, etc.)
Estimated Impact: Better organization of log handling code
2. 🟡 Outlier Functions (Functions in Wrong Files)
Issue: Functions that don't match their file's primary purpose
Example 1: Validation in Compiler File
- File: pkg/workflow/compiler.go
- Functions:
  - validateHTTPTransportSupport() (line 2949)
  - validateMaxTurnsSupport() (line 2968)
  - validateWebSearchSupport() (line 2992)
- Issue: Validation functions in compiler file
- Correct Location: pkg/workflow/validation.go
- Impact: Breaks separation of concerns
Code Reference:
// compiler.go:2949
func (c *Compiler) validateHTTPTransportSupport(tools map[string]any, engine CodingAgentEngine) error { ... }
// Should be in validation.go with other validation functions like:
// - validateExpressionSizes
// - validateContainerImages
// - validateRuntimePackages
Recommendation: Move these 3 validation methods to validation.go
3. 🟢 Well-Organized Patterns (✓ Good Examples)
Pattern: create_*.go files - Each creation function has its own file
These files follow excellent organization principles:
| File | Purpose | Functions |
|---|---|---|
| create_issue.go | Issue creation | parseIssuesConfig, buildCreateOutputIssueJob |
| create_pull_request.go | PR creation | parsePullRequestsConfig, buildCreateOutputPullRequestJob |
| create_discussion.go | Discussion creation | parseDiscussionsConfig, buildCreateOutputDiscussionJob |
| create_code_scanning_alert.go | Alert creation | parseCodeScanningAlertsConfig, buildCreateOutputCodeScanningAlertJob |
| create_pr_review_comment.go | Review comment | parsePullRequestReviewCommentsConfig, buildCreateOutputPullRequestReviewCommentJob |
| create_agent_task.go | Agent task | parseAgentTaskConfig, buildCreateOutputAgentTaskJob |
Analysis: Well-organized - each creation feature has its own file ✓
Pattern: Each file contains:
- Config parsing function (parse*Config)
- Job building function (buildCreateOutput*Job)
This is an exemplary pattern that should be followed elsewhere.
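To make the two-function shape concrete, here is a sketch of what a new feature file following this pattern might contain. Everything in it is hypothetical: the "create-widget" feature, the WidgetsConfig and Job types, and both function signatures are invented for illustration and do not mirror the repository's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// WidgetsConfig is a hypothetical config type for an imaginary "create-widget" feature.
type WidgetsConfig struct {
	TitlePrefix string
	Max         int
}

// Job is a simplified stand-in for whatever job type the compiler emits.
type Job struct {
	Name  string
	Steps []string
}

// parseWidgetsConfig is the parse*Config half of the pattern: it extracts the
// feature's settings from a generic map and returns nil when the feature is absent.
func parseWidgetsConfig(outputs map[string]any) *WidgetsConfig {
	raw, ok := outputs["create-widget"].(map[string]any)
	if !ok {
		return nil
	}
	cfg := &WidgetsConfig{Max: 1} // default: at most one widget
	if p, ok := raw["title-prefix"].(string); ok {
		cfg.TitlePrefix = p
	}
	if m, ok := raw["max"].(int); ok && m > 0 {
		cfg.Max = m
	}
	return cfg
}

// buildCreateOutputWidgetJob is the buildCreateOutput*Job half: it turns the
// parsed config into a job description.
func buildCreateOutputWidgetJob(cfg *WidgetsConfig, mainJobName string) (*Job, error) {
	if cfg == nil {
		return nil, errors.New("create-widget is not configured")
	}
	return &Job{
		Name: "create_widget",
		Steps: []string{
			fmt.Sprintf("download output of %s", mainJobName),
			fmt.Sprintf("create up to %d widget(s) with title prefix %q", cfg.Max, cfg.TitlePrefix),
		},
	}, nil
}

func main() {
	outputs := map[string]any{
		"create-widget": map[string]any{"title-prefix": "[bot] ", "max": 3},
	}
	job, err := buildCreateOutputWidgetJob(parseWidgetsConfig(outputs), "agent")
	if err != nil {
		panic(err)
	}
	fmt.Println(job.Name)
	for _, step := range job.Steps {
		fmt.Println(" -", step)
	}
}
```

Keeping both halves in one file means a reader looking for everything about a feature has a single place to open, which is the property the table above highlights.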
4. 🟡 Scattered Helper Functions
Issue: Multiple helper/utility files without clear distinction
Files Found:
- pkg/cli/shared_utils.go - Shared utilities
- pkg/cli/frontmatter_utils.go - Frontmatter utilities
- pkg/cli/repeat_utils.go - Retry/repeat logic
- pkg/workflow/engine_helpers.go - Engine helpers
- pkg/workflow/prompt_step_helper.go - Prompt helpers
- pkg/workflow/safe_output_helpers.go - Safe output helpers
Analysis:
- Some overlap in naming and purpose
- Not always clear which helper file to use
- Could benefit from consolidation or clearer naming
Recommendation:
- Consider consolidating CLI helpers into fewer, more focused files
- Consider renaming for clarity (e.g., cli_git_helpers.go, cli_formatting_helpers.go)
- Document the purpose of each helper file
Estimated Impact: Easier discoverability, reduced confusion
Detailed Function Clusters
Cluster 1: Parse Functions (parse*)
Pattern: Functions with parse prefix for parsing configurations and data
Count: 47+ parse functions identified
Subclusters:
Config Parsing (parse*Config)
- parseIssuesConfig() - create_issue.go
- parsePullRequestsConfig() - create_pull_request.go
- parseDiscussionsConfig() - create_discussion.go
- parseCommentsConfig() - add_comment.go
- parsePullRequestReviewCommentsConfig() - create_pr_review_comment.go
- parseCodeScanningAlertsConfig() - create_code_scanning_alert.go
- parseSafeJobsConfig() - safe_jobs.go
- parseThreatDetectionConfig() - threat_detection.go
- parseAgentTaskConfig() - create_agent_task.go
- parseUpdateIssuesConfig() - update_issue.go
- parsePushToPullRequestBranchConfig() - push_to_pull_request_branch.go
- parseMissingToolConfig() - missing_tool.go
Analysis: Strong, consistent pattern across all create_* and add_* features ✓
Package Parsing
- parseNpmPackage() - dependabot.go
- parsePipPackage() - dependabot.go
- parseGoPackage() - dependabot.go
Analysis: Well-organized in dependabot.go ✓
Tool Parsing
- parseGitHubTool() - tools_types.go
- parseBashTool() - tools_types.go
- parsePlaywrightTool() - tools_types.go
- parseWebFetchTool() - tools_types.go
- parseWebSearchTool() - tools_types.go
- parseEditTool() - tools_types.go
- parseAgenticWorkflowsTool() - tools_types.go
- parseCacheMemoryTool() - tools_types.go
- parseSafetyPromptTool() - tools_types.go
- parseTimeoutTool() - tools_types.go
- parseStartupTimeoutTool() - tools_types.go
Analysis: Excellent organization in tools_types.go ✓
Other Parsing Functions
- parseTimeDelta() - time_delta.go ✓
- parseAbsoluteDateTime() - time_delta.go ✓
- parseRelativeDate() - time_delta.go ✓
- Various CLI parse functions (parseRepoSpec, parseWorkflowSpec, etc.) in spec.go ✓
Overall Assessment: Parse functions are generally well-organized by feature/domain
Cluster 2: Validate Functions (validate*)
Pattern: Functions with validate prefix for validation
Count: 28+ validate functions identified
Location Distribution:
In validation.go (✓ Correct)
- validateExpressionSizes()
- validateContainerImages()
- validateRuntimePackages()
- validateGitHubActionsSchema()
- validateNoDuplicateCacheIDs()
- validateSecretReferences()
- validateRepositoryFeatures()
In compiler.go (⚠️ Should move)
- validateHTTPTransportSupport() ← Should be in validation.go
- validateMaxTurnsSupport() ← Should be in validation.go
- validateWebSearchSupport() ← Should be in validation.go
In engine.go
- validateEngine()
- validateSingleEngineSpecification()
In strict_mode.go
- validateStrictMode()
- validateStrictPermissions()
- validateStrictNetwork()
- validateStrictMCPNetwork()
- validateStrictBashTools()
In Other Files
- validateDockerImage() - docker.go
- validateStringProperty() - mcp-config.go
- validateMCPRequirements() - mcp-config.go
- Package validation in pip.go, npm.go
Analysis: Generally well-organized, but the 3 validation functions in compiler.go are outliers
Recommendation: Move compiler.go validation functions to validation.go
Cluster 3: Generate Functions (generate*)
Pattern: Functions with generate prefix for generating workflow components
Count: 45+ generate functions identified
Subclusters:
YAML/Job Generation (in compiler.go)
- generateYAML()
- generateJobName()
- generateMainJobSteps()
- generatePrompt()
- generatePostSteps()
- generateEngineExecutionSteps()
- generateOutputCollectionStep()
Step Generation for Uploads/Logs
- generateUploadAgentLogs()
- generateUploadAssets()
- generateUploadAwInfo()
- generateUploadPrompt()
- generateUploadAccessLogs()
- generateUploadMCPLogs()
- generateLogParsing()
- generateErrorValidation()
Prompt Steps
- generateCacheMemoryPromptStep()
- generateSafeOutputsPromptStep()
- generateStaticPromptStep() - prompt_step_helper.go
- generatePlaywrightPromptStep()
- generateTempFolderPromptStep()
- generateEditToolPromptStep()
- generateGitHubContextPromptStep()
- generateXPIAPromptStep()
- generatePRContextPromptStep()
Package/Config Generation (in dependabot.go)
- generatePackageJSON()
- generatePackageLock()
- generateDependabotConfig()
- generateRequirementsTxt()
- generateGoMod()
Other Generation
- generateCacheSteps() - cache.go
- generateCacheMemorySteps() - cache.go
- generateSafeOutputsConfig() - safe_output_helpers.go
- Various Copilot-specific generation functions
Analysis: Many generate functions in compiler.go - this contributes to its large size
Recommendation: Consider extracting generate functions into compiler_generators.go or similar
Cluster 4: Build Functions (build*)
Pattern: Functions with build prefix for building jobs and steps
Count: 35+ build functions identified
Subclusters:
Job Building (in compiler.go)
- buildJobs()
- buildMainJob()
- buildSafeOutputsJobs()
- buildPreActivationJob()
- buildActivationJob()
- buildCustomJobs()
Safe Output Job Building
- buildCreateOutputIssueJob() - create_issue.go ✓
- buildCreateOutputPullRequestJob() - create_pull_request.go ✓
- buildCreateOutputDiscussionJob() - create_discussion.go ✓
- buildCreateOutputAddCommentJob() - add_comment.go ✓
- buildCreateOutputCodeScanningAlertJob() - create_code_scanning_alert.go ✓
- buildCreateOutputPullRequestReviewCommentJob() - create_pr_review_comment.go ✓
- buildCreateOutputAgentTaskJob() - create_agent_task.go ✓
- buildCreateOutputUpdateIssueJob() - update_issue.go ✓
- buildCreateOutputMissingToolJob() - missing_tool.go ✓
- buildCreateOutputPushToPullRequestBranchJob() - push_to_pull_request_branch.go ✓
- buildAddLabelsJob() - add_labels.go ✓
- buildUploadAssetsJob() - publish_assets.go ✓
- buildSafeJobs() - safe_jobs.go ✓
Analysis: Excellent pattern - each safe output job builder is in its own file ✓
Threat Detection Building (in threat_detection.go)
- buildThreatDetectionJob()
- buildThreatDetectionSteps()
- buildEngineSteps()
- buildParsingStep()
- buildWorkflowContextEnvVars()
- Many more threat detection build functions
Helper Build Functions
- buildEventAwareCommandCondition() - command.go
- buildArtifactDownloadSteps() - artifacts.go
- buildAgentOutputDownloadSteps() - safe_output_helpers.go
- buildConditionTree() - expressions.go
- buildConcurrencyGroupKeys() - concurrency.go
Analysis: Build functions follow the create_* pattern well ✓
Cluster 5: Format/Render Functions
Pattern: Functions for formatting and rendering
Functions Identified:
- FormatStepWithCommandAndEnv() - engine_helpers.go
- FormatJavaScriptForYAML() - js.go
- formatSafeOutputsRunsOn() - safe_outputs.go
- formatStringAsJavaScriptLiteral() - threat_detection.go
- formatDuration() - logs.go (CLI)
- formatNumber() - logs.go (CLI)
- Various render functions (renderMCPFetchServerConfig, etc.)
Analysis: Scattered across multiple files, could be better organized
Recommendation: Consider consolidating formatting utilities
Refactoring Recommendations
Priority 1: High Impact - Critical Improvements
1.1 Move Validation Functions to validation.go
Effort: 30 minutes
Impact: High - Fixes clear violation of file organization
Actions:
- Move validateHTTPTransportSupport() from compiler.go:2949 to validation.go
- Move validateMaxTurnsSupport() from compiler.go:2968 to validation.go
- Move validateWebSearchSupport() from compiler.go:2992 to validation.go
- Update any imports if needed
- Run tests to verify no breaks
Benefits:
- Clear separation of concerns
- All validation logic in one place
- Easier to find and maintain validation functions
1.2 Decompose compiler.go (3030 Lines → Multiple Files)
Effort: 8-16 hours
Impact: Very High - Major improvement in maintainability
Proposed File Structure:
pkg/workflow/
compiler.go (core compilation logic, ~500-800 lines)
compiler_config.go (config parsing, move parse* methods)
compiler_jobs.go (job building, move build* methods)
compiler_steps.go (step generation, move generate*Step methods)
compiler_yaml.go (YAML generation, move generateYAML and related)
compiler_safe_outputs.go (safe outputs logic)
Migration Strategy:
- Create new files
- Move related methods to appropriate files
- Update tests
- Verify no functionality breaks
Benefits:
- Each file has single, clear purpose
- Easier to navigate codebase
- Faster to find relevant code
- Better for code review
- Easier to test individual components
Priority 2: Medium Impact - Structural Improvements
2.1 Decompose Large Engine Files
Files:
- claude_engine.go (1312 lines)
- copilot_engine.go (996 lines)
Effort: 4-6 hours per engine
Impact: Medium - Improved organization
Recommendation:
- Extract tool parsing to <engine>_tools.go
- Extract MCP config to <engine>_mcp.go
- Extract log parsing to <engine>_logs.go
2.2 Decompose logs.go (CLI)
File: `pkg/cl
[Content truncated due to length]
AI generated by Semantic Function Refactoring