[refactor] 🔧 Semantic Function Clustering Analysis - Refactoring Opportunities Identified #2388

@github-actions

🔧 Semantic Function Clustering Analysis

Analysis of repository: githubnext/gh-aw
Analysis date: 2025-10-25

Executive Summary

This analysis examined 154 non-test Go source files across the pkg/ directory to identify refactoring opportunities through semantic function clustering, outlier detection, and code organization patterns. The analysis revealed several areas for improvement:

  • 3 files exceed 1,000 lines (compiler.go, logs.go, claude_engine.go) and could benefit from decomposition
  • 3 validation functions are in the wrong file (compiler.go instead of validation.go)
  • Strong semantic clusters identified across parse*, validate*, generate*, and build* functions
  • Multiple helper/utility files with potential for consolidation
  • Engine pattern duplication across 3 engine implementations

Analysis Metadata

  • Total Go Files Analyzed: 154
  • Total Functions Cataloged: 300+
  • Packages Analyzed: 6 (workflow, cli, parser, console, constants, logger)
  • Primary Focus: pkg/workflow (91 files) and pkg/cli (52 files)
  • Detection Method: Serena semantic code analysis + naming pattern analysis
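The naming-pattern half of the detection method can be approximated with the Go standard library. The following is a minimal sketch, not the tooling used for this analysis: it parses one source string with go/parser and groups function and method names by leading prefix. The `clusterByPrefix` helper and the simplified signatures in `sample` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// prefixes are the naming clusters discussed in this report.
var prefixes = []string{"parse", "validate", "generate", "build"}

// sample uses function names from the report with simplified,
// hypothetical signatures and empty bodies.
const sample = `package workflow

type Compiler struct{}

func (c *Compiler) validateMaxTurnsSupport() error { return nil }
func (c *Compiler) generateYAML() string { return "" }
func (c *Compiler) buildJobs() error { return nil }
func parseIssuesConfig() {}
func FormatJavaScriptForYAML() {}
`

// clusterByPrefix parses Go source text and groups function and method
// names by the first matching prefix (case-insensitive). Names that match
// no prefix are collected under "other".
func clusterByPrefix(filename, src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	clusters := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip type, var, const, and import declarations
		}
		key := "other"
		lower := strings.ToLower(fn.Name.Name)
		for _, p := range prefixes {
			if strings.HasPrefix(lower, p) {
				key = p
				break
			}
		}
		clusters[key] = append(clusters[key], fn.Name.Name)
	}
	for _, names := range clusters {
		sort.Strings(names)
	}
	return clusters, nil
}

func main() {
	clusters, err := clusterByPrefix("sample.go", sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	keys := make([]string, 0, len(clusters))
	for k := range clusters {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-8s %v\n", k, clusters[k])
	}
}
```

Running the same function over every file in a package and merging the maps would give per-prefix counts comparable to the cluster sections of this report.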

Package Structure

By Package

| Package | File Count | Primary Purpose |
|---|---|---|
| pkg/workflow | 91 | Workflow compilation, execution, and management |
| pkg/cli | 52 | Command-line interface and CLI commands |
| pkg/parser | 6 | Parsing frontmatter, YAML, and GitHub content |
| pkg/console | 3 | Console rendering and output |
| pkg/constants | 1 | Application constants |
| pkg/logger | 1 | Logging utilities |

Identified Issues

1. 🔴 Oversized Files Needing Decomposition

Issue: Several files are extremely large (>1000 lines) and violate the Single Responsibility Principle
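The size check itself is simple to reproduce. Below is a minimal sketch, assuming line counts have already been collected; the `fileSize` type and `oversized` function are hypothetical names, and the counts are the ones quoted in this report.

```go
package main

import (
	"fmt"
	"sort"
)

// fileSize pairs a file name with its line count.
type fileSize struct {
	Path  string
	Lines int
}

// oversized returns the files whose line count exceeds threshold,
// ordered from largest to smallest.
func oversized(files []fileSize, threshold int) []fileSize {
	var out []fileSize
	for _, f := range files {
		if f.Lines > threshold {
			out = append(out, f)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Lines > out[j].Lines })
	return out
}

func main() {
	// Line counts as reported in this analysis.
	files := []fileSize{
		{"compiler.go", 3030},
		{"claude_engine.go", 1312},
		{"copilot_engine.go", 996},
		{"logs.go", 2785},
	}
	for _, f := range oversized(files, 1000) {
		fmt.Printf("%-20s %d lines\n", f.Path, f.Lines)
	}
}
```

With the 1,000-line threshold, copilot_engine.go (996 lines) falls just under the cutoff, which is why it appears only under the medium-priority engine recommendations.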

Critical: compiler.go - 3030 Lines, 56 Functions

File: pkg/workflow/compiler.go
Size: 3,030 lines with 56 functions
Issue: This file acts as a "god object": it handles many unrelated responsibilities

Responsibilities Mixed in This File:

  • Workflow compilation
  • YAML generation
  • Job building
  • Step generation
  • Validation (should be in validation.go!)
  • Config parsing (should be in config.go or dedicated files!)
  • Frontmatter extraction
  • Safe outputs handling

Validation Functions That Should Be in validation.go:

Line 2949: func (c *Compiler) validateHTTPTransportSupport(...)
Line 2968: func (c *Compiler) validateMaxTurnsSupport(...)
Line 2992: func (c *Compiler) validateWebSearchSupport(...)

Recommendation: Break down compiler.go into focused modules:

  • compiler_core.go - Core compilation logic
  • compiler_yaml.go - YAML generation (generateYAML and related functions, which currently live in compiler.go)
  • compiler_jobs.go - Job building (buildJobs, buildMainJob, etc.)
  • compiler_steps.go - Step generation (generateMainJobSteps, etc.)
  • Move validation functions to validation.go
  • Move config parsing to dedicated config files

Estimated Impact: Major improvement in maintainability, testability, and code navigation


High Priority: claude_engine.go - 1312 Lines

File: pkg/workflow/claude_engine.go
Size: 1,312 lines
Issue: Large engine implementation file

Recommendation: Consider extracting:

  • Tool parsing logic to claude_tools.go
  • MCP config rendering to claude_mcp.go
  • Log parsing to claude_logs.go

Estimated Impact: Improved organization of engine-specific logic


High Priority: logs.go - 2785 Lines

File: pkg/cli/logs.go
Size: 2,785 lines
Issue: Handles multiple log formats and parsing strategies

Recommendation: Split into:

  • logs_core.go - Main log command logic
  • logs_parsing.go - Log parsing functions (parseAgentLog, parseFirewallLogs, etc.)
  • logs_formatting.go - Formatting utilities (formatDuration, formatNumber, etc.)

Estimated Impact: Better organization of log handling code


2. 🟡 Outlier Functions (Functions in Wrong Files)

Issue: Functions that don't match their file's primary purpose

Example 1: Validation in Compiler File

  • File: pkg/workflow/compiler.go
  • Functions:
    • validateHTTPTransportSupport() (line 2949)
    • validateMaxTurnsSupport() (line 2968)
    • validateWebSearchSupport() (line 2992)
  • Issue: Validation functions in compiler file
  • Correct Location: pkg/workflow/validation.go
  • Impact: Breaks separation of concerns

Code Reference:

// compiler.go:2949
func (c *Compiler) validateHTTPTransportSupport(tools map[string]any, engine CodingAgentEngine) error { ... }

// Should be in validation.go with other validation functions like:
// - validateExpressionSizes
// - validateContainerImages
// - validateRuntimePackages

Recommendation: Move these 3 validation methods to validation.go
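A check for this kind of outlier can be sketched as follows. This is a hypothetical heuristic, not the method used for this analysis: it flags functions with a given prefix that live outside the prefix's home file. The `misplaced` function and the exemption list are assumptions; engine.go and strict_mode.go are exempted only because this report lists their validators without flagging them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// misplaced lists "file:function" pairs for functions that carry the given
// name prefix but live outside home, skipping files marked as exempt.
func misplaced(funcsByFile map[string][]string, prefix, home string, exempt map[string]bool) []string {
	var out []string
	for file, funcs := range funcsByFile {
		if file == home || exempt[file] {
			continue
		}
		for _, fn := range funcs {
			if strings.HasPrefix(fn, prefix) {
				out = append(out, file+":"+fn)
			}
		}
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out
}

func main() {
	// A small excerpt of the function inventory from this report.
	funcsByFile := map[string][]string{
		"validation.go": {"validateExpressionSizes", "validateContainerImages", "validateRuntimePackages"},
		"compiler.go": {
			"generateYAML", "buildJobs",
			"validateHTTPTransportSupport", "validateMaxTurnsSupport", "validateWebSearchSupport",
		},
		"engine.go":      {"validateEngine", "validateSingleEngineSpecification"},
		"strict_mode.go": {"validateStrictMode", "validateStrictPermissions"},
	}
	// Assumption: feature-specific validators are acceptable where they are.
	exempt := map[string]bool{"engine.go": true, "strict_mode.go": true}

	for _, hit := range misplaced(funcsByFile, "validate", "validation.go", exempt) {
		fmt.Println(hit)
	}
}
```

On this excerpt the sketch reports only the three compiler.go methods listed above.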


3. 🟢 Well-Organized Patterns (✓ Good Examples)

Pattern: create_*.go files - Each creation function has its own file

These files follow excellent organization principles:

| File | Purpose | Functions |
|---|---|---|
| create_issue.go | Issue creation | parseIssuesConfig, buildCreateOutputIssueJob |
| create_pull_request.go | PR creation | parsePullRequestsConfig, buildCreateOutputPullRequestJob |
| create_discussion.go | Discussion creation | parseDiscussionsConfig, buildCreateOutputDiscussionJob |
| create_code_scanning_alert.go | Alert creation | parseCodeScanningAlertsConfig, buildCreateOutputCodeScanningAlertJob |
| create_pr_review_comment.go | Review comment | parsePullRequestReviewCommentsConfig, buildCreateOutputPullRequestReviewCommentJob |
| create_agent_task.go | Agent task | parseAgentTaskConfig, buildCreateOutputAgentTaskJob |

Analysis: Well-organized - each creation feature has its own file ✓
Pattern: Each file contains:

  • Config parsing function (parse*Config)
  • Job building function (buildCreateOutput*Job)

This is an exemplary pattern that should be followed elsewhere.
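For a new feature, the pattern could look roughly like the sketch below. Everything in it is hypothetical: the "widget" feature, the `WidgetsConfig` and `Job` types, the `create-widget` key, and both function signatures are illustrative stand-ins and do not reflect the actual signatures in pkg/workflow.

```go
package main

import (
	"errors"
	"fmt"
)

// WidgetsConfig holds the options for a hypothetical create-widget output.
type WidgetsConfig struct {
	TitlePrefix string
	Max         int
}

// Job is a simplified stand-in for a generated workflow job.
type Job struct {
	Name  string
	Steps []string
}

// parseWidgetsConfig extracts the feature's config from a decoded
// frontmatter map. It returns nil when the feature is not configured.
func parseWidgetsConfig(outputs map[string]any) *WidgetsConfig {
	raw, ok := outputs["create-widget"]
	if !ok {
		return nil
	}
	cfg := &WidgetsConfig{Max: 1} // default: one widget per run
	m, ok := raw.(map[string]any)
	if !ok {
		return cfg // key present without options: keep defaults
	}
	if s, ok := m["title-prefix"].(string); ok {
		cfg.TitlePrefix = s
	}
	if n, ok := m["max"].(int); ok && n > 0 {
		cfg.Max = n
	}
	return cfg
}

// buildCreateOutputWidgetJob turns the parsed config into a job.
func buildCreateOutputWidgetJob(cfg *WidgetsConfig) (*Job, error) {
	if cfg == nil {
		return nil, errors.New("create-widget is not configured")
	}
	step := fmt.Sprintf("create up to %d widget(s) with title prefix %q", cfg.Max, cfg.TitlePrefix)
	return &Job{Name: "create_widget", Steps: []string{step}}, nil
}

func main() {
	outputs := map[string]any{
		"create-widget": map[string]any{"title-prefix": "[bot] ", "max": 3},
	}
	job, err := buildCreateOutputWidgetJob(parseWidgetsConfig(outputs))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(job.Name, job.Steps)
}
```

Keeping the parser and the job builder side by side means a reviewer can see every option a feature accepts and how it is consumed without leaving the file.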


4. 🟡 Scattered Helper Functions

Issue: Multiple helper/utility files without clear distinction

Files Found:

  • pkg/cli/shared_utils.go - Shared utilities
  • pkg/cli/frontmatter_utils.go - Frontmatter utilities
  • pkg/cli/repeat_utils.go - Retry/repeat logic
  • pkg/workflow/engine_helpers.go - Engine helpers
  • pkg/workflow/prompt_step_helper.go - Prompt helpers
  • pkg/workflow/safe_output_helpers.go - Safe output helpers

Analysis:

  • Some overlap in naming and purpose
  • Not always clear which helper file to use
  • Could benefit from consolidation or clearer naming

Recommendation:

  • Consider consolidating CLI helpers into fewer, more focused files
  • Consider renaming for clarity (e.g., cli_git_helpers.go, cli_formatting_helpers.go)
  • Document the purpose of each helper file

Estimated Impact: Easier discoverability, reduced confusion


Detailed Function Clusters

Cluster 1: Parse Functions (parse*)

Pattern: Functions with parse prefix for parsing configurations and data

Count: 47+ parse functions identified

Subclusters:

Config Parsing (parse*Config)

  • parseIssuesConfig() - create_issue.go
  • parsePullRequestsConfig() - create_pull_request.go
  • parseDiscussionsConfig() - create_discussion.go
  • parseCommentsConfig() - add_comment.go
  • parsePullRequestReviewCommentsConfig() - create_pr_review_comment.go
  • parseCodeScanningAlertsConfig() - create_code_scanning_alert.go
  • parseSafeJobsConfig() - safe_jobs.go
  • parseThreatDetectionConfig() - threat_detection.go
  • parseAgentTaskConfig() - create_agent_task.go
  • parseUpdateIssuesConfig() - update_issue.go
  • parsePushToPullRequestBranchConfig() - push_to_pull_request_branch.go
  • parseMissingToolConfig() - missing_tool.go

Analysis: Strong, consistent pattern across the create_*, add_*, update_*, and related feature files ✓

Package Parsing

  • parseNpmPackage() - dependabot.go
  • parsePipPackage() - dependabot.go
  • parseGoPackage() - dependabot.go

Analysis: Well-organized in dependabot.go ✓

Tool Parsing

  • parseGitHubTool() - tools_types.go
  • parseBashTool() - tools_types.go
  • parsePlaywrightTool() - tools_types.go
  • parseWebFetchTool() - tools_types.go
  • parseWebSearchTool() - tools_types.go
  • parseEditTool() - tools_types.go
  • parseAgenticWorkflowsTool() - tools_types.go
  • parseCacheMemoryTool() - tools_types.go
  • parseSafetyPromptTool() - tools_types.go
  • parseTimeoutTool() - tools_types.go
  • parseStartupTimeoutTool() - tools_types.go

Analysis: Excellent organization in tools_types.go ✓

Other Parsing Functions

  • parseTimeDelta() - time_delta.go ✓
  • parseAbsoluteDateTime() - time_delta.go ✓
  • parseRelativeDate() - time_delta.go ✓
  • Various CLI parse functions (parseRepoSpec, parseWorkflowSpec, etc.) in spec.go ✓

Overall Assessment: Parse functions are generally well-organized by feature/domain


Cluster 2: Validate Functions (validate*)

Pattern: Functions with validate prefix for validation

Count: 28+ validate functions identified

Location Distribution:

In validation.go (✓ Correct)

  • validateExpressionSizes()
  • validateContainerImages()
  • validateRuntimePackages()
  • validateGitHubActionsSchema()
  • validateNoDuplicateCacheIDs()
  • validateSecretReferences()
  • validateRepositoryFeatures()

In compiler.go (⚠️ Should move)

  • validateHTTPTransportSupport() - Should be in validation.go
  • validateMaxTurnsSupport() - Should be in validation.go
  • validateWebSearchSupport() - Should be in validation.go

In engine.go

  • validateEngine()
  • validateSingleEngineSpecification()

In strict_mode.go

  • validateStrictMode()
  • validateStrictPermissions()
  • validateStrictNetwork()
  • validateStrictMCPNetwork()
  • validateStrictBashTools()

In Other Files

  • validateDockerImage() - docker.go
  • validateStringProperty() - mcp-config.go
  • validateMCPRequirements() - mcp-config.go
  • Package validation in pip.go, npm.go

Analysis: Generally well-organized, but the 3 validation functions in compiler.go are outliers

Recommendation: Move compiler.go validation functions to validation.go


Cluster 3: Generate Functions (generate*)

Pattern: Functions with generate prefix for generating workflow components

Count: 45+ generate functions identified

Subclusters:

YAML/Job Generation (in compiler.go)

  • generateYAML()
  • generateJobName()
  • generateMainJobSteps()
  • generatePrompt()
  • generatePostSteps()
  • generateEngineExecutionSteps()
  • generateOutputCollectionStep()

Step Generation for Uploads/Logs

  • generateUploadAgentLogs()
  • generateUploadAssets()
  • generateUploadAwInfo()
  • generateUploadPrompt()
  • generateUploadAccessLogs()
  • generateUploadMCPLogs()
  • generateLogParsing()
  • generateErrorValidation()

Prompt Steps

  • generateCacheMemoryPromptStep()
  • generateSafeOutputsPromptStep()
  • generateStaticPromptStep() - prompt_step_helper.go
  • generatePlaywrightPromptStep()
  • generateTempFolderPromptStep()
  • generateEditToolPromptStep()
  • generateGitHubContextPromptStep()
  • generateXPIAPromptStep()
  • generatePRContextPromptStep()

Package/Config Generation (in dependabot.go)

  • generatePackageJSON()
  • generatePackageLock()
  • generateDependabotConfig()
  • generateRequirementsTxt()
  • generateGoMod()

Other Generation

  • generateCacheSteps() - cache.go
  • generateCacheMemorySteps() - cache.go
  • generateSafeOutputsConfig() - safe_output_helpers.go
  • Various Copilot-specific generation functions

Analysis: Many generate functions in compiler.go - this contributes to its large size

Recommendation: Consider extracting generate functions into compiler_generators.go or similar


Cluster 4: Build Functions (build*)

Pattern: Functions with build prefix for building jobs and steps

Count: 35+ build functions identified

Subclusters:

Job Building (in compiler.go)

  • buildJobs()
  • buildMainJob()
  • buildSafeOutputsJobs()
  • buildPreActivationJob()
  • buildActivationJob()
  • buildCustomJobs()

Safe Output Job Building

  • buildCreateOutputIssueJob() - create_issue.go ✓
  • buildCreateOutputPullRequestJob() - create_pull_request.go ✓
  • buildCreateOutputDiscussionJob() - create_discussion.go ✓
  • buildCreateOutputAddCommentJob() - add_comment.go ✓
  • buildCreateOutputCodeScanningAlertJob() - create_code_scanning_alert.go ✓
  • buildCreateOutputPullRequestReviewCommentJob() - create_pr_review_comment.go ✓
  • buildCreateOutputAgentTaskJob() - create_agent_task.go ✓
  • buildCreateOutputUpdateIssueJob() - update_issue.go ✓
  • buildCreateOutputMissingToolJob() - missing_tool.go ✓
  • buildCreateOutputPushToPullRequestBranchJob() - push_to_pull_request_branch.go ✓
  • buildAddLabelsJob() - add_labels.go ✓
  • buildUploadAssetsJob() - publish_assets.go ✓
  • buildSafeJobs() - safe_jobs.go ✓

Analysis: Excellent pattern - each safe output job builder is in its own file ✓

Threat Detection Building (in threat_detection.go)

  • buildThreatDetectionJob()
  • buildThreatDetectionSteps()
  • buildEngineSteps()
  • buildParsingStep()
  • buildWorkflowContextEnvVars()
  • Many more threat detection build functions

Helper Build Functions

  • buildEventAwareCommandCondition() - command.go
  • buildArtifactDownloadSteps() - artifacts.go
  • buildAgentOutputDownloadSteps() - safe_output_helpers.go
  • buildConditionTree() - expressions.go
  • buildConcurrencyGroupKeys() - concurrency.go

Analysis: Build functions follow the create_* pattern well ✓


Cluster 5: Format/Render Functions

Pattern: Functions for formatting and rendering

Functions Identified:

  • FormatStepWithCommandAndEnv() - engine_helpers.go
  • FormatJavaScriptForYAML() - js.go
  • formatSafeOutputsRunsOn() - safe_outputs.go
  • formatStringAsJavaScriptLiteral() - threat_detection.go
  • formatDuration() - logs.go (CLI)
  • formatNumber() - logs.go (CLI)
  • Various render functions (renderMCPFetchServerConfig, etc.)

Analysis: Scattered across multiple files, could be better organized

Recommendation: Consider consolidating formatting utilities


Refactoring Recommendations

Priority 1: High Impact - Critical Improvements

1.1 Move Validation Functions to validation.go

Effort: 30 minutes
Impact: High - Fixes clear violation of file organization

Actions:

  • Move validateHTTPTransportSupport() from compiler.go:2949 to validation.go
  • Move validateMaxTurnsSupport() from compiler.go:2968 to validation.go
  • Move validateWebSearchSupport() from compiler.go:2992 to validation.go
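  • Update the import blocks of compiler.go and validation.go if needed (callers are unaffected because the methods stay in the same package, but Go rejects unused imports)
  • Run tests to verify nothing breaks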

Benefits:

  • Clear separation of concerns
  • All validation logic in one place
  • Easier to find and maintain validation functions

1.2 Decompose compiler.go (3030 Lines → Multiple Files)

Effort: 8-16 hours
Impact: Very High - Major improvement in maintainability

Proposed File Structure:

pkg/workflow/
  compiler.go              (core compilation logic, ~500-800 lines)
  compiler_config.go       (config parsing, move parse* methods)
  compiler_jobs.go         (job building, move build* methods)
  compiler_steps.go        (step generation, move generate*Step methods)
  compiler_yaml.go         (YAML generation, move generateYAML and related)
  compiler_safe_outputs.go (safe outputs logic)
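
As a planning aid, the structure above can be turned into a name-based routing rule. This is a rough sketch only: `targetFile` is a hypothetical helper, the rule order is an assumption, and names that match no rule (including the safe outputs logic, which has no obvious naming pattern) stay in compiler.go for manual review.

```go
package main

import (
	"fmt"
	"strings"
)

// targetFile suggests a destination file for a compiler.go function based
// on its name. Unmatched names stay in compiler.go.
func targetFile(name string) string {
	switch {
	case strings.HasPrefix(name, "validate"):
		return "validation.go"
	case name == "generateYAML":
		return "compiler_yaml.go"
	case strings.HasPrefix(name, "generate") && strings.Contains(name, "Step"):
		return "compiler_steps.go"
	case strings.HasPrefix(name, "build"):
		return "compiler_jobs.go"
	case strings.HasPrefix(name, "parse"):
		return "compiler_config.go"
	default:
		return "compiler.go"
	}
}

func main() {
	// Function names taken from the clusters in this report.
	names := []string{
		"generateYAML", "generateMainJobSteps", "generatePostSteps",
		"generatePrompt", "buildJobs", "buildMainJob",
		"validateMaxTurnsSupport",
	}
	for _, n := range names {
		fmt.Printf("%-26s -> %s\n", n, targetFile(n))
	}
}
```

Because all of these files share the workflow package, methods on Compiler can be moved between them without changing any call sites.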

Migration Strategy:

  1. Create new files
  2. Move related methods to appropriate files
  3. Update tests
  4. Verify no functionality breaks

Benefits:

  • Each file has a single, clear purpose
  • Easier to navigate codebase
  • Faster to find relevant code
  • Better for code review
  • Easier to test individual components

Priority 2: Medium Impact - Structural Improvements

2.1 Decompose Large Engine Files

Files:

  • claude_engine.go (1312 lines)
  • copilot_engine.go (996 lines)

Effort: 4-6 hours per engine
Impact: Medium - Improved organization

Recommendation:

  • Extract tool parsing to <engine>_tools.go
  • Extract MCP config to <engine>_mcp.go
  • Extract log parsing to <engine>_logs.go

2.2 Decompose logs.go (CLI)

File: `pkg/cl
[Content truncated due to length]

AI generated by Semantic Function Refactoring
