Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization Recommendations #12564

@github-actions

Description

@github-actions

Semantic analysis of 490 Go source files in repository: githubnext/gh-aw

Executive Summary

A comprehensive semantic function clustering analysis has identified significant code organization patterns, outlier functions in wrong files, and duplicate implementations across the codebase. The analysis covered 490 non-test Go files across key packages, with deep focus on:

  • pkg/workflow (250 files) - Workflow compilation and processing
  • pkg/cli (163 files) - Command-line interface
  • pkg/parser (30 files) - Parsing utilities
  • pkg/campaign (14 files) - Campaign orchestration

Key Findings:

  • 60-70% well-organized - Many files follow single-responsibility principles
  • ⚠️ ~25% mixed concerns - Files mixing 2-3 different responsibilities
  • ~10% major issues - Files with significant organizational problems
  • 🔄 2 exact duplicates found - Functions with identical implementations
  • 📦 Multiple scattered utilities - Helper functions spread across files

Critical Issues Identified

1. Exact Duplicate Functions

Issue #1: extractBaseRepo() - Identical Implementation in Two Files

Duplicate Locations:

  • pkg/workflow/action_resolver.go:93
  • pkg/cli/update_actions.go:20

Code Comparison:

// pkg/workflow/action_resolver.go:93
func extractBaseRepo(repo string) string {
    parts := strings.Split(repo, "/")
    if len(parts) >= 2 {
        // Take first two parts (owner/repo)
        return parts[0] + "/" + parts[1]
    }
    return repo
}

// pkg/cli/update_actions.go:20
func extractBaseRepo(actionPath string) string {
    parts := strings.Split(actionPath, "/")
    if len(parts) >= 2 {
        // Return owner/repo (first two segments)
        return parts[0] + "/" + parts[1]
    }
    // If less than 2 parts, return as-is (shouldn't happen in practice)
    return actionPath
}

Similarity: 100% identical logic, only comments differ

Recommendation:

  • Consolidate into pkg/repoutil/repoutil.go (which already has related utilities)
  • Export as ExtractBaseRepo(path string) string
  • Update both callers to use repoutil.ExtractBaseRepo()

Estimated Impact: Reduced code duplication, single source of truth for repository path parsing


Issue #2: ParseGitHubURL() - Similar Functions with Different Implementations

Duplicate Locations:

  • pkg/repoutil/repoutil.go:28 - Returns (owner, repo string, err error)
  • pkg/parser/github_urls.go:56 - Returns (*GitHubURLComponents, error)

Implementations:

  • repoutil version: Handles SSH (git@github.com:) and HTTPS formats, returns simple owner/repo tuple
  • parser version: Uses url.Parse(), handles raw.githubusercontent.com, returns structured GitHubURLComponents with file paths, refs, etc.

Analysis:
These functions have different purposes:

  • repoutil.ParseGitHubURL - Simple owner/repo extraction for git operations
  • parser.ParseGitHubURL - Comprehensive URL parsing with file paths, refs, and content types

Recommendation:

  • Rename for clarity:
    • repoutil.ParseGitHubURLrepoutil.ParseGitRepoURL (emphasizes git repo focus)
    • Keep parser.ParseGitHubURL as-is (comprehensive parser)
  • Add cross-reference comments explaining the distinction
  • Consider consolidation: Could repoutil call parser version and extract owner/repo?

Estimated Impact: Improved API clarity, reduced naming confusion


2. Outlier Functions (Functions in Wrong Files)

Priority 1: High-Impact Misplacements

Outlier #1: Git Attribute Configuration in Git Operations File

File: pkg/cli/git.go:157
Function: ensureGitAttributes()
Current Purpose: Configuring .gitattributes for compiled workflow files
Issue: This is compilation post-processing, not a core git operation

Used By:

  • compile_helpers.go
  • compile_orchestration.go
  • add_command.go

Why It's Misplaced:
The function sets up .gitattributes to handle .md and .yml files in the .github/workflows directory. This is a workflow compilation concern, not a generic git utility. The git.go file should contain reusable git operations (commits, branches, remotes), not workflow-specific configuration.

Recommendation:

  • Move to pkg/cli/compile_post_processing.go (or create it)
  • Rename to configureWorkflowGitAttributes() for clarity
  • Keep git.go focused on reusable git operations

Estimated Impact: Clearer separation of concerns, easier to find compilation-related setup


Outlier #2: User Interaction in Git Operations File

File: pkg/cli/git.go:550
Function: confirmPushOperation()
Issue: User interaction logic mixed with git operations

Why It's Misplaced:
This function uses huh library to prompt users for confirmation. User interaction should be grouped with other interactive prompts, not embedded in git operations.

Recommendation:

  • Move to pkg/cli/interactive.go (if exists) or create pkg/cli/prompts.go
  • Group with other user confirmation functions
  • Keep git.go focused on actual git commands

Estimated Impact: Improved testability (can mock prompts separately from git operations)


Outlier #3: Validation in Compilation Config File

File: pkg/cli/compile_validation.go
Function: validateCompileConfig()
Issue: Validates CLI flags (dependabot, purge, workflowDir), not workflow content

Why It's Misplaced:
The file name suggests it validates workflow compilation, but this function validates CLI configuration flags. These are different concerns:

  • Workflow validation: Checking workflow YAML structure, frontmatter, safe outputs
  • Config validation: Checking CLI flag combinations and values

Recommendation:

  • Move to pkg/cli/compile_config.go near the CompileConfig struct definition
  • Keep compile_validation.go for actual workflow content validation
  • Consider renaming to validateCompileFlags() for clarity

Estimated Impact: Clearer distinction between config validation and workflow validation


Outlier #4: Generic YAML Validation in Command File

File: pkg/cli/actions_build_command.go:22
Function: validateActionYml()
Issue: Generic YAML validation in a command-specific file

Why It's Misplaced:
This appears to be a reusable validation utility, but it's buried in the actions_build_command.go file. Generic validators should be centralized for reuse across commands.

Recommendation:

  • Move to pkg/cli/validators.go (which already has ValidateWorkflowName() and ValidateWorkflowIntent())
  • Alternative: Create pkg/cli/actions_validators.go if it's truly actions-specific
  • Export if other commands might need it

Estimated Impact: Improved discoverability and reusability


3. Files with Mixed Concerns

Priority 1: Largest Impact Files

Mixed Concern #1: pkg/cli/git.go (756 lines, 27 functions)

Concerns Mixed:

  1. Git repository detection (findGitRoot, isGitRepo)
  2. GitHub URL parsing (parseGitHubRepoSlugFromURL)
  3. Git attributes configuration (ensureGitAttributes) - MISPLACED
  4. Branch operations (getCurrentBranch, checkOnDefaultBranch)
  5. Remote operations (hasRemote, pushToRemote)
  6. Commit operations (commitChanges, hasChangesToCommit)
  7. User interaction (confirmPushOperation) - MISPLACED

Refactoring Recommendation:
Split into focused files:

  • git_repository.go - Repository detection and navigation (findGitRoot, isGitRepo)
  • git_operations.go - Commits, branches, remotes (core git commands)
  • git_github.go - GitHub-specific URL parsing
  • Move ensureGitAttributes to compile_post_processing.go
  • Move confirmPushOperation to interactive.go or prompts.go

Estimated Effort: 4-6 hours
Benefits: Improved maintainability, easier testing, clearer boundaries


Mixed Concern #2: pkg/workflow/compiler_safe_outputs.go (150+ lines)

Concerns Mixed:

  1. Parsing on: section for triggers
  2. Reaction handling configuration
  3. Lock-for-agent configuration
  4. Event filtering logic

Problem Function:
parseOnSection() handles 4+ different configuration aspects in a single 130-line function.

Refactoring Recommendation:

  • Extract to separate parsers:
    • parseOnTriggers() - Trigger parsing
    • parseReactionConfig() - Reaction handling
    • parseLockForAgent() - Lock configuration
    • parseEventFilters() - Event filtering
  • Keep parseOnSection() as orchestrator calling these focused parsers
  • Consider moving to safe_outputs_on_parser.go

Estimated Effort: 3-4 hours
Benefits: Easier to test individual parsing concerns, reduced complexity


Mixed Concern #3: pkg/workflow/domains.go (570 lines, 20+ functions)

Concerns Mixed:

  1. Ecosystem domain lookups (getEcosystemDomains())
  2. Runtime domain extraction (getDomainsFromRuntimes())
  3. Engine-specific domain builders (GetCopilotAllowedDomains(), GetCodexAllowedDomains(), GetClaudeAllowedDomains())
  4. HTTP MCP domain extraction (extractHTTPMCPDomains())
  5. Domain merging functions (multiple variants)

Refactoring Recommendation:
Split into focused files:

  • domains_ecosystem.go - Ecosystem domain lookups (npm, pip, docker)
  • domains_engines.go - Engine-specific domain configuration (Copilot, Codex, Claude)
  • domains_merging.go - Domain combination and deduplication logic
  • domains_mcp.go - HTTP MCP domain extraction

Estimated Effort: 4-5 hours
Benefits: Clearer organization, easier to find engine-specific logic


Mixed Concern #4: pkg/workflow/compiler_jobs.go

Concerns Mixed:

  1. Logic/decision functions (isActivationJobNeeded(), shouldAddCheckoutStep())
  2. Analysis functions (referencesCustomJobOutputs())
  3. Dependency checking (jobDependsOnPreActivation())
  4. Building functions (buildJobs())
  5. Parsing functions (extractJobsFromFrontmatter()) - MISPLACED

Refactoring Recommendation:

  • Move extractJobsFromFrontmatter() to compiler_frontmatter.go or jobs_parser.go
  • Keep analysis functions (referencesCustomJobOutputs, jobDependsOnPreActivation) in jobs_analysis.go
  • Keep building functions in jobs_builder.go
  • Keep decision logic in jobs_config.go

Estimated Effort: 3-4 hours
Benefits: Separation of parsing, analysis, and construction


Mixed Concern #5: pkg/cli/add_command.go (609 lines, 5 functions)

Issue:
Mixes command definition with complex workflow addition logic. The function addWorkflowWithTracking() is 260+ lines and handles the entire workflow addition flow.

Refactoring Recommendation:

  • Keep command setup in add_command.go (flag parsing, command registration)
  • Extract addWorkflowWithTracking() logic to add_workflow_operations.go
  • Split into smaller functions:
    • prepareWorkflowAddition() - Validation and preparation
    • executeWorkflowAddition() - Core addition logic
    • finalizeWorkflowAddition() - Post-processing and cleanup

Estimated Effort: 4-6 hours
Benefits: Clearer command structure, easier to test business logic separately


4. Scattered Helper Functions

Utility Functions Spread Across Multiple Files

Scattered #1: YAML/Frontmatter Utilities

Current Locations:

  • pkg/cli/codemod_yaml_utils.go: reconstructContent(), parseFrontmatterLines(), getIndentation(), isTopLevelKey(), removeFieldFromBlock()
  • pkg/cli/frontmatter_editor.go: UpdateFieldInFrontmatter(), RemoveFieldFromOnTrigger(), SetFieldInOnTrigger()
  • pkg/cli/add_workflow_compilation.go: addSourceToWorkflow()

Issue:
Frontmatter manipulation utilities are split across 3 files with overlapping concerns.

Recommendation:
Consolidate into unified frontmatter_utils.go:

Section 1: Parsing

  • parseFrontmatterLines()
  • extractFrontmatter()

Section 2: Field Manipulation

  • UpdateFieldInFrontmatter()
  • RemoveFieldFromOnTrigger()
  • SetFieldInOnTrigger()
  • addSourceToWorkflow()

Section 3: Content Reconstruction

  • reconstructContent()
  • getIndentation()
  • isTopLevelKey()
  • removeFieldFromBlock()

Estimated Effort: 2-3 hours
Benefits: Single source for frontmatter operations, easier discoverability


Scattered #2: String Processing Utilities

Current Locations:

  • pkg/workflow/strings.go - Main string utilities
  • pkg/workflow/workflow_name.go - SanitizeIdentifier() (creates safe identifiers)
  • pkg/workflow/config_helpers.go - Various string parsing helpers

Issue:
The SanitizeIdentifier() function is separate from other string utilities, even though strings.go mentions it in comments.

Recommendation:

  • Keep specialized functions in their current locations (they're fine)
  • Add clear cross-references in comments:
    • In strings.go: "See also: SanitizeIdentifier() in workflow_name.go for identifier creation"
    • In workflow_name.go: "For general string normalization, see strings.go"

Estimated Effort: 30 minutes
Benefits: Improved discoverability through better documentation


Scattered #3: Config Parsing Helpers

Current Locations:

  • pkg/workflow/config_helpers.go - 12 functions (parse*, Parse*, extract*)
  • pkg/workflow/map_helpers.go - 2 functions (parseIntValue, filterMapKeys)
  • pkg/workflow/tools_parser.go - Tool-specific parsing
  • pkg/workflow/safe_outputs_config.go - Safe outputs parsing

Issue:
Config parsing utilities spread across 5+ files, making it hard to find the right helper.

Recommendation:

  • Keep domain-specific parsers in their domain files (tools_parser.go, safe_outputs_config.go)
  • Consolidate generic helpers:
    • Move parseIntValue from map_helpers.go to config_helpers.go
    • Rename map_helpers.go to map_utils.go if it only has map manipulation (no parsing)
    • Add header comments in config_helpers.go explaining what types of helpers belong there

Estimated Effort: 1-2 hours
Benefits: Centralized generic parsing utilities, clearer file purposes


Well-Organized Code (Models to Follow)

Excellent Examples of Code Organization

🏆 Best Practice #1: codemod_* Files (pkg/cli)

Why It's Excellent:

  • 13 feature-specific files following identical patterns
  • Shared utilities properly factored out (codemod_yaml_utils.go)
  • Consistent structure:
    func get{Feature}Codemod() Codemod {
        return Codemod{
            ID:           "feature-identifier",
            Name:         "Human readable name",
            Description:  "What it does",
            IntroducedIn: "0.x.0",
            Apply: func(content string, frontmatter map[string]any) (string, bool, error) {
                // Implementation
            },
        }
    }
  • Each file handles ONE migration concern
  • Paired with test files

Files:

  • codemod_agent_session.go, codemod_discussion_flag.go, codemod_grep_tool.go, codemod_mcp_mode_to_type.go, codemod_mcp_network.go, codemod_network_firewall.go, codemod_permissions.go, codemod_safe_inputs.go, codemod_sandbox_agent.go, codemod_schedule.go, codemod_schema_file.go, codemod_slash_command.go, codemod_timeout_minutes.go, codemod_upload_assets.go

Key Takeaway: This is a model subsystem that demonstrates perfect feature-based organization.


🏆 Best Practice #2: Runtime Domain Organization (pkg/workflow)

Why It's Excellent:

  • Clear separation by concern:
    • runtime_definitions.go - Type definitions
    • runtime_detection.go - Runtime detection logic
    • runtime_deduplication.go - Deduplication
    • runtime_validation.go - Validation
  • Each file has a single, clear purpose
  • Easy to find related functionality

Key Takeaway: Split by functional concern (definitions, detection, validation) rather than mixing in one large file.


🏆 Best Practice #3: Validation Files (pkg/workflow)

Why It's Excellent:

  • 17+ focused validation files, each handling one domain:
    • pip_validation.go - Python package validation
    • npm_validation.go - NPM package validation
    • docker_validation.go - Docker image validation
    • firewall_validation.go - Firewall configuration
    • expression_validation.go - Expression safety
    • sandbox_validation.go - Sandbox configuration
    • And 11 more...
  • Generic validators in validation_helpers.go (ValidateRequired(), ValidateMaxLength())
  • Easy to add new validators (just create new {domain}_validation.go)

Key Takeaway: One validation file per domain prevents god files.


🏆 Best Practice #4: Expression Handling (pkg/workflow)

Why It's Excellent:

  • Well-separated by concern:
    • expression_parser.go - Parsing
    • expression_validation.go - Validation
    • expression_extraction.go - Extraction
    • expression_builder.go - Building
    • expression_patterns.go - Pattern matching
  • Each file focuses on one operation on expressions
  • Easy to navigate (parser → validator → builder flow)

Key Takeaway: Organize by operation type when dealing with a core concept.


🏆 Best Practice #5: frontmatter_editor.go (pkg/cli)

Why It's Excellent:

  • Clear header documentation explaining the file's purpose and scope
  • Functions: UpdateFieldInFrontmatter(), RemoveFieldFromOnTrigger(), SetFieldInOnTrigger()
  • Single responsibility: YAML frontmatter manipulation
  • Well-tested with comprehensive test file

Key Takeaway: Header comments explaining organization rationale help maintainers understand boundaries.


Detailed Function Clusters

Semantic Clustering Analysis by Pattern

Cluster 1: Creation Functions (create*)

Pattern: Functions that create new entities
Files: 13+ files in pkg/workflow

Well-Organized Examples:

  • create_issue.go - GitHub issue creation
  • create_pull_request.go - Pull request creation
  • create_discussion.go - Discussion creation

Analysis: ✅ Excellent organization - each creation function has its own file


Cluster 2: Parsing Functions (parse*, Parse*)

Pattern: Functions that parse strings, YAML, or configurations
Files: 22+ files across pkg/workflow and pkg/cli

Well-Organized Examples:

  • trigger_parser.go - 16 functions for trigger parsing
  • tools_parser.go - 13 functions for tool configuration parsing
  • slash_command_parser.go - Slash command parsing

Scattered Examples:

  • config_helpers.go - 9 parsing functions mixed with other helpers
  • safe_outputs_config.go - 1 massive 150-line parsing function (should be split)
  • compiler_jobs.go - extractJobsFromFrontmatter() (should move to parser file)

Recommendation:

  • Keep domain-specific parsers in dedicated *_parser.go files
  • Consolidate generic parsing helpers into config_helpers.go or parsing_utils.go

Cluster 3: Validation Functions (validate*, Validate*)

Pattern: Functions that validate configurations, inputs, or workflows
Files: 21+ validation files

Well-Organized Examples:

  • pip_validation.go, npm_validation.go, docker_validation.go - Domain-specific validators
  • validation_helpers.go - Generic validators (ValidateRequired(), ValidateMaxLength())
  • strict_mode_validation.go - 7 expression safety validators

Analysis: ✅ Excellent organization - validation is well-separated by domain


Cluster 4: Building/Generation Functions (build*, generate*, Render*)

Pattern: Functions that construct objects, generate output, or render templates
Files: 20+ files

Examples:

  • expression_builder.go - 26 functions for building expression trees
  • mcp_renderer.go - 14 functions for rendering MCP configurations
  • safe_inputs_generator.go - Generating safe input configurations

Analysis: ✅ Well-organized, clear separation of building concerns


Cluster 5: Helper Functions (lowercase patterns)

Common Patterns: ensure*, get*, is*, has*, check*, find*, extract*
Occurrences: 574+ functions across 154 files

Well-Consolidated Examples:

  • strings.go - String normalization utilities
  • git_helpers.go - Git repository utilities
  • error_helpers.go - Error construction helpers
  • validation_helpers.go - Generic validators

Scattered Examples:

  • String processing helpers in 3+ files
  • Config parsing helpers in 5+ files
  • YAML manipulation in 3+ files

Recommendation: Continue consolidating helpers into focused utility files


Implementation Priorities

Priority 1: High-Impact, Quick Wins (1-2 days)

  1. Consolidate Exact Duplicates

    • Merge extractBaseRepo() into pkg/repoutil/repoutil.go
    • Rename ParseGitHubURL variants for clarity
    • Effort: 2-3 hours
    • Impact: Immediate code deduplication
  2. Move Outlier Functions

    • Move ensureGitAttributes() to compile_post_processing.go
    • Move confirmPushOperation() to interactive.go
    • Move validateCompileConfig() to compile_config.go
    • Move validateActionYml() to validators.go
    • Effort: 3-4 hours
    • Impact: Clearer file boundaries, improved discoverability
  3. Split pkg/cli/git.go

    • Create git_repository.go, git_operations.go, git_github.go
    • Effort: 4-6 hours
    • Impact: Major maintainability improvement

Priority 2: Medium Impact (3-5 days)

  1. Split Large Mixed-Concern Files

    • pkg/workflow/domains.go → 3-4 focused files
    • pkg/workflow/compiler_safe_outputs.go → Extract parsing functions
    • pkg/workflow/compiler_jobs.go → Separate parsing, analysis, building
    • Effort: 8-12 hours
    • Impact: Reduced complexity, easier testing
  2. Consolidate Scattered Utilities

    • Centralize YAML/frontmatter utilities
    • Group config parsing helpers
    • Effort: 4-6 hours
    • Impact: Improved discoverability, reduced scatter

Priority 3: Long-Term Improvements (1-2 weeks)

  1. Extract Complex Functions

    • Refactor parseOnSection() in compiler_safe_outputs.go
    • Refactor extractSafeOutputsConfig() in safe_outputs_config.go
    • Effort: 6-8 hours
    • Impact: Reduced function complexity, better testability
  2. Documentation Improvements

    • Add header comments explaining file organization (like frontmatter_editor.go)
    • Add cross-references for related utilities
    • Effort: 2-3 hours
    • Impact: Improved maintainability

Implementation Guidelines

For All Refactorings:

  1. Preserve Behavior - Ensure existing functionality works identically
  2. Maintain Exports - Keep public API unchanged (unless renaming for clarity)
  3. Write Tests First - Add tests before refactoring (especially for untested code)
  4. Incremental Changes - Split one file or move one function at a time
  5. Run Tests Frequently - Verify tests pass after each change
  6. Update Imports - Ensure all import paths are updated
  7. Add Documentation - Explain boundaries with header comments

Testing Strategy:

  • Use table-driven tests for validation/parsing logic
  • Mock external dependencies (git commands, GitHub API)
  • Aim for ≥80% coverage for refactored code
  • Verify integration tests still pass

Metrics Summary

  • Total Go Files Analyzed: 490 non-test files
  • Major Packages:
    • pkg/workflow: 250 files
    • pkg/cli: 163 files
    • pkg/parser: 30 files
    • pkg/campaign: 14 files
  • Function Clusters Identified: 5 major clusters (creation, parsing, validation, building, helpers)
  • Exact Duplicates Detected: 2 functions (extractBaseRepo, ParseGitHubURL variants)
  • Outliers Found: 4 high-priority functions in wrong files
  • Files with Mixed Concerns: 5 files requiring refactoring
  • Well-Organized Subsystems: 5 exemplary patterns (codemod_, validation_, runtime_, expression_)
  • Detection Method: Semantic code analysis + naming pattern analysis + manual code inspection
  • Analysis Date: 2026-01-29

Acceptance Criteria

Refactoring is successful when:

  • Exact duplicate functions are consolidated
  • Outlier functions are moved to appropriate files
  • Mixed-concern files are split by responsibility
  • Scattered utilities are consolidated
  • All tests pass (unit + integration)
  • Code passes linting
  • Build succeeds
  • Public API remains unchanged (except intentional renames)
  • Test coverage maintained or improved

References

AI generated by Semantic Function Refactoring

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions