[refactor] 🔧 Semantic Function Clustering Analysis - Refactoring Opportunities #4184

@github-actions

🔧 Semantic Function Clustering Analysis

Automated analysis of repository: githubnext/gh-aw

Executive Summary

I analyzed 223 non-test Go files across the 7 packages listed under Package Distribution, identifying function clusters, outliers, and code duplication opportunities. The codebase is generally well-organized with dedicated validation and creation function files, but several opportunities exist for improved organization and reduced duplication.

Key Findings:

  • Well-organized: Validation logic (16 files), creation functions (8 files: 6 create_*.go plus 2 add_*.go)
  • ⚠️ Found 1 exact duplicate: formatFileSize() function duplicated across 2 files
  • ⚠️ Found 4 similar functions: String sanitization/normalization spread across 4 files
  • Good patterns: Helper functions consolidated in 6 dedicated helper files
  • 📊 Function clusters identified: 8 major semantic clusters

Full Analysis Details

Function Inventory

Package Distribution

| Package | Files | Percentage | Primary Purpose |
|---|---|---|---|
| pkg/workflow | 132 | 59% | Workflow compilation, validation, engine support |
| pkg/cli | 72 | 32% | CLI commands and user interface |
| pkg/parser | 12 | 5% | Parsing frontmatter, YAML, GitHub URLs |
| pkg/console | 4 | 2% | Console output formatting |
| pkg/logger | 1 | <1% | Logging utilities |
| pkg/constants | 1 | <1% | Global constants |
| pkg/timeutil | 1 | <1% | Time formatting utilities |
| Total | 223 | 100% | |

File Organization Patterns

The codebase follows Go best practices with files organized by feature:

Well-Organized Patterns:

  • Validation files: 16 dedicated validation files, most named *_validation.go
  • Creation files: Each feature has its own create_*.go file
  • Helper files: 6 dedicated *_helper*.go files
  • Engine files: Separate files per engine type (claude, codex, copilot, custom)

Identified Issues

1. 🔴 Exact Duplicate Functions

Issue: formatFileSize() - 100% Duplicate

Description: Identical file size formatting function exists in two different files.

Occurrences:

  1. File: pkg/console/format.go:5-28

    • Function: FormatFileSize(size int64) string (exported)
    • Lines: 24 lines
  2. File: pkg/console/render.go:516-539

    • Function: formatFileSize(size int64) string (unexported)
    • Lines: 24 lines

Similarity: 100% - The function bodies are identical line-by-line.

Code Comparison:

// Both implementations are identical:
func formatFileSize(size int64) string {
    if size == 0 {
        return "0 B"
    }
    const unit = 1024
    if size < unit {
        return fmt.Sprintf("%d B", size)
    }
    // ... exact same logic in both files
}

Recommendation:

  • Remove formatFileSize() from pkg/console/render.go
  • Use the exported FormatFileSize() from pkg/console/format.go everywhere
  • Update all call sites in render.go to use the exported version

Estimated Impact:

  • Reduced code duplication: -24 lines
  • Single source of truth for file size formatting
  • Easier maintenance and testing
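
A minimal sketch of what the single remaining function could look like after consolidation. The report shows only the zero check and the `size < unit` branch; everything after that point (the division loop, the one-decimal format, and the KB/MB/GB suffixes) is an assumption made for illustration and may differ from the code in pkg/console/format.go.

```go
package main

import "fmt"

// FormatFileSize renders a byte count as a short human-readable string.
// The first two branches follow the excerpt in the report; the loop and
// the suffix table below are assumed, not taken from the repository.
func FormatFileSize(size int64) string {
	if size == 0 {
		return "0 B"
	}
	const unit = 1024
	if size < unit {
		return fmt.Sprintf("%d B", size)
	}
	// Find the largest power of 1024 that does not exceed size.
	div, exp := int64(unit), 0
	for n := size / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(size)/float64(div), "KMGTPE"[exp])
}

func main() {
	for _, n := range []int64{0, 512, 1536, 1 << 20} {
		fmt.Println(n, "->", FormatFileSize(n))
	}
}
```

Because render.go and format.go are in the same package, call sites in render.go could use the exported name directly with no new import.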

2. ⚠️ Similar Functions - String Sanitization/Normalization

Issue: Name sanitization and normalization spread across 4 files

Description: Multiple functions handle string sanitization and normalization with overlapping concerns.

Occurrences:

  1. File: pkg/workflow/strings.go:156-162

    • Function: SanitizeWorkflowName(name string) string
    • Purpose: Sanitizes workflow names, preserves ., _, -
    • Implementation: Wrapper around SanitizeName()
  2. File: pkg/workflow/workflow_name.go:11-17

    • Function: SanitizeIdentifier(name string) string
    • Purpose: Sanitizes identifiers, removes all special chars
    • Implementation: Wrapper around SanitizeName()
  3. File: pkg/workflow/resolve.go:111-130

    • Function: normalizeWorkflowName(name string) string
    • Purpose: Removes file extensions (.lock.yml, .md)
    • Implementation: Independent logic
  4. File: pkg/workflow/safe_outputs.go:515-517

    • Function: normalizeSafeOutputIdentifier(name string) string
    • Purpose: Brief normalization (not examined in detail)

Analysis:

  • SanitizeWorkflowName() and SanitizeIdentifier() both use the same underlying SanitizeName() function with different options ✅ (Good pattern)
  • normalizeWorkflowName() handles a different concern (extension removal) ✅ (Acceptable separation)
  • Functions serve distinct purposes but naming could be clearer

Recommendation:

  • No immediate action required - Functions serve different purposes
  • Future consideration: Document the distinction between "sanitize" (character cleanup) vs "normalize" (extension/format handling)
  • Consider moving all name-related utilities to a single file for easier discoverability

Estimated Impact:

  • Low priority - mainly a documentation/discoverability improvement
  • No code duplication detected
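
The sanitize/normalize distinction described above can be sketched as follows. These are simplified stand-ins that reuse the names from the report; the `keep` parameter, the ASCII-only character rule, and the choice to drop (rather than replace) disallowed characters are assumptions, and the real SanitizeName() options are not shown in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName keeps ASCII letters, digits, and any rune listed in keep,
// and drops everything else. Hypothetical stand-in for SanitizeName().
func sanitizeName(name, keep string) string {
	var b strings.Builder
	for _, r := range name {
		isAlnum := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if isAlnum || strings.ContainsRune(keep, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// Sanitize: character cleanup. Both wrappers share one core with different options.
func sanitizeWorkflowName(name string) string { return sanitizeName(name, "._-") }
func sanitizeIdentifier(name string) string   { return sanitizeName(name, "") }

// Normalize: format handling. Strips a known workflow file extension.
func normalizeWorkflowName(name string) string {
	switch {
	case strings.HasSuffix(name, ".lock.yml"):
		return strings.TrimSuffix(name, ".lock.yml")
	case strings.HasSuffix(name, ".md"):
		return strings.TrimSuffix(name, ".md")
	}
	return name
}

func main() {
	fmt.Println(sanitizeWorkflowName("My Workflow_v1.2!"))
	fmt.Println(sanitizeIdentifier("My Workflow_v1.2!"))
	fmt.Println(normalizeWorkflowName("ci.lock.yml"))
}
```

Doc comments of this kind ("Sanitize: ...", "Normalize: ...") are one way to record the distinction next to the code.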

3. 📋 Parsing Configuration Functions - Good Pattern

Issue: Similar parse*Config() functions - Well-Organized ✅

Description: Multiple files have parse*Config() methods on the Compiler type, but they follow a consistent pattern and each is in its appropriate file.

Occurrences:

  • pkg/workflow/create_issue.go: parseIssuesConfig()
  • pkg/workflow/create_discussion.go: parseDiscussionsConfig()
  • pkg/workflow/add_comment.go: parseCommentsConfig()
  • pkg/workflow/update_issue.go: parseUpdateIssuesConfig()
  • pkg/workflow/create_pull_request.go: parsePullRequestsConfig() (name inferred from the naming pattern, not confirmed)

Analysis:
✅ Each parse function is co-located with its related functionality
✅ Functions share a common naming pattern
✅ Each handles a specific output type
✅ Helper functions are extracted to config_helpers.go

Recommendation:

  • No action required - This is a good organizational pattern
  • The pattern demonstrates proper separation of concerns
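
As an illustration of the pattern, here is a hedged sketch of one parse method living next to its feature while delegating value extraction to shared helpers. The `IssuesConfig` fields, the map keys ("create-issue", "title-prefix", "max"), and the helper names and signatures are all hypothetical; only the `Compiler` receiver and the `parseIssuesConfig` name come from the report.

```go
package main

import "fmt"

// Compiler stands in for the type that carries the parse*Config() methods.
type Compiler struct{}

// IssuesConfig is a hypothetical result type; its fields are illustrative.
type IssuesConfig struct {
	TitlePrefix string
	Max         int
}

// stringFromMap and intFromMap play the role of shared helpers of the kind
// the report places in config_helpers.go (names and signatures assumed).
func stringFromMap(m map[string]any, key string) string {
	s, _ := m[key].(string)
	return s
}

func intFromMap(m map[string]any, key string, def int) int {
	if v, ok := m[key].(int); ok {
		return v
	}
	return def
}

// parseIssuesConfig would sit in create_issue.go, next to the code that uses it.
// It returns nil when the feature is not configured.
func (c *Compiler) parseIssuesConfig(outputs map[string]any) *IssuesConfig {
	raw, ok := outputs["create-issue"].(map[string]any)
	if !ok {
		return nil
	}
	return &IssuesConfig{
		TitlePrefix: stringFromMap(raw, "title-prefix"),
		Max:         intFromMap(raw, "max", 1),
	}
}

func main() {
	c := &Compiler{}
	cfg := c.parseIssuesConfig(map[string]any{
		"create-issue": map[string]any{"title-prefix": "[bot] ", "max": 3},
	})
	fmt.Printf("%+v\n", *cfg)
}
```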

Detailed Function Clusters

Cluster 1: Validation Functions ✅

Pattern: *Validate* functions
Files: 16 dedicated validation files
Organization: Excellent ✅

Validation Files:

  1. agent_validation.go - Agent configuration validation
  2. bundler_validation.go - Script bundler validation
  3. docker_validation.go - Docker configuration validation
  4. engine_validation.go - Engine configuration validation
  5. expression_validation.go - Expression syntax validation
  6. github_toolset_validation_error.go - GitHub toolset errors
  7. mcp_config_validation.go - MCP configuration validation
  8. npm_validation.go - NPM package validation
  9. pip_validation.go - Python package validation
  10. repository_features_validation.go - Repository feature checks
  11. runtime_validation.go - Runtime environment validation
  12. schema_validation.go - YAML schema validation
  13. step_order_validation.go - Step ordering validation
  14. strict_mode_validation.go - Strict mode checks
  15. template_validation.go - Template validation
  16. validation.go - General validation utilities

Analysis:
✅ Validation logic is properly separated by domain
✅ Each file has a clear, single purpose
✅ Naming convention is consistent

Recommendation: Continue this pattern for new validation logic.


Cluster 2: Creation Functions ✅

Pattern: create_* functions
Files: 8 dedicated creation/modification files (6 create_*.go, 2 add_*.go)
Organization: Excellent ✅

Creation Files:

  1. create_agent_task.go - Agent task creation
  2. create_code_scanning_alert.go - Security alert creation
  3. create_discussion.go - Discussion creation
  4. create_issue.go - Issue creation
  5. create_pr_review_comment.go - PR review comment creation
  6. create_pull_request.go - Pull request creation
  7. add_comment.go - Comment addition
  8. add_labels.go - Label addition

Analysis:
✅ Each creation function has its own file
✅ Files named after their primary purpose
✅ Consistent structure across files

Recommendation: Continue this pattern for new creation/modification operations.


Cluster 3: Helper Functions ✅

Pattern: *_helper* or *_helpers files
Files: 6 dedicated helper files
Organization: Good ✅

Helper Files:

  1. config_helpers.go - Configuration parsing helpers
  2. engine_helpers.go - Engine setup and rendering helpers
  3. gh_helper.go - GitHub CLI interaction helpers
  4. map_helpers.go - Map/dictionary utilities
  5. prompt_step_helper.go - Prompt step generation helpers
  6. safe_outputs_env_helpers.go - Safe outputs environment helpers

Analysis:
✅ Helper functions are consolidated into topic-specific files
✅ Each helper file serves a clear domain
✅ Better than scattering helpers across many files

Recommendation: Continue consolidating related helper functions.


Cluster 4: Engine Files ✅

Pattern: *_engine.go files
Files: 6 engine files (5 *_engine.go plus engine.go)
Organization: Excellent ✅

Engine Files:

  1. agentic_engine.go - Base agentic engine
  2. claude_engine.go - Claude-specific engine
  3. codex_engine.go - Codex-specific engine
  4. copilot_engine.go - Copilot-specific engine
  5. custom_engine.go - Custom engine support
  6. engine.go - Engine interface and utilities

Analysis:
✅ Each AI engine has its own dedicated file
✅ Common functionality extracted to engine.go
✅ Clear separation of concerns

Recommendation: Continue this pattern for any new engines.


Cluster 5: Extraction Functions

Pattern: extract* or Extract* functions
Files: Multiple files with extraction logic
Organization: Acceptable ⚠️

Key Files:

  1. frontmatter_extraction.go - Extracts frontmatter fields (20 extract methods)
  2. secret_extraction.go - Extracts secrets from values
  3. package_extraction.go - Extracts package names
  4. expression_extraction.go - Extracts expressions
  5. config_helpers.go - Extract config values

Analysis:
✅ Extraction functions grouped by domain (frontmatter, secrets, packages)
⚠️ Some overlap in config value extraction between files

Recommendation:

  • Current organization is acceptable
  • Consider whether config_helpers.go extraction functions could be more clearly separated

Cluster 6: Parsing Functions

Pattern: parse* or Parse* functions
Files: Spread across parser and workflow packages
Organization: Good ✅

Key Patterns:

  • pkg/parser/frontmatter.go - Parse frontmatter from markdown
  • pkg/parser/github_urls.go - Parse GitHub URLs
  • pkg/workflow/compiler.go - ParseWorkflowFile(), parseOnSection()
  • pkg/workflow/expressions.go - Expression parsing
  • pkg/workflow/time_delta.go - Time delta parsing
  • Various parse*Config() methods co-located with their features

Analysis:
✅ Parser package handles generic parsing
✅ Workflow-specific parsing is in workflow package
✅ Config parsing co-located with features

Recommendation: Continue this separation pattern.


Cluster 7: Formatting/Output Functions

Pattern: Format* or format* functions
Files: Primarily in pkg/console/
Organization: Good ✅ (with one duplicate)

Console Formatting Functions:

  • FormatError() - Format compiler errors
  • FormatErrorMessage() - Format error text
  • FormatErrorWithSuggestions() - Format errors with suggestions
  • FormatSuccessMessage() - Format success text
  • FormatInfoMessage() - Format info text
  • FormatWarningMessage() - Format warning text
  • FormatLocationMessage() - Format file locations
  • FormatCommandMessage() - Format commands
  • FormatProgressMessage() - Format progress text
  • FormatPromptMessage() - Format prompts
  • FormatCountMessage() - Format counts
  • FormatVerboseMessage() - Format verbose text
  • FormatListHeader() - Format list headers
  • FormatListItem() - Format list items
  • FormatFileSize() - Format file sizes ⚠️ (duplicate)
  • FormatNumber() - Format numbers
  • FormatDuration() - Format time durations (in pkg/timeutil/)

Analysis:
✅ Most formatting is centralized in console package
⚠️ One duplicate (formatFileSize in render.go)
✅ Consistent naming pattern

Recommendation:

  • Fix the formatFileSize duplicate (see Issue 1 above)
  • Continue centralizing formatting in console package

Cluster 8: Sanitization/Normalization Functions

Pattern: sanitize*, Sanitize*, normalize*
Files: Primarily in pkg/workflow/
Organization: Acceptable ⚠️

Key Functions:

  • strings.go: SanitizeName(), SanitizeWorkflowName()
  • workflow_name.go: SanitizeIdentifier()
  • resolve.go: normalizeWorkflowName()
  • safe_outputs.go: normalizeSafeOutputIdentifier()
  • domain_sanitization.go: computeAllowedDomainsForSanitization()
  • scripts.go: getSanitizeOutputScript()

Analysis:
⚠️ Functions spread across multiple files
✅ Each serves a specific domain purpose
⚠️ Could benefit from clearer organization

Recommendation:

  • Document the distinction between "sanitize" (character cleanup) vs "normalize" (format/extension handling)
  • Consider consolidating string manipulation utilities

Refactoring Recommendations

Priority 1: High Impact (Quick Wins)

1.1 Remove Duplicate formatFileSize() Function

Action: Consolidate duplicate file size formatting
Files affected:

  • pkg/console/render.go (remove local duplicate)
  • Keep: pkg/console/format.go (exported version)

Steps:

  1. Replace internal calls to formatFileSize() in render.go with FormatFileSize()
  2. Remove the duplicate formatFileSize() function from render.go
  3. Verify tests pass

Estimated effort: 30 minutes
Benefits:

  • Remove 24 lines of duplicate code
  • Single source of truth
  • Easier to maintain and test
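
Before deleting the unexported copy, it may be worth confirming that both functions really agree on a handful of boundary values. The helper below is a sketch of such a check; `firstMismatch` is a hypothetical name, and the two closures in `main` are placeholders for `FormatFileSize` and `formatFileSize`, which a test inside pkg/console could pass in directly.

```go
package main

import "fmt"

// firstMismatch returns the first input on which a and b disagree,
// and false if they agree on every input.
func firstMismatch(a, b func(int64) string, inputs []int64) (int64, bool) {
	for _, in := range inputs {
		if a(in) != b(in) {
			return in, true
		}
	}
	return 0, false
}

func main() {
	// Placeholders: in the repository these would be the exported and
	// unexported file size formatters from pkg/console.
	exported := func(n int64) string { return fmt.Sprintf("%d B", n) }
	local := func(n int64) string { return fmt.Sprintf("%d B", n) }

	// Boundary values around each unit change.
	inputs := []int64{0, 1, 1023, 1024, 1536, 1 << 20, 1 << 30}
	if in, bad := firstMismatch(exported, local, inputs); bad {
		fmt.Printf("outputs differ for %d\n", in)
		return
	}
	fmt.Println("outputs match on all sample inputs")
}
```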

Priority 2: Medium Impact (Documentation)

2.1 Document String Processing Patterns

Action: Add package-level documentation clarifying naming conventions

Clarifications needed:

  • Sanitize: Character cleanup for valid identifiers
  • Normalize: Format/extension handling and standardization
  • When to use each pattern

Files to document:

  • pkg/workflow/strings.go - Add package comment
  • pkg/workflow/workflow_name.go - Add function comments
  • pkg/workflow/resolve.go - Add function comments

Estimated effort: 1 hour
Benefits:

  • Clearer intent for future developers
  • Consistent usage patterns
  • Reduced confusion

Priority 3: Long-term Improvements (Optional)

3.1 Consider Utility Package Consolidation

Action: Evaluate creating a dedicated utilities package

Potential candidates for consolidation:

  • String processing (SanitizeName, normalization functions)
  • Value extraction (extractStringFromMap, parseIntValue)
  • Common validation helpers

Estimated effort: 4-6 hours
Benefits:

  • Centralized common utilities
  • Easier discovery
  • Reduced import complexity

Note: This is optional and should only be done if the team sees value in it. The current organization is acceptable.


Strengths of Current Organization

Excellent patterns identified:

  1. Validation files - Each validation concern has its own file
  2. Creation files - Each create operation has its own file
  3. Engine files - Each AI engine is properly separated
  4. Helper files - Related helpers are consolidated
  5. Consistent naming - Functions follow clear naming patterns
  6. Domain separation - Parser, workflow, console packages well-defined

Analysis Metadata

  • Total Go Files Analyzed: 223 (non-test files)
  • Total Test Files: 200+ (not included in analysis)
  • Packages Analyzed: 7 packages (see Package Distribution)
  • Function Clusters Identified: 8 major clusters
  • Exact Duplicates Found: 1 (formatFileSize)
  • Similar Functions Found: 4 (sanitization/normalization)
  • Well-Organized Patterns: 4 (validation, creation, engines, helpers)
  • Detection Method: Serena semantic code analysis + grep pattern analysis
  • Analysis Date: 2025-11-17
  • Repository: githubnext/gh-aw
  • Codebase Language: Go

Implementation Checklist

Immediate Actions

  • Review findings and prioritize refactoring tasks
  • Fix formatFileSize() duplicate (Priority 1)
  • Add documentation for string processing patterns (Priority 2)

Future Considerations

  • Monitor for new duplicates in code reviews
  • Consider utility package consolidation (Priority 3)
  • Maintain validation and creation file patterns
  • Continue consistent naming conventions
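
One lightweight way to monitor for new exact duplicates is to hash function bodies with the standard go/ast tooling. This is not the method used for this report (which lists Serena semantic analysis plus grep); it is a simpler stand-in that only catches bodies whose printed form is identical, and `duplicateBodies` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"sort"
)

// duplicateBodies parses the given sources (file name -> Go source) and
// returns groups of "file:func" labels whose bodies print identically.
func duplicateBodies(sources map[string]string) (map[[32]byte][]string, error) {
	fset := token.NewFileSet()
	groups := make(map[[32]byte][]string)
	for name, src := range sources {
		file, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return nil, err
		}
		for _, decl := range file.Decls {
			fn, ok := decl.(*ast.FuncDecl)
			if !ok || fn.Body == nil {
				continue
			}
			// Print only the body so the function name does not affect the hash.
			var buf bytes.Buffer
			if err := printer.Fprint(&buf, fset, fn.Body); err != nil {
				return nil, err
			}
			sum := sha256.Sum256(buf.Bytes())
			groups[sum] = append(groups[sum], name+":"+fn.Name.Name)
		}
	}
	// Keep only hashes shared by two or more functions.
	for sum, fns := range groups {
		if len(fns) < 2 {
			delete(groups, sum)
		}
	}
	return groups, nil
}

func main() {
	body := "{\n\tif size == 0 {\n\t\treturn \"0 B\"\n\t}\n\treturn \"\"\n}\n"
	sources := map[string]string{
		"format.go": "package console\nfunc FormatFileSize(size int64) string " + body,
		"render.go": "package console\nfunc formatFileSize(size int64) string " + body,
	}
	groups, err := duplicateBodies(sources)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, fns := range groups {
		sort.Strings(fns)
		fmt.Println("identical bodies:", fns)
	}
}
```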

Conclusion

The codebase demonstrates strong organizational patterns with well-structured validation, creation, and engine files. The primary issue identified is a single exact duplicate function (formatFileSize) which can be easily resolved. The semantic function clustering revealed that the team follows Go best practices with feature-based file organization.

Overall Assessment: ✅ Well-organized codebase with minimal refactoring needed.


Summary

This analysis identified:

  • Strong organization: Validation, creation, and engine files follow excellent patterns
  • 🔴 1 exact duplicate: formatFileSize() should be consolidated
  • ⚠️ Minor improvements: Documentation for string processing patterns
  • 📊 8 function clusters: All appropriately organized

Recommendation: Focus on the Priority 1 duplicate removal, then document patterns as time permits.

AI generated by Semantic Function Refactoring
