🔧 Semantic Function Clustering Analysis
Automated analysis of repository: githubnext/gh-aw
Executive Summary
I analyzed 223 non-test Go files across 8 packages in the repository, identifying function clusters, outliers, and code duplication opportunities. The codebase is generally well-organized with dedicated validation and creation function files, but several opportunities exist for improved organization and reduced duplication.
Key Findings:
- ✅ Well-organized: Validation logic (16 files), creation functions (7 files)
- ⚠️ Found 1 exact duplicate: formatFileSize() function duplicated across 2 files
- ⚠️ Found 3 similar functions: String sanitization/normalization spread across 3 files
- ✅ Good patterns: Helper functions consolidated in 6 dedicated helper files
- 📊 Function clusters identified: 8 major semantic clusters
Full Analysis Details
Function Inventory
Package Distribution
| Package | Files | Percentage | Primary Purpose |
|---|---|---|---|
| pkg/workflow | 132 | 59% | Workflow compilation, validation, engine support |
| pkg/cli | 72 | 32% | CLI commands and user interface |
| pkg/parser | 12 | 5% | Parsing frontmatter, YAML, GitHub URLs |
| pkg/console | 4 | 2% | Console output formatting |
| pkg/logger | 1 | <1% | Logging utilities |
| pkg/constants | 1 | <1% | Global constants |
| pkg/timeutil | 1 | <1% | Time formatting utilities |
| Total | 223 | 100% | |
File Organization Patterns
The codebase follows Go best practices with files organized by feature:
Well-Organized Patterns:
- ✅ Validation files: 16 dedicated *_validation.go files
- ✅ Creation files: Each feature has its own create_*.go file
- ✅ Helper files: 6 dedicated *_helper*.go files
- ✅ Engine files: Separate files per engine type (claude, codex, copilot, custom)
Identified Issues
1. 🔴 Exact Duplicate Functions
Issue: formatFileSize() - 100% Duplicate
Description: Identical file size formatting function exists in two different files.
Occurrences:
- File: pkg/console/format.go:5-28
  - Function: FormatFileSize(size int64) string (exported)
  - Lines: 24 lines
- File: pkg/console/render.go:516-539
  - Function: formatFileSize(size int64) string (unexported)
  - Lines: 24 lines
Similarity: 100% - The function bodies are identical line-by-line.
Code Comparison:
```go
// Both implementations are identical:
func formatFileSize(size int64) string {
	if size == 0 {
		return "0 B"
	}
	const unit = 1024
	if size < unit {
		return fmt.Sprintf("%d B", size)
	}
	// ... exact same logic in both files
}
```

Recommendation:
- Remove formatFileSize() from pkg/console/render.go
- Use the exported FormatFileSize() from pkg/console/format.go everywhere
- Update all call sites in render.go to use the exported version
Estimated Impact:
- Reduced code duplication: -24 lines
- Single source of truth for file size formatting
- Easier maintenance and testing
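As a rough illustration of the single consolidated helper, the sketch below completes the excerpt shown earlier. The first two branches come from the report; the unit-scaling loop and the output format (for example "1.5 KB") are assumptions based on the conventional 1024-based approach, not the repository's actual code.

```go
package main

import "fmt"

// FormatFileSize renders a byte count as a human-readable string.
// The zero and sub-1024 branches mirror the excerpt in this report; the loop
// below is an assumed, conventional implementation of the elided part.
func FormatFileSize(size int64) string {
	if size == 0 {
		return "0 B"
	}
	const unit = 1024
	if size < unit {
		return fmt.Sprintf("%d B", size)
	}
	// Scale down until the remaining value fits below one unit step.
	div, exp := int64(unit), 0
	for n := size / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(size)/float64(div), "KMGTPE"[exp])
}

func main() {
	for _, size := range []int64{0, 512, 1536, 5 * 1024 * 1024} {
		fmt.Println(FormatFileSize(size))
	}
}
```

Because both copies live in pkg/console, call sites in render.go would only need the identifier renamed from formatFileSize to FormatFileSize; no new import is required.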
2. ⚠️ Similar Functions - String Sanitization/Normalization
Issue: Name sanitization and normalization spread across 3 files
Description: Multiple functions handle string sanitization and normalization with overlapping concerns.
Occurrences:
- File: pkg/workflow/strings.go:156-162
  - Function: SanitizeWorkflowName(name string) string
  - Purpose: Sanitizes workflow names, preserves ".", "_", and "-"
  - Implementation: Wrapper around SanitizeName()
- File: pkg/workflow/workflow_name.go:11-17
  - Function: SanitizeIdentifier(name string) string
  - Purpose: Sanitizes identifiers, removes all special chars
  - Implementation: Wrapper around SanitizeName()
- File: pkg/workflow/resolve.go:111-130
  - Function: normalizeWorkflowName(name string) string
  - Purpose: Removes file extensions (.lock.yml, .md)
  - Implementation: Independent logic
- File: pkg/workflow/safe_outputs.go:515-517
  - Function: normalizeSafeOutputIdentifier(name string) string
  - Purpose: Brief normalization (not examined in detail)
Analysis:
- SanitizeWorkflowName() and SanitizeIdentifier() both use the same underlying SanitizeName() function with different options ✅ (Good pattern)
- normalizeWorkflowName() handles a different concern (extension removal) ✅ (Acceptable separation)
- Functions serve distinct purposes but naming could be clearer
Recommendation:
- No immediate action required - Functions serve different purposes
- Future consideration: Document the distinction between "sanitize" (character cleanup) vs "normalize" (extension/format handling)
- Consider moving all name-related utilities to a single file for easier discoverability
Estimated Impact:
- Low priority - mainly a documentation/discoverability improvement
- No code duplication detected
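The sanitize/normalize split described above can be sketched as follows. This is an illustration only: the SanitizeOptions type, its Preserve field, and the choice to drop (rather than replace) disallowed characters are hypothetical, since the report does not show the real SanitizeName() signature or body.

```go
package main

import (
	"fmt"
	"strings"
)

// SanitizeOptions is hypothetical; the report does not show the real
// SanitizeName() signature or its option names.
type SanitizeOptions struct {
	Preserve string // characters kept in addition to a-z and 0-9
}

// SanitizeName lowercases name and drops every character that is not a
// lowercase letter, a digit, or listed in opts.Preserve.
func SanitizeName(name string, opts SanitizeOptions) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		isAlnum := (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')
		if isAlnum || strings.ContainsRune(opts.Preserve, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// SanitizeWorkflowName is a thin wrapper that keeps ".", "_", and "-".
func SanitizeWorkflowName(name string) string {
	return SanitizeName(name, SanitizeOptions{Preserve: "._-"})
}

// SanitizeIdentifier is a thin wrapper that removes all special characters.
func SanitizeIdentifier(name string) string {
	return SanitizeName(name, SanitizeOptions{})
}

// normalizeWorkflowName handles a different concern: it strips the
// ".lock.yml" and ".md" extensions and leaves the characters untouched.
func normalizeWorkflowName(name string) string {
	name = strings.TrimSuffix(name, ".lock.yml")
	return strings.TrimSuffix(name, ".md")
}

func main() {
	fmt.Println(SanitizeWorkflowName("Weekly Report_v1.2"))
	fmt.Println(SanitizeIdentifier("Weekly Report_v1.2"))
	fmt.Println(normalizeWorkflowName("weekly-report.lock.yml"))
}
```

Keeping the two sanitizers as wrappers over one core function means character rules change in one place, while extension handling stays separate because it answers a different question.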
3. 📋 Parsing Configuration Functions - Good Pattern
Issue: Similar parse*Config() functions - Well-Organized ✅
Description: Multiple files have parse*Config() methods on the Compiler type, but they follow a consistent pattern and each is in its appropriate file.
Occurrences:
- pkg/workflow/create_issue.go: parseIssuesConfig()
- pkg/workflow/create_discussion.go: parseDiscussionsConfig()
- pkg/workflow/add_comment.go: parseCommentsConfig()
- pkg/workflow/update_issue.go: parseUpdateIssuesConfig()
- pkg/workflow/create_pull_request.go: (implied parsePullRequestsConfig())
Analysis:
✅ Each parse function is co-located with its related functionality
✅ Functions share a common naming pattern
✅ Each handles a specific output type
✅ Helper functions are extracted to config_helpers.go
Recommendation:
- No action required - This is a good organizational pattern
- The pattern demonstrates proper separation of concerns
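A minimal sketch of this pattern is shown below, assuming each parse*Config() method reads one key from a generic configuration map and delegates field extraction to shared helpers. The struct fields, the map keys ("create-issue", "add-comment", "title-prefix", "target", "max"), and the helper signatures are all hypothetical; only the function names appear in this report.

```go
package main

import "fmt"

// Compiler stands in for the real Compiler type, whose fields are not shown.
type Compiler struct{}

// IssuesConfig and CommentsConfig are hypothetical result types.
type IssuesConfig struct {
	TitlePrefix string
	Max         int
}

type CommentsConfig struct {
	Target string
	Max    int
}

// Shared helpers, as they might appear in config_helpers.go (assumed signatures).
func extractStringFromMap(m map[string]any, key string) string {
	if s, ok := m[key].(string); ok {
		return s
	}
	return ""
}

func parseIntValue(v any, fallback int) int {
	switch n := v.(type) {
	case int:
		return n
	case int64:
		return int(n)
	case float64:
		return int(n)
	}
	return fallback
}

// parseIssuesConfig would live next to the issue-creation code.
func (c *Compiler) parseIssuesConfig(outputs map[string]any) *IssuesConfig {
	raw, ok := outputs["create-issue"].(map[string]any)
	if !ok {
		return nil
	}
	return &IssuesConfig{
		TitlePrefix: extractStringFromMap(raw, "title-prefix"),
		Max:         parseIntValue(raw["max"], 1),
	}
}

// parseCommentsConfig would live next to the comment code.
func (c *Compiler) parseCommentsConfig(outputs map[string]any) *CommentsConfig {
	raw, ok := outputs["add-comment"].(map[string]any)
	if !ok {
		return nil
	}
	return &CommentsConfig{
		Target: extractStringFromMap(raw, "target"),
		Max:    parseIntValue(raw["max"], 1),
	}
}

func main() {
	c := &Compiler{}
	outputs := map[string]any{
		"create-issue": map[string]any{"title-prefix": "[bot] ", "max": 3},
	}
	fmt.Printf("%+v\n", c.parseIssuesConfig(outputs))
	fmt.Println(c.parseCommentsConfig(outputs) == nil)
}
```

The per-feature methods stay short because the type-switching and defaulting logic sits in the shared helpers, which is what keeps this cluster from turning into duplication.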
Detailed Function Clusters
Cluster 1: Validation Functions ✅
Pattern: *Validate* functions
Files: 16 dedicated validation files
Organization: Excellent ✅
Validation Files:
- agent_validation.go - Agent configuration validation
- bundler_validation.go - Script bundler validation
- docker_validation.go - Docker configuration validation
- engine_validation.go - Engine configuration validation
- expression_validation.go - Expression syntax validation
- github_toolset_validation_error.go - GitHub toolset errors
- mcp_config_validation.go - MCP configuration validation
- npm_validation.go - NPM package validation
- pip_validation.go - Python package validation
- repository_features_validation.go - Repository feature checks
- runtime_validation.go - Runtime environment validation
- schema_validation.go - YAML schema validation
- step_order_validation.go - Step ordering validation
- strict_mode_validation.go - Strict mode checks
- template_validation.go - Template validation
- validation.go - General validation utilities
Analysis:
✅ Validation logic is properly separated by domain
✅ Each file has a clear, single purpose
✅ Naming convention is consistent
Recommendation: Continue this pattern for new validation logic.
Cluster 2: Creation Functions ✅
Pattern: create_* functions
Files: 7 dedicated creation files
Organization: Excellent ✅
Creation Files:
- create_agent_task.go - Agent task creation
- create_code_scanning_alert.go - Security alert creation
- create_discussion.go - Discussion creation
- create_issue.go - Issue creation
- create_pr_review_comment.go - PR review comment creation
- create_pull_request.go - Pull request creation
- add_comment.go - Comment addition
- add_labels.go - Label addition
Analysis:
✅ Each creation function has its own file
✅ Files named after their primary purpose
✅ Consistent structure across files
Recommendation: Continue this pattern for new creation/modification operations.
Cluster 3: Helper Functions ✅
Pattern: *_helper* or *_helpers files
Files: 6 dedicated helper files
Organization: Good ✅
Helper Files:
- config_helpers.go - Configuration parsing helpers
- engine_helpers.go - Engine setup and rendering helpers
- gh_helper.go - GitHub CLI interaction helpers
- map_helpers.go - Map/dictionary utilities
- prompt_step_helper.go - Prompt step generation helpers
- safe_outputs_env_helpers.go - Safe outputs environment helpers
Analysis:
✅ Helper functions are consolidated into topic-specific files
✅ Each helper file serves a clear domain
✅ Better than scattering helpers across many files
Recommendation: Continue consolidating related helper functions.
Cluster 4: Engine Files ✅
Pattern: *_engine.go files
Files: 5 engine files
Organization: Excellent ✅
Engine Files:
- agentic_engine.go - Base agentic engine
- claude_engine.go - Claude-specific engine
- codex_engine.go - Codex-specific engine
- copilot_engine.go - Copilot-specific engine
- custom_engine.go - Custom engine support
- engine.go - Engine interface and utilities
Analysis:
✅ Each AI engine has its own dedicated file
✅ Common functionality extracted to engine.go
✅ Clear separation of concerns
Recommendation: Continue this pattern for any new engines.
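The one-file-per-engine layout might look roughly like the sketch below: a small shared interface plus a lookup, with each implementation in its own file. Everything here is hypothetical (the Engine methods, the Registry type, and the engine structs); the report names the files but does not show the real interface in engine.go.

```go
package main

import (
	"fmt"
	"sort"
)

// Engine is a hypothetical minimal interface; the real one is not shown here.
type Engine interface {
	ID() string
	DisplayName() string
}

// In the layout described above, this type would sit in claude_engine.go.
type claudeEngine struct{}

func (claudeEngine) ID() string          { return "claude" }
func (claudeEngine) DisplayName() string { return "Claude" }

// ...and this one in codex_engine.go.
type codexEngine struct{}

func (codexEngine) ID() string          { return "codex" }
func (codexEngine) DisplayName() string { return "Codex" }

// Registry is the kind of shared utility that would belong in engine.go.
type Registry struct {
	engines map[string]Engine
}

// NewRegistry indexes the given engines by ID.
func NewRegistry(engines ...Engine) *Registry {
	r := &Registry{engines: make(map[string]Engine, len(engines))}
	for _, e := range engines {
		r.engines[e.ID()] = e
	}
	return r
}

// Lookup returns the engine registered under id, or an error if none exists.
func (r *Registry) Lookup(id string) (Engine, error) {
	e, ok := r.engines[id]
	if !ok {
		return nil, fmt.Errorf("unknown engine %q", id)
	}
	return e, nil
}

// IDs lists the registered engine IDs in sorted order.
func (r *Registry) IDs() []string {
	ids := make([]string, 0, len(r.engines))
	for id := range r.engines {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	r := NewRegistry(claudeEngine{}, codexEngine{})
	fmt.Println(r.IDs())
	if e, err := r.Lookup("codex"); err == nil {
		fmt.Println(e.DisplayName())
	}
}
```

With this shape, adding an engine means adding one file and one registration call, which is the property the recommendation above aims to preserve.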
Cluster 5: Extraction Functions
Pattern: extract* or Extract* functions
Files: Multiple files with extraction logic
Organization: Acceptable
Key Files:
- frontmatter_extraction.go - Extracts frontmatter fields (20 extract methods)
- secret_extraction.go - Extracts secrets from values
- package_extraction.go - Extracts package names
- expression_extraction.go - Extracts expressions
- config_helpers.go - Extract config values
Analysis:
✅ Extraction functions grouped by domain (frontmatter, secrets, packages)
Recommendation:
- Current organization is acceptable
- Consider whether config_helpers.go extraction functions could be more clearly separated
Cluster 6: Parsing Functions
Pattern: parse* or Parse* functions
Files: Spread across parser and workflow packages
Organization: Good ✅
Key Patterns:
- pkg/parser/frontmatter.go - Parse frontmatter from markdown
- pkg/parser/github_urls.go - Parse GitHub URLs
- pkg/workflow/compiler.go - ParseWorkflowFile(), parseOnSection()
- pkg/workflow/expressions.go - Expression parsing
- pkg/workflow/time_delta.go - Time delta parsing
- Various parse*Config() methods co-located with their features
Analysis:
✅ Parser package handles generic parsing
✅ Workflow-specific parsing is in workflow package
✅ Config parsing co-located with features
Recommendation: Continue this separation pattern.
Cluster 7: Formatting/Output Functions
Pattern: Format* or format* functions
Files: Primarily in pkg/console/
Organization: Good ✅ (with one duplicate)
Console Formatting Functions:
- FormatError() - Format compiler errors
- FormatErrorMessage() - Format error text
- FormatErrorWithSuggestions() - Format errors with suggestions
- FormatSuccessMessage() - Format success text
- FormatInfoMessage() - Format info text
- FormatWarningMessage() - Format warning text
- FormatLocationMessage() - Format file locations
- FormatCommandMessage() - Format commands
- FormatProgressMessage() - Format progress text
- FormatPromptMessage() - Format prompts
- FormatCountMessage() - Format counts
- FormatVerboseMessage() - Format verbose text
- FormatListHeader() - Format list headers
- FormatListItem() - Format list items
- FormatFileSize() - Format file sizes ⚠️ (duplicate)
- FormatNumber() - Format numbers
- FormatDuration() - Format time durations (in pkg/timeutil/)
Analysis:
✅ Most formatting is centralized in console package
⚠️ One duplicate remains (formatFileSize in render.go)
✅ Consistent naming pattern
Recommendation:
- Fix the formatFileSize duplicate (see Issue 1, Exact Duplicate Functions)
- Continue centralizing formatting in console package
Cluster 8: Sanitization/Normalization Functions
Pattern: sanitize*, Sanitize*, normalize*
Files: Primarily in pkg/workflow/
Organization: Acceptable
Key Functions:
- strings.go: SanitizeName(), SanitizeWorkflowName()
- workflow_name.go: SanitizeIdentifier()
- resolve.go: normalizeWorkflowName()
- safe_outputs.go: normalizeSafeOutputIdentifier()
- domain_sanitization.go: computeAllowedDomainsForSanitization()
- scripts.go: getSanitizeOutputScript()
Analysis:
✅ Each serves a specific domain purpose
Recommendation:
- Document the distinction between "sanitize" (character cleanup) vs "normalize" (format/extension handling)
- Consider consolidating string manipulation utilities
Refactoring Recommendations
Priority 1: High Impact (Quick Wins)
1.1 Remove Duplicate formatFileSize() Function
Action: Consolidate duplicate file size formatting
Files affected:
- pkg/console/render.go (remove local duplicate)
- Keep: pkg/console/format.go (exported version)
Steps:
- Replace internal calls to formatFileSize() in render.go with FormatFileSize()
- Remove the duplicate formatFileSize() function from render.go
- Verify tests pass
Estimated effort: 30 minutes
Benefits:
- Remove 24 lines of duplicate code
- Single source of truth
- Easier to maintain and test
Priority 2: Medium Impact (Documentation)
2.1 Document String Processing Patterns
Action: Add package-level documentation clarifying naming conventions
Clarifications needed:
- Sanitize: Character cleanup for valid identifiers
- Normalize: Format/extension handling and standardization
- When to use each pattern
Files to document:
- pkg/workflow/strings.go - Add package comment
- pkg/workflow/workflow_name.go - Add function comments
- pkg/workflow/resolve.go - Add function comments
Estimated effort: 1 hour
Benefits:
- Clearer intent for future developers
- Consistent usage patterns
- Reduced confusion
Priority 3: Long-term Improvements (Optional)
3.1 Consider Utility Package Consolidation
Action: Evaluate creating a dedicated utilities package
Potential candidates for consolidation:
- String processing (SanitizeName, normalization functions)
- Value extraction (extractStringFromMap, parseIntValue)
- Common validation helpers
Estimated effort: 4-6 hours
Benefits:
- Centralized common utilities
- Easier discovery
- Reduced import complexity
Note: This is optional and should only be done if the team sees value in it. The current organization is acceptable.
Strengths of Current Organization
✅ Excellent patterns identified:
- Validation files - Each validation concern has its own file
- Creation files - Each create operation has its own file
- Engine files - Each AI engine is properly separated
- Helper files - Related helpers are consolidated
- Consistent naming - Functions follow clear naming patterns
- Domain separation - Parser, workflow, console packages well-defined
Analysis Metadata
- Total Go Files Analyzed: 223 (non-test files)
- Total Test Files: ~200+ (not included in analysis)
- Packages Analyzed: 8 packages
- Function Clusters Identified: 8 major clusters
- Exact Duplicates Found: 1 (formatFileSize)
- Similar Functions Found: 4 (sanitization/normalization)
- Well-Organized Patterns: 4 (validation, creation, engines, helpers)
- Detection Method: Serena semantic code analysis + grep pattern analysis
- Analysis Date: 2025-11-17
- Repository: githubnext/gh-aw
- Codebase Language: Go
Implementation Checklist
Immediate Actions
- Review findings and prioritize refactoring tasks
- Fix formatFileSize() duplicate (Priority 1)
- Add documentation for string processing patterns (Priority 2)
Future Considerations
- Monitor for new duplicates in code reviews
- Consider utility package consolidation (Priority 3)
- Maintain validation and creation file patterns
- Continue consistent naming conventions
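Monitoring for new exact duplicates could be partly automated. The sketch below is one possible approach using only the Go standard library, not the Serena-based method used for this report: it parses source files, takes each function body's text with whitespace collapsed, and groups functions whose bodies hash to the same value. The function name findDuplicateBodies and the in-memory sources map are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// findDuplicateBodies takes file name -> Go source and returns groups of
// "file:func" labels whose bodies are identical once whitespace is collapsed.
func findDuplicateBodies(sources map[string]string) ([][]string, error) {
	fset := token.NewFileSet()
	groups := map[[sha256.Size]byte][]string{}
	for name, src := range sources {
		file, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return nil, err
		}
		for _, decl := range file.Decls {
			fn, ok := decl.(*ast.FuncDecl)
			if !ok || fn.Body == nil {
				continue
			}
			// Slice the body text (braces included) out of the source.
			start := fset.Position(fn.Body.Pos()).Offset
			end := fset.Position(fn.Body.End()).Offset
			body := strings.Join(strings.Fields(src[start:end]), " ")
			sum := sha256.Sum256([]byte(body))
			groups[sum] = append(groups[sum], name+":"+fn.Name.Name)
		}
	}
	var dups [][]string
	for _, fns := range groups {
		if len(fns) > 1 {
			sort.Strings(fns)
			dups = append(dups, fns)
		}
	}
	// Sort groups so the result does not depend on map iteration order.
	sort.Slice(dups, func(i, j int) bool { return dups[i][0] < dups[j][0] })
	return dups, nil
}

func main() {
	sources := map[string]string{
		"format.go": "package console\n\nfunc FormatFileSize(size int64) string {\n\tif size == 0 {\n\t\treturn \"0 B\"\n\t}\n\treturn \"\"\n}\n",
		"render.go": "package console\n\nfunc formatFileSize(size int64) string {\n\tif size == 0 {\n\t\treturn \"0 B\"\n\t}\n\treturn \"\"\n}\n",
	}
	dups, err := findDuplicateBodies(sources)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, group := range dups {
		fmt.Println("duplicate bodies:", strings.Join(group, ", "))
	}
}
```

This only catches exact duplicates; renamed variables or reordered statements would slip through, and trivial bodies such as a bare return will match each other, so a minimum body length filter is probably worth adding before using it in review.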
Conclusion
The codebase demonstrates strong organizational patterns with well-structured validation, creation, and engine files. The primary issue identified is a single exact duplicate function (formatFileSize) which can be easily resolved. The semantic function clustering revealed that the team follows Go best practices with feature-based file organization.
Overall Assessment: ✅ Well-organized codebase with minimal refactoring needed.
Summary
This analysis identified:
- ✅ Strong organization: Validation, creation, and engine files follow excellent patterns
- 🔴 1 exact duplicate: formatFileSize() should be consolidated
- ⚠️ Minor improvements: Documentation for string processing patterns
- 📊 8 function clusters: All appropriately organized
Recommendation: Focus on the Priority 1 duplicate removal, then document patterns as time permits.
AI generated by Semantic Function Refactoring