Description
🔧 Semantic Function Clustering Analysis
Repository analyzed: githubnext/gh-aw
Analysis date: 2025-11-12
Total non-test Go files analyzed: 206
Total functions cataloged: 1,269
Total lines of code: ~186,000
Executive Summary
This analysis examined the codebase structure to identify opportunities for improved code organization through semantic function clustering. The repository follows strong naming conventions and feature-based organization patterns, particularly in the pkg/cli and pkg/workflow packages.
Key findings:
- ✅ Strong file organization with clear naming patterns (*_command.go, mcp_*, create_*, *_validation.go)
- ⚠️ 5 high-priority outliers identified (functions in wrong files)
- ⚠️ 3 significant duplicate patterns detected (similar logic across files)
- ⚠️ Validation logic concentration issue in pkg/workflow/validation.go
- ✅ Minimal problematic duplication overall (most duplication is acceptable engine-specific customization)
Detailed Analysis Report
Package Organization Overview
pkg/cli (69 files)
Organization patterns:
- Command pattern (*_command.go): 10 files with New*Command() entry points
- Feature prefixes (mcp_*, logs_*): 16 MCP files, 5 logs files
- Domain files: GitHub (github.go, git.go, repo.go), Actions (actions.go, workflows.go)
- Core infrastructure: commands.go, init.go, resolver.go
Strengths:
- Clear command organization with consistent structure
- Feature clustering (MCP files together, logs files together)
- Well-defined entry points for CLI commands
Issues identified:
- logs.go is overloaded (35+ symbols, mixed concerns)
- Template strings stored in commands.go instead of dedicated file
- GitHub API operations scattered across multiple files
pkg/workflow (123 files)
Organization patterns:
- Operation-based (create_*.go): 6 creation files with consistent structure
- Validation suite (*_validation.go): 13 validation files
- Engine architecture (*_engine.go): 10 engine-related files
- Compiler core (compiler*.go): 3 compiler files
- Package managers: 6 package manager files with paired validation
- Prompt generation (*_prompt.go): 6 specialized prompt files
- MCP configuration: 3 MCP-related files
Strengths:
- Excellent naming consistency
- Clear separation of concerns (creation, validation, compilation)
- Engine infrastructure properly separated from implementations
- Package managers follow consistent pairing pattern
Issues identified:
- validation.go is a catch-all with 33+ functions (should be split)
- safe_outputs_env_test_helpers.go is misnamed (not a test file)
- config.go is an empty placeholder (4 lines)
- frontmatter_extraction.go is very large (24 methods, could be split)
Other packages (5 packages, 13 files)
- pkg/console: 4 files (console, render, format, spinner) - well organized
- pkg/constants: 1 file with all constants - appropriate
- pkg/logger: 1 file - simple logger implementation
- pkg/parser: 6 files (frontmatter, github, mcp, schema, yaml_error, json_path_locator) - well structured
- pkg/timeutil: 1 file - utility functions
Function Clustering Results
Cluster 1: Creation Functions (CRUD Operations)
Pattern: create_* functions for GitHub operations
Files: 6 files in pkg/workflow/
create_issue.go → CreateIssueConfig, parseCreateIssueConfig, buildCreateIssueJob
create_pull_request.go → CreatePullRequestConfig, parse*, build*
create_discussion.go → CreateDiscussionConfig, parse*, build*
create_agent_task.go → CreateAgentTaskConfig, parse*, build*
create_pr_review_comment.go → CreatePRReviewCommentConfig, parse*, build*
create_code_scanning_alert.go → CreateCodeScanningAlertConfig, parse*, build*
Assessment: ✅ Well-organized - Each operation has its own file with consistent structure
Cluster 2: Validation Functions
Pattern: validate* and check* functions
Distribution: Across 13+ files
Primary locations:
- pkg/workflow/validation.go (33+ functions) ⚠️ OVERLOADED
- Specialized validators (properly split):
  - bundler_validation.go (1 function)
  - docker_validation.go (1 function)
  - npm_validation.go (1 function)
  - pip_validation.go (4 functions)
  - template_validation.go (1 function)
  - expression_validation.go (2 functions)
  - step_order_validation.go (full tracker type)
  - strict_mode_validation.go (4 functions)
  - mcp_config_validation.go (6 functions)
  - engine_validation.go (2 functions)
  - permissions_validator.go (13 functions)
Also in CLI:
- pkg/cli/mcp_validation.go (2 functions)
- pkg/cli/run_command.go (validateRemoteWorkflow)
- pkg/cli/add_command.go (workflow validation)
Issue: validation.go contains unrelated validations:
- Expression sizes
- Container images
- Runtime packages
- GitHub Actions schema
- Secret references
- Repository features (6 helper functions)
- HTTP transport support
- Max turns support
- Web search support
- Agent file validation
Cluster 3: Engine System
Pattern: *Engine implementations and infrastructure
Files: 10 files in pkg/workflow/
Core infrastructure:
├── engine.go (base types, registry)
├── agentic_engine.go (BaseEngine, interfaces)
├── engine_helpers.go (15 shared utilities)
├── engine_validation.go (validation)
├── engine_output.go (output collection)
├── engine_firewall_support.go (firewall)
└── engine_network_hooks.go (network hooks)
Implementations:
├── claude_engine.go + claude_mcp.go + claude_settings.go + claude_tools.go + claude_logs.go
├── copilot_engine.go
├── codex_engine.go
└── custom_engine.go
Assessment: ✅ Well-organized - Clear separation between infrastructure and implementations
Cluster 4: Package Extraction Functions
Pattern: extract*FromCommands functions
Significant similarity detected
npm.go:
func extractNpxFromCommands(commands string) []string {
	var packages []string
	lines := strings.Split(commands, "\n")
	for _, line := range lines {
		words := strings.Fields(line)
		for i, word := range words {
			if word == "npx" && i+1 < len(words) {
				// Skip flags and find first package
				for j := i + 1; j < len(words); j++ {
					pkg := words[j]
					pkg = strings.TrimRight(pkg, "&|;")
					if !strings.HasPrefix(pkg, "-") {
						packages = append(packages, pkg)
						break
					}
				}
			}
		}
	}
	return packages
}
pip.go:
func extractPipFromCommands(commands string) []string {
	var packages []string
	lines := strings.Split(commands, "\n")
	for _, line := range lines {
		words := strings.Fields(line)
		for i, word := range words {
			if (word == "pip" || word == "pip3") && i+1 < len(words) {
				for j := i + 1; j < len(words); j++ {
					if words[j] == "install" {
						// Same flag-skipping logic...
					}
				}
			}
		}
	}
	return packages
}
Similarity: ~75% - Same structure, flag-skipping logic, and string processing
Also similar: extractUvFromCommands, extractGoFromCommands
Cluster 5: Parsing Functions
Pattern: parse* functions
Locations: Across multiple packages
In pkg/parser:
- ParseImportDirective (frontmatter.go)
- ParseMCPConfig (mcp.go)
- parseJSONPath (json_path_locator.go)
In pkg/workflow:
- parseTimeDelta family (time_delta.go): 5 related functions
- parse*Tool functions (tools_types.go): 12 tool-specific parsers
- parse*Package (dependabot.go): 3 package parsers
In pkg/cli:
- parseRepoSpec, parseGitHubURL, parseWorkflowSpec, parseLocalWorkflowSpec, parseSourceSpec (spec.go)
- parsePRURL (pr_command.go)
- parseIssueSpec (trial_command.go)
- parseVersion (semver.go)
- Multiple log parsing functions (logs_parsing.go, firewall_log.go, access_log.go)
Assessment: Generally well-organized, each parser handles specific domain
Cluster 6: Extraction Functions
Pattern: extract* functions
High concentration - 50+ extraction functions
Common patterns:
- From frontmatter: extractToolsFromFrontmatter, extractMCPServersFromFrontmatter, extractRuntimesFromFrontmatter
- From content: extractToolsFromContent, extractStepsFromContent, extractEngineFromContent
- From logs: extractLogMetrics, extractMissingToolsFromRun, extractMCPFailuresFromRun
- From strings: extractSecretName, extractRepoSlug, extractDomainFromURL
- From configs: extractCustomArgs, extractSecretsFromValue, extractSecretsFromHeaders
Assessment: Appropriate distribution, each extraction serves specific purpose
Cluster 7: Rendering/Generation Functions
Pattern: render*, generate*, build* functions
Locations: Primarily in pkg/workflow compiler and MCP config
In pkg/workflow/mcp-config.go:
- renderPlaywrightMCPConfig (+ variants)
- renderSafeOutputsMCPConfig (+ variants)
- renderAgenticWorkflowsMCPConfig (+ variants)
- renderCustomMCPConfigWrapper
- renderBuiltinMCPServerBlock
In pkg/workflow/compiler_yaml.go:
- Multiple YAML generation methods
In pkg/workflow (various):
- generateCacheSteps, generateCacheMemorySteps
- generateSetupStep, generateCleanupStep
- buildArtifactDownloadSteps, buildCopilotParticipantSteps
- buildConditionTree, buildOr, buildAnd
In pkg/console:
- renderValue, renderStruct, renderSlice, renderMap
- renderContext, renderTableRow
Assessment: Well-organized by domain (MCP config, compiler YAML, console output)
Identified Issues
1. Outlier Functions (High Priority)
Issue #1: Setup Functions in Wrong File
File: pkg/cli/add_command.go
Problem: Contains multiple setup functions unrelated to adding workflows
Outlier functions:
func ensureCopilotInstructions(...)        // Line 819
func ensureAgenticWorkflowPrompt(...)      // Line 869
func ensureAgenticWorkflowAgent(...)       // Line 897
func ensureSharedAgenticWorkflowAgent(...) // Line 902
func ensureSetupAgenticWorkflowsAgent(...) // Line 907
Recommendation: Move to copilot_setup.go or new agent_setup.go file
Impact: Improved file cohesion, clearer separation of concerns
Issue #2: Git/PR Operations in Command File
File: pkg/cli/add_command.go
Problem: Contains Git and PR operations that belong elsewhere
Outlier functions:
func checkCleanWorkingDirectory(...) // Line 912 → Should be in git.go
func createPR(...)                   // Line 934 → Should be in pr_command.go
Recommendation: Move to appropriate domain files
Impact: Better organization, reusability across commands
Issue #3: Compilation Logic in Add Command
File: pkg/cli/add_command.go
Problem: Contains compilation logic that overlaps with compile_command.go
Outlier functions:
func compileWorkflow(...)             // Should use compile_command.go
func compileWorkflowWithTracking(...) // Duplicates compilation logic
Recommendation: Refactor to use shared compilation utilities
Impact: Reduced duplication, single source of truth for compilation
Issue #4: GitHub API Operations Scattered
Problem: GitHub API calls spread across multiple files
Locations:
- pkg/cli/logs.go: fetchJobStatuses(), fetchJobDetails()
- pkg/cli/github.go: getGitHubHost()
- pkg/cli/actions.go: convertToGitHubActionsEnv()
- pkg/cli/workflows.go: fetchGitHubWorkflows()
Recommendation: Consolidate into dedicated GitHub API client or enhance existing github.go
Impact: Centralized API access, easier maintenance, consistent error handling
Issue #5: Test Helpers File Misnaming
File: pkg/workflow/safe_outputs_env_test_helpers.go
Problem: Named like test file but NOT a test file (doesn't end with _test.go)
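Recommendation: Rename to safe_outputs_env_helpers.go if non-test code uses the helpers, or to a name ending in _test.go if only tests use them (Go treats a file as a test file only when its name ends in _test.go)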
Impact: Correct naming convention, clarity about file purpose
2. Duplicate or Near-Duplicate Functions
Duplicate #1: Package Extraction Pattern (High Priority)
Similarity: ~75% code similarity
Pattern: Command-line package extraction across different package managers
Files affected:
- pkg/workflow/npm.go: extractNpxFromCommands
- pkg/workflow/pip.go: extractPipFromCommands, extractUvFromCommands
- pkg/workflow/dependabot.go: extractGoFromCommands
Common logic:
- Split commands by newlines
- Split each line into words
- Find package manager command
- Skip flags (starting with -)
- Extract package names
- Trim trailing shell operators (&|;)
Code comparison:
// npm.go - extractNpxFromCommands
var packages []string
lines := strings.Split(commands, "\n")
for _, line := range lines {
	words := strings.Fields(line)
	for i, word := range words {
		if word == "npx" && i+1 < len(words) {
			for j := i + 1; j < len(words); j++ {
				pkg := words[j]
				pkg = strings.TrimRight(pkg, "&|;")
				if !strings.HasPrefix(pkg, "-") {
					packages = append(packages, pkg)
					break
				}
			}
		}
	}
}
// pip.go - extractPipFromCommands
var packages []string
lines := strings.Split(commands, "\n")
for _, line := range lines {
	words := strings.Fields(line)
	for i, word := range words {
		if (word == "pip" || word == "pip3") && i+1 < len(words) {
			for j := i + 1; j < len(words); j++ {
				if words[j] == "install" {
					for k := j + 1; k < len(words); k++ {
						pkg := words[k]
						pkg = strings.TrimRight(pkg, "&|;")
						if !strings.HasPrefix(pkg, "-") {
							packages = append(packages, pkg)
							break
						}
					}
					break
				}
			}
		}
	}
}
Recommendation:
Create pkg/workflow/package_extraction.go with generic extraction framework:
type PackageExtractor struct {
	CommandNames       []string // e.g., ["pip", "pip3"]
	RequiredSubcommand string   // e.g., "install" (optional)
	TrimSuffixes       string   // e.g., "&|;"
}
func (pe *PackageExtractor) ExtractPackages(commands string) []string {
	// Generic implementation
}
// Usage in npm.go:
var npxExtractor = PackageExtractor{
	CommandNames: []string{"npx"},
	TrimSuffixes: "&|;",
}
func extractNpxFromCommands(commands string) []string {
	return npxExtractor.ExtractPackages(commands)
}
Estimated effort: 3-4 hours
Benefits:
- Reduced code duplication (~150 lines → ~50 lines)
- Single source of truth for extraction logic
- Easier to fix bugs and add features
- Consistent behavior across package managers
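
The generic implementation left open in the PackageExtractor proposal could look roughly like the sketch below. It keeps the proposed field names and mirrors the npx and pip excerpts. The helper names (matchesCommand, argsAfter, firstPackage) and the rule that a word made up only of shell operators ends the search are assumptions of this sketch, not code from the repository.

```go
package main

import "strings"

// PackageExtractor describes how to pull package names out of shell commands.
// The field names follow the proposal above; the matching rules below are
// assumptions modeled on the npx and pip excerpts, not the repository's code.
type PackageExtractor struct {
	CommandNames       []string // e.g. []string{"pip", "pip3"}
	RequiredSubcommand string   // e.g. "install"; empty means no subcommand is needed
	TrimSuffixes       string   // cutset for strings.TrimRight, e.g. "&|;"
}

// ExtractPackages returns the first non-flag argument that follows each
// matching command (and its required subcommand, if any) on every line.
func (pe *PackageExtractor) ExtractPackages(commands string) []string {
	var packages []string
	for _, line := range strings.Split(commands, "\n") {
		words := strings.Fields(line)
		for i, word := range words {
			if !pe.matchesCommand(word) {
				continue
			}
			args := words[i+1:]
			if pe.RequiredSubcommand != "" {
				args = argsAfter(args, pe.RequiredSubcommand)
			}
			if pkg, ok := pe.firstPackage(args); ok {
				packages = append(packages, pkg)
			}
		}
	}
	return packages
}

// matchesCommand reports whether word is one of the configured command names.
func (pe *PackageExtractor) matchesCommand(word string) bool {
	for _, name := range pe.CommandNames {
		if word == name {
			return true
		}
	}
	return false
}

// argsAfter returns the words following the first occurrence of sub, or nil
// when sub does not appear.
func argsAfter(words []string, sub string) []string {
	for i, w := range words {
		if w == sub {
			return words[i+1:]
		}
	}
	return nil
}

// firstPackage skips flags and returns the first remaining word with trailing
// shell operators trimmed. A word made up only of operators (such as "&&")
// ends the search; that stop rule is an assumption of this sketch.
func (pe *PackageExtractor) firstPackage(args []string) (string, bool) {
	for _, arg := range args {
		pkg := strings.TrimRight(arg, pe.TrimSuffixes)
		if pkg == "" {
			return "", false
		}
		if strings.HasPrefix(pkg, "-") {
			continue
		}
		return pkg, true
	}
	return "", false
}
```

With this shape, a pip extractor would be configured as PackageExtractor{CommandNames: []string{"pip", "pip3"}, RequiredSubcommand: "install", TrimSuffixes: "&|;"}. Whether extractUvFromCommands and extractGoFromCommands fit the same three fields would need to be checked against their actual code.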
Duplicate #2: Secret Extraction Functions
Similarity: ~60% code similarity
Pattern: Extracting secrets from various sources
Files affected:
- pkg/workflow/mcp-config.go: extractSecretsFromValue, extractSecretsFromHeaders
- pkg/cli/secrets.go: extractSecretsFromConfig
Common logic:
- Pattern matching for ${{ secrets.NAME }}
- Map building for secret names
- Similar regex/string parsing approaches
Recommendation: Consolidate into pkg/workflow/secret_extraction.go with shared utilities
Estimated effort: 2-3 hours
Benefits: Centralized secret detection logic, easier maintenance
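
As a rough illustration of what a shared utility in secret_extraction.go might offer, the sketch below matches ${{ secrets.NAME }} references with one regular expression and returns de-duplicated names. The function names ExtractSecretNames and ExtractSecretNamesFromMap are hypothetical, and the pattern is an assumption: it only recognizes an expression that contains a single secrets reference and nothing else.

```go
package main

import (
	"regexp"
	"sort"
)

// secretRefPattern matches expressions of the form ${{ secrets.NAME }}.
// Assumption: compound expressions such as ${{ secrets.A || secrets.B }}
// are out of scope for this sketch and will not match.
var secretRefPattern = regexp.MustCompile(`\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// ExtractSecretNames returns the sorted, de-duplicated secret names
// referenced in s.
func ExtractSecretNames(s string) []string {
	seen := make(map[string]struct{})
	collectSecretNames(s, seen)
	return sortedKeys(seen)
}

// ExtractSecretNamesFromMap does the same for every value of a string map,
// for example a set of HTTP headers or environment variables.
func ExtractSecretNamesFromMap(values map[string]string) []string {
	seen := make(map[string]struct{})
	for _, v := range values {
		collectSecretNames(v, seen)
	}
	return sortedKeys(seen)
}

// collectSecretNames records every secret name found in s.
func collectSecretNames(s string, seen map[string]struct{}) {
	for _, m := range secretRefPattern.FindAllStringSubmatch(s, -1) {
		seen[m[1]] = struct{}{}
	}
}

// sortedKeys returns the keys of seen in ascending order so callers get
// deterministic output.
func sortedKeys(seen map[string]struct{}) []string {
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

The existing extractSecretsFromValue, extractSecretsFromHeaders, and extractSecretsFromConfig functions could then become thin wrappers, provided their current return types can be adapted to a plain list of names.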
Duplicate #3: Log Parsing Functions
Similarity: ~50-60% similarity
Pattern: Line-by-line log parsing with similar structure
Files affected:
- pkg/cli/firewall_log.go: parseFirewallLogLine, parseFirewallLog
- pkg/cli/access_log.go: parseSquidLogLine, parseSquidAccessLog
- pkg/cli/logs_parsing.go: parseLogFileWithEngine, parseAgentLog
Common patterns:
- Open file
- Read line by line
- Parse line with regex or field splitting
- Accumulate results
- Error handling
Recommendation: Consider shared log parsing utilities in pkg/cli/log_parser.go
Estimated effort: 4-5 hours
Benefits: Reduced duplication, consistent error handling, reusable parsing framework
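
One possible shape for such a shared utility is a generic line reader that owns the read loop, result accumulation, and error wrapping, while each log format supplies only its per-line parser. The sketch below assumes Go 1.18+ for generics. ParseLines, domainStatus, and the two-field line format are hypothetical and stand in for the real firewall and Squid formats, which are not shown in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// ParseLines reads r line by line and applies parseLine to each line.
// parseLine returns ok=false to skip a line (blank lines, comments) and a
// non-nil error to abort; the error is wrapped with the line number.
func ParseLines[T any](r io.Reader, parseLine func(line string) (T, bool, error)) ([]T, error) {
	var results []T
	scanner := bufio.NewScanner(r)
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		entry, ok, err := parseLine(scanner.Text())
		if err != nil {
			return results, fmt.Errorf("line %d: %w", lineNo, err)
		}
		if ok {
			results = append(results, entry)
		}
	}
	if err := scanner.Err(); err != nil {
		return results, fmt.Errorf("reading log: %w", err)
	}
	return results, nil
}

// domainStatus is a hypothetical two-field record used only in this example.
type domainStatus struct {
	Domain string
	Status string
}

// parseDomainStatusLine parses "<domain> <status>" lines and skips blank
// lines and lines starting with "#".
func parseDomainStatusLine(line string) (domainStatus, bool, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
		return domainStatus{}, false, nil
	}
	if len(fields) != 2 {
		return domainStatus{}, false, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	return domainStatus{Domain: fields[0], Status: fields[1]}, true, nil
}

// parseDomainStatusLog shows how a format-specific entry point shrinks to a
// single call once the loop is shared.
func parseDomainStatusLog(log string) ([]domainStatus, error) {
	return ParseLines(strings.NewReader(log), parseDomainStatusLine)
}
```

Taking an io.Reader rather than a file path keeps file opening with the caller and makes the parsers testable against in-memory strings.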
3. Validation Logic Concentration Issue
File: pkg/workflow/validation.go
Problem: Catch-all file with 33+ unrelated validation functions (450+ lines)
Current contents (mixed concerns):
- Expression validation (validateExpressionSizes)
- Container validation (validateContainerImages)
- Runtime validation (validateRuntimePackages)
- Schema validation (validateGitHubActionsSchema)
- Secret validation (validateSecretReferences)
- Repository features (6 functions: validateRepositoryFeatures, checkRepositoryHasDiscussions*, checkRepositoryHasIssues*)
- Agent validation (validateAgentFile, validateMaxTurnsSupport, validateWebSearchSupport)
- HTTP transport validation
Recommendation: Split into focused files:
validation.go (keep only high-level orchestration)
├── repository_features_validation.go (repository feature checking)
├── schema_validation.go (GitHub Actions schema)
├── runtime_validation.go (packages, containers, expressions)
└── agent_validation.go (agent file, feature support)
Estimated effort: 3-4 hours
Benefits:
- Clearer separation of concerns
- Easier to find and maintain validation logic
- Follows existing pattern of specialized validators
- Better testability
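
To make "keep only high-level orchestration" concrete, here is a minimal sketch of what validation.go could reduce to after the split. The validator names are taken from the list above, but their signatures, the workflowData type, and both validation rules are placeholders invented for illustration; the real functions in the repository will differ. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// workflowData is a hypothetical stand-in for whatever type the compiler
// passes to its validators.
type workflowData struct {
	ContainerImages []string
	AgentFile       string
}

// namedValidator pairs a validator with a label used in error messages.
type namedValidator struct {
	name string
	run  func(*workflowData) error
}

// validateContainerImages would live in runtime_validation.go.
// Placeholder rule: every image reference must be non-empty.
func validateContainerImages(w *workflowData) error {
	for i, image := range w.ContainerImages {
		if strings.TrimSpace(image) == "" {
			return fmt.Errorf("image %d is empty", i)
		}
	}
	return nil
}

// validateAgentFile would live in agent_validation.go.
// Placeholder rule: an agent file, when set, must be a .md file.
func validateAgentFile(w *workflowData) error {
	if w.AgentFile != "" && !strings.HasSuffix(w.AgentFile, ".md") {
		return fmt.Errorf("agent file %q is not a .md file", w.AgentFile)
	}
	return nil
}

// validateWorkflow is the orchestration that would remain in validation.go:
// it runs every validator in order and reports all failures together.
func validateWorkflow(w *workflowData) error {
	validators := []namedValidator{
		{"container images", validateContainerImages},
		{"agent file", validateAgentFile},
	}
	var errs []error
	for _, v := range validators {
		if err := v.run(w); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", v.name, err))
		}
	}
	return errors.Join(errs...) // nil when no validator failed
}
```

Collecting every failure instead of stopping at the first is a design choice of this sketch. If the current code returns on the first error, the loop can return early instead.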
4. Scattered Helper Functions
Issue: Helper functions distributed but could benefit from consolidation
Current distribution:
- engine_helpers.go: 15 functions ✅ Good
- config_helpers.go: 4 functions ✅ Good
- frontmatter_helpers.go: 2 functions ✅ Good
- prompt_step_helper.go: 1 function ⚠️ Could be consolidated
Recommendation:
- Consider compiler_helpers.go for internal Compiler helpers currently embedded in compiler.go
- Potentially consolidate single-function helper files
Priority: Low (current organization is acceptable)
5. Empty Placeholder File
File: pkg/workflow/config.go
Content: 4 lines (just a comment saying content moved to config_helpers.go)
Recommendation: Remove file or repurpose for actual config types
Estimated effort: 5 minutes
Impact: Cleaner codebase
Refactoring Recommendations
Priority 1: High Impact (Recommended)
1. Split validation.go
Goal: Break up overloaded validation file into focused modules
Tasks:
- Create repository_features_validation.go (6 functions)
- Create schema_validation.go (schema validation)
- Create runtime_validation.go (packages, containers, expressions)
- Create agent_validation.go (agent features)
- Keep orchestration in validation.go
Estimated effort: 3-4 hours
Benefits:
- ✅ Improved code organization
- ✅ Easier to find specific validators
- ✅ Better testability
- ✅ Follows existing specialized validator pattern
2. Create Package Extraction Framework
Goal: Eliminate duplication in package extraction logic
Tasks:
- Create pkg/workflow/package_extraction.go
- Implement generic PackageExtractor type
- Refactor npm.go, pip.go, dependabot.go to use framework
- Update tests
Estimated effort: 3-4 hours
Benefits:
- ✅ ~150 lines of duplicated code → ~50 lines
- ✅ Single source of truth
- ✅ Easier to add new package managers
- ✅ Consistent bug fixes across all extractors
3. Move Outlier Functions to Correct Files
Goal: Improve file cohesion by relocating misplaced functions
Tasks:
- Move setup functions from add_command.go to appropriate setup files
- Move checkCleanWorkingDirectory to git.go
- Move createPR to pr_command.go or extract shared PR utilities
- Refactor compilation logic to use shared utilities
Estimated effort: 2-3 hours
Benefits:
- ✅ Better separation of concerns
- ✅ Improved code reusability
- ✅ Clearer file purposes
4. Fix Naming Issues
Goal: Correct file naming inconsistencies
Tasks:
- Rename safe_outputs_env_test_helpers.go to safe_outputs_env_helpers.go
- Remove or repurpose empty config.go
Estimated effort: 15 minutes
Benefits:
- ✅ Correct naming conventions
- ✅ Cleaner codebase
Priority 2: Medium Impact (Consider)
5. Consolidate GitHub API Operations
Goal: Centralize GitHub API interactions
Tasks:
- Audit all GitHub API calls across CLI package
- Create or enhance GitHub client abstraction
- Move scattered API operations to centralized location
- Add consistent error handling and retry logic
Estimated effort: 4-5 hours
Benefits:
- ✅ Centralized API access
- ✅ Consistent error handling
- ✅ Easier to add caching/rate limiting
- ✅ Better testability
6. Consolidate Secret Extraction
Goal: Unify secret detection logic
Tasks:
- Create pkg/workflow/secret_extraction.go
- Extract common secret pattern matching
- Refactor existing extraction functions to use shared utilities
Estimated effort: 2-3 hours
Benefits:
- ✅ Consistent secret detection
- ✅ Single place to update patterns
- ✅ Reduced duplication
Priority 3: Long-term Improvements (Optional)
7. Extract Template Strings
Goal: Move template strings from code to dedicated location
Tasks:
- Create templates.go or move to templates/ directory
- Extract templates from commands.go
- Update references
Estimated effort: 2-3 hours
Benefits:
- ✅ Easier template maintenance
- ✅ Better separation of code and content
8. Consider Log Parsing Framework
Goal: Create reusable log parsing utilities
Tasks:
- Identify common log parsing patterns
- Create pkg/cli/log_parser.go with generic utilities
- Refactor firewall_log.go, access_log.go, logs_parsing.go
Estimated effort: 5-6 hours
Benefits:
- ✅ Consistent log parsing
- ✅ Reusable utilities
- ✅ Reduced duplication
9. Split Large Frontmatter Extraction File
Goal: Break up frontmatter_extraction.go (24 methods)
Consideration:
- File contains 24 Compiler methods for frontmatter extraction
- Could split by extraction domain:
  - frontmatter_tools_extraction.go (tools, MCP, runtimes)
  - frontmatter_config_extraction.go (permissions, if, features)
  - frontmatter_security_extraction.go (firewall, network)
Estimated effort: 4-5 hours
Priority: Low (current organization functional but could be improved)
Implementation Checklist
Phase 1: Quick Wins (1-2 days)
- Fix file naming: Rename safe_outputs_env_test_helpers.go
- Remove empty config.go placeholder
- Move checkCleanWorkingDirectory to git.go
- Move createPR function to appropriate location
Phase 2: High-Impact Refactoring (3-5 days)
- Split validation.go into focused files
  - Create repository_features_validation.go
  - Create schema_validation.go
  - Create runtime_validation.go
  - Create agent_validation.go
  - Update imports and tests
- Create package extraction framework
  - Design PackageExtractor type
  - Implement generic extraction logic
  - Refactor npm.go to use framework
  - Refactor pip.go to use framework
  - Refactor dependabot.go to use framework
  - Update tests
- Move setup functions from add_command.go
  - Identify appropriate destination files
  - Move functions with proper documentation
  - Update references
  - Verify tests pass
Phase 3: Medium-Impact Improvements (5-7 days)
- Consolidate GitHub API operations
  - Audit API calls across codebase
  - Design GitHub client abstraction
  - Implement centralized client
  - Migrate existing calls
  - Add error handling and retry logic
- Consolidate secret extraction
  - Create secret_extraction.go
  - Extract shared utilities
  - Refactor existing functions
  - Update tests
Phase 4: Long-term Considerations (As needed)
- Extract template strings to dedicated location
- Create log parsing framework
- Consider splitting large frontmatter extraction file
- Review and consolidate prompt generation files
Analysis Metadata
Analysis method: Serena semantic code analysis + naming pattern analysis + manual code inspection
Files analyzed: 206 non-test Go files
Functions cataloged: 1,269 functions
Lines of code: ~186,000
Packages analyzed:
- pkg/cli: 69 files
- pkg/workflow: 123 files
- pkg/console: 4 files
- pkg/constants: 1 file
- pkg/logger: 1 file
- pkg/parser: 6 files
- pkg/timeutil: 1 file
Detection methods:
- Semantic symbol analysis using Serena MCP server
- Regex pattern matching for function naming patterns
- Manual code inspection of similar functions
- Symbol overview analysis for file organization assessment
Code similarity assessment:
- Package extraction functions: 75% similarity
- Secret extraction functions: 60% similarity
- Log parsing functions: 50-60% similarity
Conclusion
The gh-aw codebase demonstrates strong organizational principles with clear naming conventions and feature-based file clustering. The analysis identified 5 high-priority outliers, 3 significant duplicate patterns, and several opportunities for improved code organization.
Overall Assessment: 8/10
Strengths:
- ✅ Excellent naming conventions (create_*, *_validation, *_engine, mcp_*)
- ✅ Consistent file patterns and clear separation of concerns
- ✅ Well-organized engine architecture
- ✅ Minimal problematic duplication (most is acceptable customization)
- ✅ Clear feature clustering (MCP files, logs files, validation files)
Areas for Improvement:
- ⚠️ validation.go is overloaded with mixed concerns
- ⚠️ Package extraction logic duplicated across 3-4 files
- ⚠️ Some functions in wrong files (setup in add_command.go)
- ⚠️ Minor naming inconsistencies
Recommended Next Steps:
- Address Priority 1 issues (high-impact, low-effort)
- Implement package extraction framework (high-value refactoring)
- Split validation.go into focused modules
- Consider Priority 2 improvements based on development velocity
The proposed refactorings maintain the codebase's strong organizational foundation while addressing specific pain points and duplication patterns. All recommendations preserve existing functionality and aim to improve maintainability, testability, and code reuse.
Labels: refactoring, code-quality, technical-debt, good-first-issue
Priority: Medium
Estimated Total Effort: 15-20 hours for Priority 1 + Priority 2 items
AI generated by Semantic Function Refactoring