
feat: Implement SemanticNamingAssessor #82

@jeremyeder

Attribute Definition

Attribute ID: semantic_naming (Attribute #21 - Tier 3)

Definition: Systematic naming patterns for variables, functions, classes, and files that follow language/framework conventions.

Why It Matters: Research indicates that identifier style affects recall and precision. Consistent naming reduces cognitive load, and AI models recognize naming patterns learned from training on open-source code.

Impact on Agent Behavior:

  • Accurate intent inference
  • Appropriate name suggestions
  • Code structure understanding
  • Pattern recognition

Measurable Criteria:

  • Follow language conventions:
    • Python: PEP 8 (snake_case functions, PascalCase classes, UPPER_CASE constants)
    • JavaScript/TypeScript: camelCase functions/variables, PascalCase classes
    • Go: MixedCaps or mixedCaps rather than underscores (exported names start with an uppercase letter, unexported names with a lowercase letter)
    • Java: camelCase methods, PascalCase classes, UPPER_CASE constants
  • Use paired opposites consistently: add/remove, start/stop, begin/end, open/close
  • Avoid abbreviations unless widely understood (HTTP, API, URL, ID)
  • Enforce via linters: pylint (Python), eslint (JavaScript/TypeScript), golint (Go; now deprecated, with revive as a maintained alternative)
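One way to check these conventions is to classify each identifier's case style with regular expressions. The sketch below is illustrative only: `classify_case`, `follows_convention`, and the `CONVENTIONS` table are hypothetical names, and it covers only the Python and JavaScript rules listed above.

```python
import re
from typing import Dict, FrozenSet, Tuple

# Ordered (style, pattern) pairs; the first match wins. UPPER_CASE is tested
# first so acronyms ("HTTP") and constants ("MAX_RETRIES") land there. Single
# lowercase words ("user") get their own bucket because they are valid in both
# snake_case and camelCase languages.
CASE_PATTERNS = [
    ("UPPER_CASE", re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")),
    ("lowercase", re.compile(r"^_*[a-z][a-z0-9]*$")),
    ("snake_case", re.compile(r"^_*[a-z][a-z0-9]*(?:_[a-z0-9]+)+$")),
    ("camelCase", re.compile(r"^[a-z][a-z0-9]*[A-Z][a-zA-Z0-9]*$")),
    ("PascalCase", re.compile(r"^[A-Z][a-zA-Z0-9]*$")),
]

# Accepted styles per (language, identifier kind), following the criteria above.
CONVENTIONS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("python", "function"): frozenset({"snake_case", "lowercase"}),
    ("python", "class"): frozenset({"PascalCase"}),
    ("python", "constant"): frozenset({"UPPER_CASE"}),
    ("javascript", "function"): frozenset({"camelCase", "lowercase"}),
    ("javascript", "class"): frozenset({"PascalCase"}),
}


def classify_case(name: str) -> str:
    """Return the case style of an identifier, or "other" if none fits."""
    for style, pattern in CASE_PATTERNS:
        if pattern.match(name):
            return style
    return "other"


def follows_convention(name: str, language: str, kind: str) -> bool:
    """True if `name` uses an accepted style for this language and kind."""
    return classify_case(name) in CONVENTIONS[(language, kind)]
```

Bucketing single lowercase words separately avoids penalizing names like `user` or `run`, which are acceptable as both Python and JavaScript function names.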

Implementation Requirements

File Location: src/agentready/assessors/code_quality.py

Class Name: SemanticNamingAssessor

Tier: 3 (Important)

Default Weight: 0.015 (1.5% of total score)

Assessment Logic

Scoring Approach: Heuristic analysis of naming patterns in codebase

Evidence to Check (score components):

  1. Language convention compliance (50%)

    • Python: Check for snake_case functions, PascalCase classes
    • JavaScript: Check for camelCase functions, PascalCase classes
    • Use AST parsing to extract identifiers
  2. Avoid common anti-patterns (30%)

    • Single-letter variables (except i, j, k in loops)
    • Generic names: temp, data, info, obj, var
    • Abbreviations: usr, mgr, svc, repo (unless ubiquitous)
    • Inconsistent naming: mixed styles (e.g., camelCase and snake_case) in the same file
  3. Semantic clarity (20%)

    • Names longer than 3 characters (descriptive)
    • Verbs for functions (calculate, fetch, create)
    • Nouns for classes (User, Order, Service)
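The semantic-clarity checks are heuristics. A minimal sketch, assuming a hypothetical hand-picked `COMMON_VERBS` list and hypothetical helpers `split_words` and `clarity_score`; "nouns for classes" is approximated as "does not start with a known verb", since real part-of-speech tagging would need an external NLP library.

```python
import re
from typing import List, Optional

# Hypothetical starter list of verb prefixes; a real assessor would need a
# larger, configurable vocabulary.
COMMON_VERBS = {
    "add", "build", "calculate", "check", "close", "compute", "convert",
    "create", "delete", "fetch", "find", "get", "handle", "has", "init",
    "is", "load", "make", "open", "parse", "process", "remove", "render",
    "reset", "run", "save", "send", "set", "start", "stop", "update",
    "validate",
}
MIN_LENGTH = 4  # "Names longer than 3 characters"


def split_words(name: str) -> List[str]:
    """Split snake_case, camelCase, or PascalCase names into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [word.lower() for word in spaced.split("_") if word]


def clarity_score(function_names: List[str], class_names: List[str]) -> Optional[float]:
    """Percentage (0-100) of names passing the clarity heuristics, or None."""
    results: List[bool] = []
    for name in function_names:
        words = split_words(name)
        # Functions: long enough and starting with a known verb.
        results.append(
            len(name.strip("_")) >= MIN_LENGTH and bool(words) and words[0] in COMMON_VERBS
        )
    for name in class_names:
        words = split_words(name)
        # Classes: long enough and NOT starting with a verb (a rough noun proxy).
        results.append(
            len(name) >= MIN_LENGTH and bool(words) and words[0] not in COMMON_VERBS
        )
    if not results:
        return None  # nothing to judge; treat as not applicable
    return 100.0 * sum(results) / len(results)
```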

Scoring Logic:

# Each component is scored on a 0-100 scale
convention_score = check_naming_convention(identifiers)
antipattern_score = detect_naming_antipatterns(identifiers)  # higher = fewer anti-patterns
clarity_score = assess_name_semantics(identifiers)

total_score = (convention_score * 0.5) + (antipattern_score * 0.3) + (clarity_score * 0.2)

status = "pass" if total_score >= 75 else "fail"
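A runnable version of the weighted combination, assuming each component has already been computed as a 0-100 score (the names `combine_scores`, `WEIGHTS`, and `PASS_THRESHOLD` are illustrative, not existing AgentReady APIs):

```python
from typing import Tuple

# Weights and threshold from the scoring logic above.
WEIGHTS = {"convention": 0.5, "antipattern": 0.3, "clarity": 0.2}
PASS_THRESHOLD = 75.0


def combine_scores(convention: float, antipattern: float, clarity: float) -> Tuple[float, str]:
    """Weight the three 0-100 component scores and derive pass/fail."""
    for value in (convention, antipattern, clarity):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"component score out of range: {value}")
    total = (
        convention * WEIGHTS["convention"]
        + antipattern * WEIGHTS["antipattern"]
        + clarity * WEIGHTS["clarity"]
    )
    total = round(total, 1)  # avoid float noise right at the threshold
    status = "pass" if total >= PASS_THRESHOLD else "fail"
    return total, status
```

Rounding before the comparison keeps a score that should be exactly 75 from failing because of floating-point error.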

Code Pattern to Follow

Reference: TypeAnnotationsAssessor for AST-based code analysis

Pattern:

  1. Check is_applicable() for supported languages
  2. Use AST to extract function/class/variable names
  3. Validate naming patterns against language conventions
  4. Detect anti-patterns (single letters, abbreviations, generic names)
  5. Calculate proportional score
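Steps 2, 3, and 5 can be sketched with the standard-library `ast` module, reusing the Python regexes listed under Implementation Notes. `extract_python_names` and `convention_score` are hypothetical helpers, not part of `TypeAnnotationsAssessor`; variables and JavaScript are left out for brevity.

```python
import ast
import re
from typing import Dict, List, Optional, Sequence

# Python patterns from the Implementation Notes of this issue.
PY_FUNCTION = re.compile(r"^[a-z_][a-z0-9_]*$")
PY_CLASS = re.compile(r"^[A-Z][a-zA-Z0-9]*$")


def extract_python_names(source: str) -> Dict[str, List[str]]:
    """Collect function and class names from one Python module via the AST."""
    names: Dict[str, List[str]] = {"function": [], "class": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Skip dunder methods (__init__, __repr__, ...): their names are
            # fixed by Python, so counting them would only inflate the score.
            if not (node.name.startswith("__") and node.name.endswith("__")):
                names["function"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            names["class"].append(node.name)
    return names


def convention_score(sources: Sequence[str]) -> Optional[float]:
    """Percentage (0-100) of function/class names matching PEP 8 patterns."""
    compliant = total = 0
    for source in sources:
        try:
            names = extract_python_names(source)
        except (SyntaxError, ValueError):
            continue  # unparseable files are skipped rather than penalized
        compliant += sum(1 for n in names["function"] if PY_FUNCTION.match(n))
        compliant += sum(1 for n in names["class"] if PY_CLASS.match(n))
        total += len(names["function"]) + len(names["class"])
    if total == 0:
        return None  # no identifiers found
    return 100.0 * compliant / total
```

Returning `None` when no identifiers are found lets the caller emit `Finding.not_applicable` instead of a misleading perfect score.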

Example Finding Responses

Pass (Score: 92)

Finding(
    attribute=self.attribute,
    status="pass",
    score=92.0,
    measured_value="consistent naming",
    threshold="language conventions followed",
    evidence=[
        "98% of identifiers follow Python PEP 8 conventions",
        "snake_case used for 145/148 functions",
        "PascalCase used for all 23 classes",
        "Only 2 single-letter variables outside loops",
        "No generic names (temp, data, obj) detected",
    ],
    remediation=None,
    error_message=None,
)

Fail (Score: 54)

Finding(
    attribute=self.attribute,
    status="fail",
    score=54.0,
    measured_value="inconsistent naming",
    threshold="language conventions followed",
    evidence=[
        "Mixed naming styles: camelCase and snake_case functions",
        "15 single-letter variables found (not in loops)",
        "Generic names detected: temp, data, obj, var (18 occurrences)",
        "Abbreviations used inconsistently: usr, mgr, svc",
        "Some functions lack verb prefixes (e.g., 'user' instead of 'get_user')",
    ],
    remediation=self._create_remediation(),
    error_message=None,
)

Not Applicable

Finding.not_applicable(
    self.attribute,
    reason="No code files found to analyze naming patterns"
)
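Choosing between a real assessment and `Finding.not_applicable` requires knowing whether the repository contains analyzable code. A sketch of file discovery under assumed choices (the extension map, the `EXCLUDED_DIRS` set, and the 200-file sampling cap are all hypothetical), which also covers deterministic sampling for large repositories:

```python
import random
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, List

# Hypothetical configuration: extensions treated as analyzable code and
# directories treated as vendored or generated output.
SUPPORTED_EXTENSIONS = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
}
EXCLUDED_DIRS = {".git", ".venv", "venv", "node_modules", "__pycache__"}


def list_repo_files(repo_path: Path) -> List[str]:
    """Every file under repo_path as a sorted, POSIX-style relative path."""
    return sorted(
        p.relative_to(repo_path).as_posix() for p in repo_path.rglob("*") if p.is_file()
    )


def select_code_files(relative_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group supported source files by language, skipping excluded directories."""
    by_language: Dict[str, List[str]] = {}
    for raw in relative_paths:
        path = PurePosixPath(raw)
        if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
            continue
        language = SUPPORTED_EXTENSIONS.get(path.suffix)
        if language is not None:
            by_language.setdefault(language, []).append(raw)
    return by_language


def is_applicable(repo_path: Path) -> bool:
    """True if the repository has at least one supported source file."""
    return bool(select_code_files(list_repo_files(repo_path)))


def sample_files(paths: List[str], max_files: int = 200, seed: int = 0) -> List[str]:
    """Fixed-seed sample, so repeated scans of a large repo analyze the same files."""
    if len(paths) <= max_files:
        return list(paths)
    return random.Random(seed).sample(paths, max_files)
```

`rglob` still descends into excluded directories; on very large trees, `os.walk` with in-place pruning of its directory list would avoid that cost.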

Registration

Add to src/agentready/services/scanner.py in create_all_assessors():

from ..assessors.code_quality import (
    TypeAnnotationsAssessor,
    CyclomaticComplexityAssessor,
    StructuredLoggingAssessor,
    SemanticNamingAssessor,  # Add this import
)

def create_all_assessors() -> List[BaseAssessor]:
    return [
        # ... existing assessors ...
        SemanticNamingAssessor(),  # Add this line
    ]

Testing Guidance

Test File: tests/unit/test_assessors_code_quality.py

Test Cases to Add:

  1. test_semantic_naming_pass_python: Python code with PEP 8 naming
  2. test_semantic_naming_fail_mixed_styles: Code with inconsistent naming
  3. test_semantic_naming_fail_generic_names: Code with temp, data, obj
  4. test_semantic_naming_partial_score: Some compliance, some violations
  5. test_semantic_naming_not_applicable: Non-code repository

Note: AgentReady itself follows PEP 8 conventions, so it should score well (90+) when assessed against its own codebase.

Dependencies

External Tools: None (AST parsing only)

Python Standard Library:

  • ast for parsing Python code
  • re for pattern matching identifier names

Optional Enhancement: Use pylint's invalid-name check (pylint --disable=all --enable=invalid-name) for more detailed analysis

Remediation Steps

def _create_remediation(self) -> Remediation:
    return Remediation(
        summary="Improve naming consistency and semantic clarity",
        steps=[
            "Follow language naming conventions (PEP 8, Google Style Guides)",
            "Use descriptive names (longer than 3 characters, avoid obscure abbreviations)",
            "Apply consistent case: snake_case for functions, PascalCase for classes",
            "Use verbs for functions: get_user, calculate_total, create_order",
            "Use nouns for classes: User, OrderService, PaymentProcessor",
            "Avoid generic names: temp, data, obj, var, info",
            "Enforce with linters: pylint, eslint, golint",
        ],
        tools=["pylint", "eslint", "golint"],
        commands=[
            "# Python - Check naming conventions",
            "pylint --disable=all --enable=invalid-name src/",
            "",
            "# JavaScript - Check naming with ESLint",
            "npx eslint --rule 'camelcase: error' src/",
        ],
        examples=[
            """# Python - Good naming
class UserService:
    MAX_LOGIN_ATTEMPTS = 5

    def create_user(self, email: str) -> User:
        '''Create new user.'''
        pass

    def delete_user(self, user_id: str) -> None:
        '''Delete existing user.'''
        pass

# Python - Bad naming
class userservice:  # Should be PascalCase
    maxLoginAttempts = 5  # Should be UPPER_CASE

    def CreateUser(self, e: str) -> User:  # Should be snake_case; 'e' is not descriptive
        pass

    def removeUser(self, uid: str) -> None:  # Should be snake_case; use delete to pair with create
        pass
""",
            """// JavaScript - Good naming
class UserService {
    static MAX_LOGIN_ATTEMPTS = 5;

    createUser(email) {
        // ...
    }

    deleteUser(userId) {
        // ...
    }
}

// JavaScript - Bad naming
class user_service {  // Should be PascalCase
    static max_login_attempts = 5;  // Should be UPPER_CASE

    CreateUser(e) {  // Should be camelCase; 'e' is not descriptive
        // ...
    }

    remove_user(uid) {  // Should be camelCase; use delete to pair with create
        // ...
    }
}
""",
        ],
        citations=[
            Citation(
                source="Python.org",
                title="PEP 8 - Style Guide for Python Code",
                url="https://peps.python.org/pep-0008/#naming-conventions",
                relevance="Official Python naming conventions",
            ),
            Citation(
                source="Google",
                title="Google Style Guides",
                url="https://google.github.io/styleguide/",
                relevance="Industry-standard style guides for multiple languages",
            ),
        ],
    )

Implementation Notes

  1. AST Parsing: Extract identifiers from function definitions, class definitions, variable assignments
  2. Pattern Detection:
    • Python: r'^[a-z_][a-z0-9_]*$' for functions, r'^[A-Z][a-zA-Z0-9]*$' for classes
    • JavaScript: r'^[a-z][a-zA-Z0-9]*$' for functions, r'^[A-Z][a-zA-Z0-9]*$' for classes
  3. Anti-Pattern Detection: Regex for single letters, temp/data/obj, abbreviations
  4. Sampling: For large repositories, analyze a sample of files rather than every file
  5. Scoring: Proportional score based on percentage of compliant identifiers
  6. Edge Cases: Loop variables (i, j, k) are acceptable single-letter names
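Anti-pattern detection with the loop-variable exemption can be sketched with `ast` as follows. The word lists mirror the examples in this issue and `find_naming_antipatterns` is a hypothetical name; how the findings are turned into `antipattern_score` (for example, a penalty per finding) is left open.

```python
import ast
from typing import List, Set

# Word lists mirror the examples in this issue; a real assessor would make
# them configurable.
GENERIC_NAMES = {"temp", "tmp", "data", "info", "obj", "var"}
ABBREVIATIONS = {"usr", "mgr", "svc", "repo"}
LOOP_LETTERS = {"i", "j", "k"}


def find_naming_antipatterns(source: str) -> List[str]:
    """Report single-letter, generic, and abbreviated names in Python source."""
    tree = ast.parse(source)

    # Record the exact Name nodes bound as for-loop/comprehension targets, so
    # `for i in ...` is exempt while a bare `i = 0` is still reported.
    loop_target_ids: Set[int] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.AsyncFor, ast.comprehension)):
            for sub in ast.walk(node.target):
                if isinstance(sub, ast.Name):
                    loop_target_ids.add(id(sub))

    # (name, line, is_loop_target) for assignments, parameters, and functions.
    bindings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bindings.append((node.id, node.lineno, id(node) in loop_target_ids))
        elif isinstance(node, ast.arg):
            bindings.append((node.arg, node.lineno, False))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            bindings.append((node.name, node.lineno, False))

    findings = []
    for name, line, in_loop in bindings:
        lowered = name.lower()
        # "_" is the conventional throwaway name, so it is not reported.
        if len(name) == 1 and name != "_" and not (in_loop and name in LOOP_LETTERS):
            findings.append(f"line {line}: single-letter name '{name}'")
        elif lowered in GENERIC_NAMES:
            findings.append(f"line {line}: generic name '{name}'")
        elif lowered in ABBREVIATIONS:
            findings.append(f"line {line}: abbreviation '{name}'")
    return findings
```

Tracking node identities rather than names means a file that uses `i` as a loop variable in one place is still flagged for a stray `i = 0` elsewhere.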
