
feat: Implement CodeSmellsAssessor #87

@jeremyeder

Description


Attribute Definition

Attribute ID: code_smells (Attribute #29 - Tier 4)

Definition: The codebase is free of indicators of deeper problems: long methods, large classes, duplicate code, dead code, and magic numbers.

Why It Matters: Research shows AI-generated code increases "code churn" (copy/paste instead of refactoring) and DRY-principle violations. A clean baseline prevents AI from perpetuating anti-patterns.

Impact on Agent Behavior:

  • Better intent understanding
  • More accurate refactoring suggestions
  • Avoidance of anti-pattern propagation
  • Improved code quality over time

Measurable Criteria:

  • Tools: SonarQube, PMD, Checkstyle, pylint, eslint
  • Zero critical smells
  • <5 major smells per 1000 lines of code (see the density sketch after this list)
  • Common smells monitored:
    • Duplicate code (DRY violations)
    • Long methods (>50 lines)
    • Large classes (>500 lines)
    • Long parameter lists (>5 params)
    • Divergent change (one class changing for multiple reasons)

Implementation Requirements

File Location: src/agentready/assessors/code_quality.py

Class Name: CodeSmellsAssessor

Tier: 4 (Advanced)

Default Weight: 0.005 (0.5% of total score)

Assessment Logic

Scoring Approach: Heuristic detection of common code smells

Evidence to Check (score components; a counting sketch follows the list):

  1. Long methods (30%)

    • Count functions/methods >50 lines
    • Threshold: <10% of functions exceed limit
  2. Large classes (20%)

    • Count classes >500 lines
    • Threshold: <5% of classes exceed limit
  3. Long parameter lists (20%)

    • Count functions with >5 parameters
    • Threshold: <10% of functions exceed limit
  4. Magic numbers (15%)

    • Detect numeric literals in code (not in constants)
    • Sample files for magic number prevalence
  5. Duplicate code (15%)

    • Heuristic: count identical code blocks (advanced: use AST)
    • Or use external tool if available (jscpd, pylint)
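
A minimal sketch of the AST-based counting behind items 1-3, assuming Python sources parsed with the standard-library ast module; the function name and per-file scope are assumptions, not the final design:

import ast

LONG_METHOD_LINES = 50
LARGE_CLASS_LINES = 500
MAX_PARAMS = 5


def count_smell_indicators(source: str) -> dict:
    """Count long methods, large classes, and long parameter lists in one file."""
    tree = ast.parse(source)
    counts = {
        "long_methods": 0, "total_functions": 0,
        "large_classes": 0, "total_classes": 0,
        "long_param_lists": 0,
    }
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["total_functions"] += 1
            # end_lineno is populated by the parser on Python 3.8+
            if node.end_lineno - node.lineno + 1 > LONG_METHOD_LINES:
                counts["long_methods"] += 1
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if len(params) > MAX_PARAMS:
                counts["long_param_lists"] += 1
        elif isinstance(node, ast.ClassDef):
            counts["total_classes"] += 1
            if node.end_lineno - node.lineno + 1 > LARGE_CLASS_LINES:
                counts["large_classes"] += 1
    return counts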

Scoring Logic:

# Penalize each violation percentage; clamp at 0 so one very smelly
# category cannot drive the weighted total negative
long_methods_score = max(0, 100 - (long_methods_percent * 2))
large_classes_score = max(0, 100 - (large_classes_percent * 2))
long_params_score = max(0, 100 - (long_params_percent * 2))
magic_numbers_score = detect_magic_numbers()  # Heuristic, 0-100
duplicate_code_score = detect_duplicates()  # Heuristic, 0-100

total_score = (
    long_methods_score * 0.3 +
    large_classes_score * 0.2 +
    long_params_score * 0.2 +
    magic_numbers_score * 0.15 +
    duplicate_code_score * 0.15
)

status = "pass" if total_score >= 75 else "fail"

Code Pattern to Follow

Reference: CyclomaticComplexityAssessor for code analysis pattern

Pattern (a class skeleton sketch follows the list):

  1. Check is_applicable() for supported languages
  2. Parse code with AST (Python) or regex heuristics
  3. Count various code smell indicators
  4. Calculate proportional score for each smell type
  5. Combine scores with weighting
  6. Provide detailed remediation
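
A skeleton following the steps above. This is a sketch only: the BaseAssessor/Finding import paths and the assess() signature are inferred from the Finding examples below and the CyclomaticComplexityAssessor reference, and the private helpers are hypothetical.

from .base import BaseAssessor          # assumed location
from ..models.finding import Finding    # assumed location


class CodeSmellsAssessor(BaseAssessor):
    """Detect common code smells via lightweight heuristics (Tier 4)."""

    def is_applicable(self, repository) -> bool:
        # Step 1: only run on languages we can analyze (Python via ast to start).
        return self._has_python_files(repository)

    def assess(self, repository) -> Finding:
        # Steps 2-3: parse sources and count smell indicators.
        counts = self._count_smells(repository)
        if counts["total_loc"] < 100:
            return Finding.not_applicable(
                self.attribute,
                reason="Insufficient code to analyze (<100 lines)",
            )
        # Steps 4-5: proportional per-smell scores combined with weights.
        total_score = self._combine_scores(counts)
        status = "pass" if total_score >= 75 else "fail"
        # Step 6: detailed remediation only on failure.
        return Finding(
            attribute=self.attribute,
            status=status,
            score=round(total_score, 1),
            measured_value=self._summarize(counts),
            threshold="<5 major smells per 1000 LOC",
            evidence=self._build_evidence(counts),
            remediation=None if status == "pass" else self._create_remediation(),
            error_message=None,
        )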

Example Finding Responses

Pass (Score: 88)

Finding(
    attribute=self.attribute,
    status="pass",
    score=88.0,
    measured_value="minimal code smells",
    threshold="<5 major smells per 1000 LOC",
    evidence=[
        "Long methods: 3/145 functions exceed 50 lines (2.1%)",
        "Large classes: 0/23 classes exceed 500 lines",
        "Long parameter lists: 5/145 functions have >5 params (3.4%)",
        "Magic numbers: minimal usage, most in constants",
        "Code smell density: 2.3 per 1000 LOC (excellent)",
    ],
    remediation=None,
    error_message=None,
)

Fail (Score: 42)

Finding(
    attribute=self.attribute,
    status="fail",
    score=42.0,
    measured_value="numerous code smells",
    threshold="<5 major smells per 1000 LOC",
    evidence=[
        "Long methods: 25/145 functions exceed 50 lines (17.2%)",
        "Large classes: 3/23 classes exceed 500 lines (13.0%)",
        "Long parameter lists: 18/145 functions have >5 params (12.4%)",
        "Magic numbers: frequent usage (45 instances)",
        "Code smell density: 18.7 per 1000 LOC (poor)",
    ],
    remediation=self._create_remediation(),
    error_message=None,
)

Not Applicable

Finding.not_applicable(
    self.attribute,
    reason="Insufficient code to analyze (<100 lines)"
)

Registration

Add to src/agentready/services/scanner.py in create_all_assessors():

from ..assessors.code_quality import (
    TypeAnnotationsAssessor,
    CyclomaticComplexityAssessor,
    StructuredLoggingAssessor,
    SemanticNamingAssessor,
    CodeSmellsAssessor,  # Add this import
)

def create_all_assessors() -> List[BaseAssessor]:
    return [
        # ... existing assessors ...
        CodeSmellsAssessor(),  # Add this line
    ]

Testing Guidance

Test File: tests/unit/test_assessors_code_quality.py

Test Cases to Add:

  1. test_code_smells_pass_clean_code: Repository with minimal smells
  2. test_code_smells_fail_long_methods: Repository with many long methods
  3. test_code_smells_fail_large_classes: Repository with large classes
  4. test_code_smells_partial_score: Mixed quality (some smells present)
  5. test_code_smells_not_applicable: Very small codebase

Note: AgentReady consists mostly of small, focused modules, so it should score well (80+).
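
A hypothetical shape for one of the listed cases. The tmp_path fixture is standard pytest; the assess() signature and the "not_applicable" status string are assumptions based on the Finding examples above.

from agentready.assessors.code_quality import CodeSmellsAssessor


def test_code_smells_not_applicable(tmp_path):
    # Very small codebase: a single short module, well under 100 lines.
    (tmp_path / "tiny.py").write_text("x = 1\n")
    finding = CodeSmellsAssessor().assess(tmp_path)  # signature assumed
    assert finding.status == "not_applicable"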

Dependencies

External Tools: Optional (pylint, jscpd for advanced detection)

Python Standard Library:

  • ast for parsing Python code
  • re for detecting magic numbers
  • pathlib.Path for file iteration

Optional Enhancement: Use pylint for comprehensive smell detection
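
A sketch of that optional path: delegate coarse detection to pylint when it is installed, otherwise return None so the caller falls back to the built-in heuristics. The function name is illustrative.

import json
import shutil
import subprocess
from typing import Optional


def pylint_smell_count(path: str) -> Optional[int]:
    if shutil.which("pylint") is None:
        return None  # optional dependency not installed
    result = subprocess.run(
        [
            "pylint",
            "--output-format=json",
            "--disable=all",
            "--enable=too-many-lines,too-many-arguments,duplicate-code",
            path,
        ],
        capture_output=True,
        text=True,
    )
    # pylint exits non-zero whenever it reports messages, so parse stdout
    # instead of checking the return code.
    return len(json.loads(result.stdout or "[]"))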

Remediation Steps

def _create_remediation(self) -> Remediation:
    return Remediation(
        summary="Refactor code to eliminate common code smells",
        steps=[
            "Break down long methods into smaller, focused functions",
            "Extract large classes into multiple cohesive classes",
            "Reduce parameter lists using objects or builder pattern",
            "Extract magic numbers into named constants",
            "Eliminate duplicate code through helper functions",
            "Use linters to enforce code quality rules",
        ],
        tools=["pylint", "eslint", "sonarqube", "jscpd"],
        commands=[
            "# Python - Detect code smells with pylint",
            "pylint --disable=all --enable=too-many-lines,too-many-arguments,duplicate-code src/",
            "",
            "# Find long methods",
            "radon cc src/ -s -n D  # D = complex (high smell indicator)",
            "",
            "# Detect duplicate code",
            "pip install jscpd",
            "jscpd src/",
        ],
        examples=[
            """# Long method - Bad
def process_order(order_id, customer_id, items, payment_method,
                  shipping_address, billing_address, discount_code,
                  gift_wrap, notes):
    '''50+ lines of processing logic'''
    # Validate customer
    # Validate items
    # Calculate total
    # Apply discount
    # Process payment
    # Create shipment
    # Send notifications
    # Update inventory
    # [... 40 more lines ...]

# Refactored - Good
def process_order(order: Order) -> OrderResult:
    '''Orchestrate order processing.'''
    validate_order(order)
    total = calculate_order_total(order)
    payment = process_payment(order, total)
    shipment = create_shipment(order)
    send_order_notifications(order)
    return OrderResult(payment, shipment)
""",
            """# Magic numbers - Bad
def calculate_shipping(weight):
    if weight < 5:
        return weight * 2.5
    elif weight < 20:
        return weight * 1.8
    else:
        return weight * 1.2 + 10

# Constants - Good
LIGHT_WEIGHT_THRESHOLD = 5
MEDIUM_WEIGHT_THRESHOLD = 20
LIGHT_SHIPPING_RATE = 2.5
MEDIUM_SHIPPING_RATE = 1.8
HEAVY_SHIPPING_RATE = 1.2
HEAVY_SHIPPING_SURCHARGE = 10

def calculate_shipping(weight):
    if weight < LIGHT_WEIGHT_THRESHOLD:
        return weight * LIGHT_SHIPPING_RATE
    elif weight < MEDIUM_WEIGHT_THRESHOLD:
        return weight * MEDIUM_SHIPPING_RATE
    else:
        return weight * HEAVY_SHIPPING_RATE + HEAVY_SHIPPING_SURCHARGE
""",
            """# Long parameter list - Bad
def create_user(first_name, last_name, email, phone, address_line1,
                address_line2, city, state, zip_code, country):
    # ...

# Object parameter - Good
@dataclass
class UserData:
    first_name: str
    last_name: str
    email: str
    phone: str
    address: Address

def create_user(user_data: UserData) -> User:
    # ...
""",
        ],
        citations=[
            Citation(
                source="ScienceDirect",
                title="Code smells and refactoring: A tertiary systematic review",
                url="https://www.sciencedirect.com/science/article/pii/S0164121221000819",
                relevance="Academic research on code smells",
            ),
            Citation(
                source="Refactoring Guru",
                title="Code Smells Catalog",
                url="https://refactoring.guru/refactoring/smells",
                relevance="Comprehensive catalog of code smells and solutions",
            ),
        ],
    )

Implementation Notes

  1. AST Parsing: Use ast module to count lines in functions, classes
  2. Long Methods: Count statements or lines within function bodies
  3. Large Classes: Count total lines in class definition
  4. Parameter Counting: Parse function signatures for parameter count
  5. Magic Numbers: Regex to find numeric literals not assigned to constants (see the sketch after this list)
  6. Duplicate Detection: Advanced option is AST hashing; a simple option is to skip it or use a basic heuristic
  7. Scoring: Proportional scoring based on percentage of violations
  8. Edge Cases: Ignore test files (may have longer methods), configuration files
  9. Performance: Sample files for large repositories instead of full scan
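
For note 5, a hedged sketch of one way to count magic numbers; it uses an AST walk rather than the regex the note suggests, and treats module-level UPPER_CASE assignments as named constants:

import ast

ALLOWED_LITERALS = {0, 1}  # values usually not considered "magic"


def count_magic_numbers(source: str) -> int:
    tree = ast.parse(source)
    # Lines that define module-level UPPER_CASE constants are exempt.
    constant_lines = {
        node.lineno
        for node in tree.body
        if isinstance(node, ast.Assign)
        and all(isinstance(t, ast.Name) and t.id.isupper() for t in node.targets)
    }
    magic = 0
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
            and node.value not in ALLOWED_LITERALS
            and node.lineno not in constant_lines
        ):
            magic += 1
    return magic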
