Skip to content

SECURITY: Insufficient Input Validation in Subprocess Calls Across Codebase #57

@jeremyeder

Description

@jeremyeder

Vulnerability Summary

Severity: MEDIUM-HIGH (CVSS 6.8)
CWE: CWE-78 (OS Command Injection)
Locations: Multiple files using subprocess
Impact: Potential command injection and resource exhaustion

Description

Multiple subprocess calls across the codebase lack proper input validation and safeguards:

Vulnerable Locations

  1. language_detector.py:78,121,147 - git ls-files without timeout validation
  2. repomix.py:217 - External command with user-controlled paths
  3. code_quality.py:74,298,354 - git ls-files, radon, lizard with file patterns
  4. main.py:159 - git ls-files for file counting

Common Issues

  • Missing timeouts: Some calls have timeout=30, others don't
  • No path validation: Repository paths passed to subprocess without sanitization
  • No output size limits: Large repositories could cause memory exhaustion
  • Error exposure: Stack traces may leak sensitive paths

Vulnerability Analysis

# language_detector.py:78-86 (HAS timeout=30)
result = subprocess.run(
    ["git", "ls-files"],
    cwd=self.repository_path,  # User-controlled
    capture_output=True,
    text=True,
    timeout=30,  # GOOD: Has timeout
    check=True,
)

# main.py:159-167 (NO timeout on fast count)
result = subprocess.run(
    ["git", "ls-files"],
    cwd=repo_path,  # User-controlled
    capture_output=True,
    text=True,
    timeout=5,  # GOOD: Has timeout (but only 5s)
)

# repomix.py:217-218 (NO timeout!)
result = subprocess.run(
    cmd,  # ["repomix", "--style", format, "--output", file]
    cwd=self.repo_path,  # User-controlled
    capture_output=True,
    text=True,
    check=False,  # NO TIMEOUT - Vulnerable to DoS
)

Attack Vectors

1. Resource Exhaustion via Large Repositories

# Create repo with millions of files
for i in {1..1000000}; do touch "file$i.txt"; done
git add .

# Run AgentReady assessment
agentready assess .  # May hang or exhaust memory

2. Path Injection (Limited)

While paths are passed as cwd parameter (safe), symbolic links could redirect execution:

# Create symlink to sensitive directory
ln -s /etc/shadow .git

# Assessment may try to process sensitive files
agentready assess .

3. Command Injection via Filenames

# Create files with shell metacharacters
touch "file; rm -rf /.txt"
git add "file; rm -rf /.txt"

# If git output is ever passed to shell=True, code execution occurs

Security Impact

  • Denial of Service: Hang indefinitely on large repos
  • Memory exhaustion: Capture unbounded subprocess output
  • Information disclosure: Error messages expose system paths
  • Potential command injection: If output reused unsafely

Remediation

Immediate Fix (P1)

  1. Add consistent timeouts:
# SECURITY: Subprocess Execution - Always set timeouts to prevent DoS
# Why: Large repositories or slow commands could hang indefinitely
# Prevents: Resource Exhaustion (CWE-400)

SUBPROCESS_TIMEOUT = 120  # 2 minutes max for any subprocess

def safe_subprocess_run(cmd: list[str], **kwargs) -> subprocess.CompletedProcess:
    """Run subprocess with security guardrails."""
    # SECURITY: Enforce timeout on all subprocess calls
    if 'timeout' not in kwargs:
        kwargs['timeout'] = SUBPROCESS_TIMEOUT
    
    # SECURITY: Limit output size to prevent memory exhaustion
    try:
        result = subprocess.run(cmd, **kwargs)
        
        # Check output size (1MB limit)
        if result.stdout and len(result.stdout) > 1_000_000:
            raise ValueError("Subprocess output too large (>1MB)")
        
        return result
    
    except subprocess.TimeoutExpired as e:
        logger.error(f"Subprocess timeout: {cmd[0]}")
        raise
  1. Validate repository paths:
# SECURITY: Path Validation - Prevent symlink attacks
# Why: Repository path could be symlink to sensitive directory
# Prevents: Information Disclosure (CWE-200)

def validate_repository_path(path: Path) -> Path:
    """Validate and resolve repository path safely."""
    # Resolve symlinks
    resolved = path.resolve()
    
    # Prevent access to sensitive directories
    FORBIDDEN_PATHS = ['/etc', '/sys', '/proc', '/dev', '/.ssh', '/root']
    for forbidden in FORBIDDEN_PATHS:
        if str(resolved).startswith(forbidden):
            raise ValueError(f"Cannot assess sensitive directory: {resolved}")
    
    # Ensure it's actually a git repo
    if not (resolved / '.git').exists():
        raise ValueError(f"Not a git repository: {resolved}")
    
    return resolved
  1. Sanitize subprocess output:
# SECURITY: Output Sanitization - Redact sensitive paths
# Why: Error messages may contain sensitive system information
# Prevents: Information Disclosure (CWE-200)

def sanitize_subprocess_error(error: Exception, repo_path: Path) -> str:
    """Sanitize error message to prevent info leakage."""
    msg = str(error)
    
    # Redact absolute paths
    msg = msg.replace(str(repo_path.resolve()), '<repo>')
    msg = msg.replace(str(Path.home()), '<home>')
    
    # Redact usernames
    import getpass
    msg = msg.replace(getpass.getuser(), '<user>')
    
    return msg
  1. Update all subprocess calls:
# Update repomix.py:217
result = safe_subprocess_run(
    cmd,
    cwd=validate_repository_path(self.repo_path),
    capture_output=True,
    text=True,
    check=False,
    timeout=300,  # Repomix can be slow on large repos
)

# Update code_quality.py:74
result = safe_subprocess_run(
    ["git", "ls-files", "*.py"],
    cwd=validate_repository_path(repository.path),
    capture_output=True,
    text=True,
    check=True,
)

Additional Protections

  1. Add subprocess monitoring:

    # Track subprocess execution for security audit
    logger.debug(f"Executing: {' '.join(cmd)} in {cwd}")
    start_time = time.time()
    result = subprocess.run(...)
    logger.debug(f"Completed in {time.time() - start_time:.1f}s")
  2. Implement rate limiting:

    # Limit number of subprocess calls per assessment
    MAX_SUBPROCESS_CALLS = 100
    subprocess_count = 0
    
    def safe_subprocess_run(...):
        global subprocess_count
        subprocess_count += 1
        if subprocess_count > MAX_SUBPROCESS_CALLS:
            raise RuntimeError("Subprocess limit exceeded")
        # ... rest of function

References

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions