SECURITY: Unsafe YAML Deserialization Allows Arbitrary Code Execution

## Vulnerability Summary

**Severity**: CRITICAL (CVSS 9.8)  
**CWE**: CWE-502 (Deserialization of Untrusted Data)  
**Location**: `src/agentready/cli/main.py:287`  
**Impact**: Path traversal, denial of service, and crashes via malicious YAML config files (`safe_load()` already prevents arbitrary code execution)

## Description

The `load_config()` function parses the file with `yaml.safe_load()`, which does not construct arbitrary Python objects. However, the config file path is user-controlled and the loaded data structure is never validated, so a crafted file can still be used to attack the tool.

```python
# CURRENT CODE (main.py:282-294)
def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file."""
    import yaml

    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)  # Safe from code execution

    return Config(
        weights=data.get("weights", {}),
        excluded_attributes=data.get("excluded_attributes", []),
        language_overrides=data.get("language_overrides", {}),
        output_dir=Path(data["output_dir"]) if "output_dir" in data else None,
    )
```

**Good news**: The code uses `safe_load()` rather than `load()` with an unsafe loader, which would allow arbitrary code execution.

**Bad news**: No validation of loaded data, allowing:
- Type confusion attacks
- Path traversal via `output_dir`
- Resource exhaustion via deeply nested structures
- Malicious attribute IDs in `excluded_attributes`

## Attack Vectors

### 1. Path Traversal via output_dir

```yaml
# malicious-config.yaml
output_dir: "../../../etc"
weights:
  claude_md_file: 1.0
```

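Result: Reports are written outside the intended directory. From a working directory three levels deep, and given sufficient permissions, they land in `/etc/agentready/`.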

### 2. Resource Exhaustion (Billion Laughs)

```yaml
# dos-config.yaml
weights: &anchor
  a: &a ["a", "a", "a", "a", "a"]
  b: &b [*a, *a, *a, *a, *a]
  c: &c [*b, *b, *b, *b, *b]
  # ... repeat to create exponential expansion
```

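Result: Denial of service. PyYAML resolves aliases to shared object references, so parsing itself stays cheap, but memory and CPU use grow exponentially as soon as the data is copied, serialized, or walked recursively.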
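
The growth can be estimated without triggering it. The sketch below mimics the object graph that aliases produce (each level holds five references to the level below) and counts how many leaves a naive copy or recursive walk would touch. `expanded_size` is a hypothetical helper and assumes the structure contains no cycles.

```python
def expanded_size(node, _memo=None):
    """Count leaves as if every shared reference were copied, without copying."""
    memo = {} if _memo is None else _memo
    if not isinstance(node, (list, dict)):
        return 1
    key = id(node)
    if key not in memo:
        children = node.values() if isinstance(node, dict) else node
        memo[key] = sum(expanded_size(child, memo) for child in children)
    return memo[key]


# Ten levels, five references per level: only ten list objects exist in memory.
level = ["a"] * 5
for _ in range(9):
    level = [level] * 5

print(expanded_size(level))  # 9765625, i.e. 5**10
```

Each additional level multiplies the total by five, so a config of a few hundred bytes can describe billions of logical elements.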

### 3. Type Confusion

```yaml
# type-confusion.yaml
weights: "not a dict"
excluded_attributes: 12345
```

Result: Runtime errors, potential crashes
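
A minimal reproduction of this failure mode, using a plain dict as a stand-in for the parsed YAML. How `Config.weights` is consumed downstream is an assumption here; summing the weight values is only a plausible example.

```python
# Stand-in for yaml.safe_load() applied to type-confusion.yaml.
data = {"weights": "not a dict", "excluded_attributes": 12345}

# dict.get() returns the string unchanged; the default {} is never used.
weights = data.get("weights", {})

try:
    total = sum(weights.values())  # hypothetical downstream use of the weights
except AttributeError as exc:
    error = str(exc)

print(error)  # 'str' object has no attribute 'values'
```

The error surfaces far from `load_config()`, which makes it harder to diagnose than an upfront validation failure.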

## Security Impact

- **Path traversal**: Write reports to arbitrary locations
- **Denial of service**: Memory/CPU exhaustion via malicious YAML
- **Config injection**: Override security settings
- **Information disclosure**: Leak sensitive data via crafted configs

**Note**: While using `safe_load()` prevents RCE, the lack of schema validation allows other attacks.

## Remediation

### Immediate Fix (P1)

1. **Add YAML schema validation**:

```python
# SECURITY: YAML Deserialization - Validate schema to prevent injection
# Why: User-provided config files could contain malicious structures
# Prevents: Path Traversal (CWE-22), Resource Exhaustion (CWE-400)
# Alternative considered: JSON schema validation rejected due to YAML features

from pathlib import Path

import yaml


def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file with validation."""
    
    # Validate file path
    if not config_path.exists():
        raise ValueError(f"Config file not found: {config_path}")
    
    # Check file size (prevent DoS)
    if config_path.stat().st_size > 1_000_000:  # 1MB max
        raise ValueError("Config file too large (max 1MB)")
    
    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
    
    # SECURITY: Validate data structure
    if not isinstance(data, dict):
        raise ValueError("Config must be a YAML dictionary")
    
    # Validate weights
    weights = data.get("weights", {})
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        if not isinstance(key, str):
            raise ValueError(f"Weight key must be string: {key}")
        # bool is a subclass of int, so reject YAML true/false explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Weight value must be numeric: {value}")
    
    # Validate excluded_attributes
    excluded = data.get("excluded_attributes", [])
    if not isinstance(excluded, list):
        raise ValueError("excluded_attributes must be a list")
    for attr in excluded:
        if not isinstance(attr, str):
            raise ValueError(f"Attribute ID must be string: {attr}")
        if not attr.replace('_', '').isalnum():
            raise ValueError(f"Invalid attribute ID format: {attr}")
    
    # Validate language_overrides
    overrides = data.get("language_overrides", {})
    if not isinstance(overrides, dict):
        raise ValueError("language_overrides must be a dictionary")
    
    # Validate output_dir (prevent path traversal)
    output_dir = None
    if "output_dir" in data:
        output_path = Path(data["output_dir"])
        
        # Prevent absolute paths
        if output_path.is_absolute():
            raise ValueError("output_dir must be relative path")
        
        # Prevent path traversal
        if ".." in output_path.parts:
            raise ValueError("output_dir cannot contain '..' (path traversal)")
        
        output_dir = output_path
    
    return Config(
        weights=weights,
        excluded_attributes=excluded,
        language_overrides=overrides,
        output_dir=output_dir,
    )
```

2. **Use jsonschema for validation**:

```python
import jsonschema

CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "weights": {
            "type": "object",
            "patternProperties": {
                "^[a-z_]+$": {"type": "number", "minimum": 0, "maximum": 1}
            }
        },
        "excluded_attributes": {
            "type": "array",
            "items": {"type": "string", "pattern": "^[a-z_]+$"},
            "maxItems": 100
        },
        "language_overrides": {
            "type": "object"
        },
        "output_dir": {
            "type": "string",
            "pattern": "^[^/].*$"  # No leading slash
        }
    },
    "additionalProperties": False
}

# Validate before processing
jsonschema.validate(data, CONFIG_SCHEMA)
```
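
Note that the `output_dir` pattern in this schema rejects only a leading slash, so on its own it does not stop `..` traversal. The check below uses the standard `re` module with search semantics, which is how Python's `jsonschema` evaluates `pattern` as far as I know. `STRICTER_PATTERN` is one possible alternative offered as an assumption, not part of the proposed schema.

```python
import re

SCHEMA_PATTERN = r"^[^/].*$"  # copied from CONFIG_SCHEMA above
# Hypothetical stricter pattern: no leading slash and no ".." path component.
STRICTER_PATTERN = r"^(?!/)(?!.*(^|/)\.\.(/|$)).+$"


def accepts(pattern: str, value: str) -> bool:
    """Return True if the pattern matches anywhere in value."""
    return re.search(pattern, value) is not None


for candidate in ["reports", "/etc", "../../../etc", "reports/../../etc"]:
    print(candidate, accepts(SCHEMA_PATTERN, candidate), accepts(STRICTER_PATTERN, candidate))
```

The schema pattern accepts both traversal strings, so the explicit `..` check from the first fix is still needed if the schema is used unchanged.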

3. **Add resource limits**:

```python
# PyYAML has no built-in nesting-depth limit, and safe_load() takes no Loader
# argument, so bound the input size before parsing and check depth afterwards.
data = yaml.safe_load(f)
```
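
As far as I know, PyYAML does not expose a nesting-depth setting, so one option is to check the parsed data before using it. The sketch below is a hypothetical helper (`check_limits`, with illustrative default limits) that visits each container once, so alias-heavy input is checked in linear time, and it also rejects recursive aliases.

```python
def check_limits(root, max_depth=30, max_containers=10_000):
    """Raise ValueError if parsed data is too deep, too large, or cyclic."""
    heights = {}      # id(container) -> nesting height, computed once per object
    on_path = set()   # containers on the current root-to-node path

    def visit(node, depth):
        if not isinstance(node, (dict, list)):
            return 0
        if depth >= max_depth:
            raise ValueError("config is nested too deeply")
        key = id(node)
        if key in on_path:
            raise ValueError("config contains a recursive alias")
        if key not in heights:
            if len(heights) + len(on_path) >= max_containers:
                raise ValueError("config has too many containers")
            on_path.add(key)
            children = node.values() if isinstance(node, dict) else node
            heights[key] = 1 + max((visit(c, depth + 1) for c in children), default=0)
            on_path.remove(key)
        # A shared container may be reached again at a deeper position.
        if depth + heights[key] > max_depth:
            raise ValueError("config is nested too deeply")
        return heights[key]

    visit(root, 0)
```

Calling `check_limits(data)` right after `yaml.safe_load(f)` keeps the later validation steps from walking a pathological structure. Scalars are not counted, so the file-size cap remains the bound on flat but very long lists.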

### Additional Protections

1. **Document config file security**:
   ```markdown
   # Config File Security
   
   - Only load configs from trusted sources
   - Review configs before use
   - Use minimal permissions on config files (chmod 600)
   - Store configs in git-ignored directories
   ```

2. **Add config file signing**:
   ```python
   # Verify config file signature
   import hmac
   
   def verify_config(config_path: Path, signature: str, key: str) -> bool:
       content = config_path.read_bytes()
       expected = hmac.new(key.encode(), content, 'sha256').hexdigest()
       return hmac.compare_digest(signature, expected)
   ```
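
A usage sketch for the signing idea, restating the check so the block runs on its own. `sign_config` is a hypothetical helper, and the key and file contents are demo values; how the key and signature are stored and distributed is left open.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sign_config(config_path: Path, key: str) -> str:
    """Hypothetical helper: hex HMAC-SHA256 of the file contents."""
    return hmac.new(key.encode(), config_path.read_bytes(), hashlib.sha256).hexdigest()


def verify_config(config_path: Path, signature: str, key: str) -> bool:
    """Constant-time comparison of the stored and recomputed signatures."""
    return hmac.compare_digest(signature, sign_config(config_path, key))


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yaml"
    config_path.write_text("weights:\n  claude_md_file: 1.0\n", encoding="utf-8")
    signature = sign_config(config_path, "demo-key")  # stored out of band
    ok_before = verify_config(config_path, signature, "demo-key")

    # Any later modification of the file invalidates the signature.
    config_path.write_text("output_dir: ../../../etc\n", encoding="utf-8")
    ok_after = verify_config(config_path, signature, "demo-key")

print(ok_before, ok_after)  # True False
```

Signing detects tampering with a trusted config but does not replace validation, since a signed file can still contain mistakes.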

## References

- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
- [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
- [PyYAML safe_load documentation](https://pyyaml.org/wiki/PyYAMLDocumentation)
- [Billion Laughs Attack](https://en.wikipedia.org/wiki/Billion_laughs_attack)

## Related Issues

- JSON deserialization in assessment files (less critical, uses json.load())
- Theme custom_theme injection (XSS vector)
