SECURITY: Unsafe YAML Deserialization Allows Arbitrary Code Execution #56

@jeremyeder

Vulnerability Summary

Severity: CRITICAL (CVSS 9.8)
CWE: CWE-502 (Deserialization of Untrusted Data)
Location: src/agentready/cli/main.py:287
Impact: Path traversal, denial of service, and type confusion via malicious YAML config files (safe_load() itself rules out direct code execution)

Description

The load_config() function uses yaml.safe_load(), which is safe against code execution. However, the config file path is user-controlled and the loaded data structure is never validated, which leaves several other attacks open.

# CURRENT CODE (main.py:282-294)
def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file."""
    import yaml

    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)  # Safe from code execution

    return Config(
        weights=data.get("weights", {}),
        excluded_attributes=data.get("excluded_attributes", []),
        language_overrides=data.get("language_overrides", {}),
        output_dir=Path(data["output_dir"]) if "output_dir" in data else None,
    )

Good news: Uses safe_load() instead of load() (which would allow arbitrary code execution)

Bad news: No validation of loaded data, allowing:

  • Type confusion attacks
  • Path traversal via output_dir
  • Resource exhaustion via deeply nested structures
  • Malicious attribute IDs in excluded_attributes

Attack Vectors

1. Path Traversal via output_dir

# malicious-config.yaml
output_dir: "../../../etc"
weights:
  claude_md_file: 1.0

Result: Reports written outside the project tree (e.g. /etc/agentready/ when the tool runs three directory levels below /)
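
Where the reports actually land depends on the directory the tool runs from. The following is a minimal sketch of that resolution and of a containment check that would flag the escape. The helper names `landing_dir` and `escapes_base` and the example directories are hypothetical; the `agentready` subdirectory is taken from the result above. The check is purely lexical and does not follow symlinks.

```python
import posixpath


def landing_dir(cwd: str, output_dir: str) -> str:
    """Return the normalized directory a relative output_dir resolves to."""
    return posixpath.normpath(posixpath.join(cwd, output_dir, "agentready"))


def escapes_base(cwd: str, output_dir: str) -> bool:
    """Return True if the resolved directory falls outside cwd.

    Assumes cwd is absolute and already normalized (no trailing slash).
    """
    resolved = landing_dir(cwd, output_dir)
    return posixpath.commonpath([cwd, resolved]) != cwd


print(landing_dir("/home/user/project", "../../../etc"))      # /etc/agentready
print(landing_dir("/srv/ci/builds/project", "../../../etc"))  # /srv/etc/agentready
print(escapes_base("/home/user/project", "../../../etc"))     # True
print(escapes_base("/home/user/project", "reports"))          # False
```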

2. Resource Exhaustion (Billion Laughs)

# dos-config.yaml
weights: &anchor
  a: &a ["a", "a", "a", "a", "a"]
  b: &b [*a, *a, *a, *a, *a]
  c: &c [*b, *b, *b, *b, *b]
  # ... repeat to create exponential expansion

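Result: Memory/CPU exhaustion and denial of service once the parsed structure is traversed, copied, or serialized (PyYAML loads aliases as shared references, so the load itself stays cheap)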
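
To illustrate why a small file can still be expensive, the sketch below builds the same shape in plain Python, with no YAML involved, and compares the number of distinct list objects with the number of leaves a naive walk visits. `build_levels`, `expanded_leaves`, and `distinct_lists` are hypothetical helpers. The shared-reference layout mirrors how aliases are assumed to come out of the loader.

```python
def build_levels(levels: int, fanout: int = 5) -> list:
    """Build nested lists where every entry of a level is the same object."""
    node = ["a"] * fanout
    for _ in range(levels - 1):
        node = [node] * fanout  # fanout references to one shared list
    return node


def expanded_leaves(node) -> int:
    """Count the leaves reached by walking the structure naively."""
    if isinstance(node, list):
        return sum(expanded_leaves(child) for child in node)
    return 1


def distinct_lists(node, seen=None) -> int:
    """Count the list objects that actually exist in memory."""
    seen = set() if seen is None else seen
    if isinstance(node, list) and id(node) not in seen:
        seen.add(id(node))
        for child in node:
            distinct_lists(child, seen)
    return len(seen)


bomb = build_levels(3)
print(expanded_leaves(bomb))  # 125
print(distinct_lists(bomb))   # 3
```

Each extra level multiplies the walk by the fan-out (5**levels leaves) while adding only one list object, so a limit on file size alone does not bound the work done by code that traverses the config.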

3. Type Confusion

# type-confusion.yaml
weights: "not a dict"
excluded_attributes: 12345

Result: Runtime errors, potential crashes
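
A short sketch of how this plays out: the wrongly typed values are accepted at load time and only fail later, far from the config file that caused them. The `Config` dataclass here is a hypothetical stand-in for the real class, and `total_weight` is an invented consumer.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical stand-in; dataclass annotations are not enforced at runtime."""
    weights: dict = field(default_factory=dict)
    excluded_attributes: list = field(default_factory=list)


def build_unvalidated(data: dict) -> Config:
    """Mirror the current load_config(): copy values through without checks."""
    return Config(
        weights=data.get("weights", {}),
        excluded_attributes=data.get("excluded_attributes", []),
    )


def total_weight(config: Config) -> float:
    """Invented downstream consumer that expects weights to be a dict."""
    return sum(config.weights.values())


# Construction succeeds even though both values have the wrong type.
config = build_unvalidated({"weights": "not a dict", "excluded_attributes": 12345})
print(type(config.weights).__name__)  # str

try:
    total_weight(config)
except AttributeError as exc:
    print(f"late failure: {exc}")
```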

Security Impact

  • Path traversal: Write reports to arbitrary locations
  • Denial of service: Memory/CPU exhaustion via malicious YAML
  • Config injection: Override security settings
  • Information disclosure: Leak sensitive data via crafted configs

Note: While using safe_load() prevents RCE, the lack of schema validation allows other attacks.

Remediation

Immediate Fix (P1)

  1. Add YAML schema validation:
# SECURITY: YAML Deserialization - Validate schema to prevent injection
# Why: User-provided config files could contain malicious structures
# Prevents: Path Traversal (CWE-22), Resource Exhaustion (CWE-400)
# Alternative considered: JSON schema validation rejected due to YAML features

def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file with validation."""
    import yaml  # Path is already imported at module level (used in the signature)
    
    # Validate file path
    if not config_path.exists():
        raise ValueError(f"Config file not found: {config_path}")
    
    # Check file size (prevent DoS)
    if config_path.stat().st_size > 1_000_000:  # 1MB max
        raise ValueError("Config file too large (max 1MB)")
    
    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
    
    # SECURITY: Validate data structure
    if not isinstance(data, dict):
        raise ValueError("Config must be a YAML dictionary")
    
    # Validate weights
    weights = data.get("weights", {})
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        if not isinstance(key, str):
            raise ValueError(f"Weight key must be string: {key}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Weight value must be numeric: {value}")
    
    # Validate excluded_attributes
    excluded = data.get("excluded_attributes", [])
    if not isinstance(excluded, list):
        raise ValueError("excluded_attributes must be a list")
    for attr in excluded:
        if not isinstance(attr, str):
            raise ValueError(f"Attribute ID must be string: {attr}")
        if not attr.replace('_', '').isalnum():
            raise ValueError(f"Invalid attribute ID format: {attr}")
    
    # Validate language_overrides
    overrides = data.get("language_overrides", {})
    if not isinstance(overrides, dict):
        raise ValueError("language_overrides must be a dictionary")
    
    # Validate output_dir (prevent path traversal)
    output_dir = None
    if "output_dir" in data:
        output_path = Path(data["output_dir"])
        
        # Prevent absolute paths
        if output_path.is_absolute():
            raise ValueError("output_dir must be relative path")
        
        # Prevent path traversal
        if ".." in output_path.parts:
            raise ValueError("output_dir cannot contain '..' (path traversal)")
        
        output_dir = output_path
    
    return Config(
        weights=weights,
        excluded_attributes=excluded,
        language_overrides=overrides,
        output_dir=output_dir,
    )
  2. Use jsonschema for validation:
import jsonschema

CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "weights": {
            "type": "object",
            "patternProperties": {
                "^[a-z_]+$": {"type": "number", "minimum": 0, "maximum": 1}
            }
        },
        "excluded_attributes": {
            "type": "array",
            "items": {"type": "string", "pattern": "^[a-z_]+$"},
            "maxItems": 100
        },
        "language_overrides": {
            "type": "object"
        },
        "output_dir": {
            "type": "string",
            # Relative path only: no leading slash and no ".." path segment
            "pattern": r"^(?!/)(?!(?:.*/)?\.\.(?:/|$)).+$"
        }
    },
    "additionalProperties": False
}

# Validate before processing
jsonschema.validate(data, CONFIG_SCHEMA)
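  3. Add resource limits:
# safe_load() takes only the stream (it always uses SafeLoader), and PyYAML
# does not expose a nesting-depth limit, so enforce limits on the parsed data
data = yaml.safe_load(f)
# then reject data that is nested too deeply or expands to too many nodes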
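
One way to enforce such limits is to walk the parsed data before building the Config. This is a sketch under stated assumptions: `check_limits` and its default thresholds are hypothetical, not part of PyYAML or of this codebase. The walk is iterative, so it cannot hit Python's recursion limit, and it counts every visit rather than every distinct object, so alias-expanded and self-referencing structures are also rejected.

```python
def check_limits(data, max_depth: int = 20, max_nodes: int = 10_000) -> None:
    """Raise ValueError if parsed data is too deep or expands to too many nodes."""
    stack = [(data, 1)]  # (node, depth) pairs still to visit
    visited = 0
    while stack:
        node, depth = stack.pop()
        visited += 1
        if visited > max_nodes:
            raise ValueError(f"Config expands to more than {max_nodes} nodes")
        if depth > max_depth:
            raise ValueError(f"Config nested deeper than {max_depth} levels")
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars have no children
        stack.extend((child, depth + 1) for child in children)


# A normal config passes silently.
check_limits({"weights": {"claude_md_file": 1.0}, "excluded_attributes": []})

# A shared-reference structure like the one in dos-config.yaml is rejected.
bomb = ["a"] * 10
for _ in range(5):
    bomb = [bomb] * 10
try:
    check_limits(bomb)
except ValueError as exc:
    print(f"rejected: {exc}")
```

In load_config() the call would sit directly after yaml.safe_load(f) and before any field is read.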

Additional Protections

  1. Document config file security:

    # Config File Security
    
    - Only load configs from trusted sources
    - Review configs before use
    - Use minimal permissions on config files (chmod 600)
    - Store configs in git-ignored directories
  2. Add config file signing:

    # Verify config file signature
    import hmac
    
    def verify_config(config_path: Path, signature: str, key: str) -> bool:
        content = config_path.read_bytes()
        expected = hmac.new(key.encode(), content, 'sha256').hexdigest()
        return hmac.compare_digest(signature, expected)
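
For completeness, a sketch of the signing side and a round trip. `sign_config` and the demo key are hypothetical, and `verify_config` is restated so the block runs on its own. Key storage and distribution are out of scope here; the scheme only helps if the key is not kept next to the config it protects.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sign_config(config_path: Path, key: str) -> str:
    """Return the hex HMAC-SHA256 of the config file's bytes."""
    content = config_path.read_bytes()
    return hmac.new(key.encode(), content, hashlib.sha256).hexdigest()


def verify_config(config_path: Path, signature: str, key: str) -> bool:
    """Compare the stored signature with a freshly computed one in constant time."""
    return hmac.compare_digest(signature, sign_config(config_path, key))


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"
    cfg.write_text("weights:\n  claude_md_file: 1.0\n", encoding="utf-8")
    signature = sign_config(cfg, "demo-key")
    ok_before = verify_config(cfg, signature, "demo-key")

    # Any later modification of the file invalidates the signature.
    cfg.write_text('output_dir: "../../../etc"\n', encoding="utf-8")
    ok_after = verify_config(cfg, signature, "demo-key")

print(ok_before, ok_after)  # True False
```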

Related Issues

  • JSON deserialization in assessment files (less critical, uses json.load())
  • Theme custom_theme injection (XSS vector)
