Vulnerability Summary
Severity: CRITICAL (CVSS 9.8)
CWE: CWE-502 (Deserialization of Untrusted Data)
Location: src/agentready/cli/main.py:287
Impact: Path traversal, denial of service, and type confusion via malicious YAML config files (yaml.safe_load() already rules out arbitrary code execution)
Description
The load_config() function uses yaml.safe_load(), which prevents code execution during parsing. However, the config file path is user-controlled and the code does not validate the structure of the loaded data, which leaves room for other attacks.
# CURRENT CODE (main.py:282-294)
def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file."""
    import yaml

    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)  # Safe from code execution

    return Config(
        weights=data.get("weights", {}),
        excluded_attributes=data.get("excluded_attributes", []),
        language_overrides=data.get("language_overrides", {}),
        output_dir=Path(data["output_dir"]) if "output_dir" in data else None,
    )
Good news: Uses safe_load() instead of load() (which would allow arbitrary code execution)
Bad news: No validation of loaded data, allowing:
- Type confusion attacks
- Path traversal via output_dir
- Resource exhaustion via deeply nested structures
- Malicious attribute IDs in excluded_attributes
Attack Vectors
1. Path Traversal via output_dir
# malicious-config.yaml
output_dir: "../../../etc"
weights:
  claude_md_file: 1.0
Result: Reports written to /etc/agentready/
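A containment check for this vector can be sketched with pathlib alone. The helper name resolve_inside and the idea of a fixed base directory are assumptions made for illustration, not part of agentready. Unlike a check on ".." path parts, resolving first also catches absolute paths and symlinks that point outside the base.

```python
from pathlib import Path


def resolve_inside(base: Path, candidate: str) -> Path:
    """Resolve candidate under base; raise ValueError if the result escapes base.

    Hypothetical helper for illustration only.
    """
    base_resolved = base.resolve()
    # Joining with an absolute candidate discards base, which the check below catches.
    target = (base_resolved / candidate).resolve()
    try:
        # relative_to raises ValueError when target is not located under base_resolved.
        target.relative_to(base_resolved)
    except ValueError:
        raise ValueError(f"output_dir escapes {base_resolved}: {candidate!r}") from None
    return target
```

Calling resolve_inside(Path.cwd(), "../../../etc") raises ValueError, while a plain relative name such as "reports" resolves to a directory under the current working directory.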
2. Resource Exhaustion (Billion Laughs)
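# dos-config.yaml
weights: &anchor
  a: &a ["a", "a", "a", "a", "a"]
  b: &b [*a, *a, *a, *a, *a]
  c: &c [*b, *b, *b, *b, *b]
  # ... repeat to create exponential expansion
Result: Memory/CPU exhaustion and denial of service once the structure is expanded (traversed, copied, or serialized). PyYAML resolves aliases to shared references, so parsing alone stays cheap.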
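The effect of the aliases can be reproduced without PyYAML. The sketch below mimics alias sharing with plain Python lists and counts leaves under a budget; build_layers and count_leaves are hypothetical names introduced here. It illustrates that the in-memory structure stays small (one list per layer) while any code that walks it visits an exponential number of leaves.

```python
def build_layers(depth: int, width: int = 5) -> list:
    """Mimic YAML aliases: each layer holds `width` references to the previous layer."""
    layer = ["a"] * width
    for _ in range(depth):
        layer = [layer] * width  # shared references, like *alias in YAML
    return layer


def count_leaves(node, budget: int) -> int:
    """Count leaf values iteratively, raising ValueError once `budget` is exceeded."""
    count = 0
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, list):
            stack.extend(current)
        else:
            count += 1
            if count > budget:
                raise ValueError(f"structure expands past {budget} leaves")
    return count
```

With width 5, build_layers(2) expands to 125 leaves and build_layers(9) to 5**10, so a budget check of this kind stops a traversal long before the full expansion is visited.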
3. Type Confusion
# type-confusion.yaml
weights: "not a dict"
excluded_attributes: 12345
Result: Runtime errors, potential crashes
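A short sketch of why this matters: the wrongly typed values pass through data.get() unchanged, and the failure only surfaces later in whatever code consumes them. The dictionary below is what safe_load() returns for the file above; total_weight is a hypothetical downstream consumer, not a function from agentready.

```python
def total_weight(weights) -> float:
    """Hypothetical consumer that assumes `weights` is a dict of numbers."""
    return sum(weights.values())


# Parsed form of type-confusion.yaml: a str and an int where a dict and a list are expected.
data = {"weights": "not a dict", "excluded_attributes": 12345}

weights = data.get("weights", {})  # succeeds: the str is returned as-is
try:
    total_weight(weights)
    failure = None
except AttributeError as exc:
    # The error appears far from the load site, which makes it harder to diagnose.
    failure = str(exc)
```

Validating types at load time moves this failure to the point where the bad input enters the program.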
Security Impact
- Path traversal: Write reports to arbitrary locations
- Denial of service: Memory/CPU exhaustion via malicious YAML
- Config injection: Override security settings
- Information disclosure: Leak sensitive data via crafted configs
Note: While using safe_load() prevents RCE, the lack of schema validation allows other attacks.
Remediation
Immediate Fix (P1)
- Add YAML schema validation:
# SECURITY: YAML Deserialization - Validate schema to prevent injection
# Why: User-provided config files could contain malicious structures
# Prevents: Path Traversal (CWE-22), Resource Exhaustion (CWE-400)
# Alternative considered: JSON schema validation rejected due to YAML features
def load_config(config_path: Path) -> Config:
    """Load configuration from YAML file with validation."""
    import yaml

    # Validate file path
    if not config_path.exists():
        raise ValueError(f"Config file not found: {config_path}")

    # Check file size (prevent DoS)
    if config_path.stat().st_size > 1_000_000:  # 1MB max
        raise ValueError("Config file too large (max 1MB)")

    with open(config_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)

    # SECURITY: Validate data structure (an empty file loads as None and is rejected here)
    if not isinstance(data, dict):
        raise ValueError("Config must be a YAML dictionary")

    # Validate weights
    weights = data.get("weights", {})
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        if not isinstance(key, str):
            raise ValueError(f"Weight key must be string: {key}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Weight value must be numeric: {value}")

    # Validate excluded_attributes
    excluded = data.get("excluded_attributes", [])
    if not isinstance(excluded, list):
        raise ValueError("excluded_attributes must be a list")
    for attr in excluded:
        if not isinstance(attr, str):
            raise ValueError(f"Attribute ID must be string: {attr}")
        if not attr.replace('_', '').isalnum():
            raise ValueError(f"Invalid attribute ID format: {attr}")

    # Validate language_overrides
    overrides = data.get("language_overrides", {})
    if not isinstance(overrides, dict):
        raise ValueError("language_overrides must be a dictionary")

    # Validate output_dir (prevent path traversal)
    output_dir = None
    if "output_dir" in data:
        output_path = Path(data["output_dir"])
        # Prevent absolute paths
        if output_path.is_absolute():
            raise ValueError("output_dir must be relative path")
        # Prevent path traversal
        if ".." in output_path.parts:
            raise ValueError("output_dir cannot contain '..' (path traversal)")
        output_dir = output_path

    return Config(
        weights=weights,
        excluded_attributes=excluded,
        language_overrides=overrides,
        output_dir=output_dir,
    )
- Use jsonschema for validation:
import jsonschema

CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "weights": {
            "type": "object",
            "patternProperties": {
                "^[a-z_]+$": {"type": "number", "minimum": 0, "maximum": 1}
            },
            "additionalProperties": False  # Reject keys that do not match the pattern
        },
        "excluded_attributes": {
            "type": "array",
            "items": {"type": "string", "pattern": "^[a-z_]+$"},
            "maxItems": 100
        },
        "language_overrides": {
            "type": "object"
        },
        "output_dir": {
            "type": "string",
            "pattern": "^[^/].*$"  # No leading slash; ".." segments still need a separate check
        }
    },
    "additionalProperties": False
}

# Validate before processing
jsonschema.validate(data, CONFIG_SCHEMA)
- Add resource limits:
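# Limit YAML nesting depth
# PyYAML has no configurable depth limit, and safe_load() does not accept a Loader argument,
# so load normally and check the depth of the parsed data before using it.
data = yaml.safe_load(f)
Additional Protections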
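One further protection against resource exhaustion is to bound the nesting depth of the parsed data after loading. The sketch below is a hypothetical helper (check_depth is not an agentready or PyYAML function), and the default limit of 30 is an arbitrary assumption. It walks the structure iteratively and remembers containers it has already expanded, so alias-heavy input is not re-walked exponentially, and self-referencing structures end in an error instead of an infinite loop.

```python
def check_depth(node, limit: int = 30) -> int:
    """Return the container nesting depth of parsed data; raise ValueError past `limit`.

    Hypothetical helper for illustration; scalars contribute no depth.
    """
    deepest = 0
    seen = {}  # id(container) -> deepest level at which it was already expanded
    stack = [(node, 1)]
    while stack:
        current, level = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            continue
        if level > limit:
            raise ValueError(f"config nesting exceeds {limit} levels")
        if seen.get(id(current), 0) >= level:
            continue  # already expanded at this depth or deeper
        seen[id(current)] = level
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in children)
    return deepest
```

For a typical config such as {"weights": {"claude_md_file": 1.0}} the function returns 2. It would be called on the result of yaml.safe_load() before any field is read.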
- Document config file security:
  # Config File Security
  - Only load configs from trusted sources
  - Review configs before use
  - Use minimal permissions on config files (chmod 600)
  - Store configs in git-ignored directories
- Add config file signing:
# Verify config file signature
import hmac

def verify_config(config_path: Path, signature: str, key: str) -> bool:
    content = config_path.read_bytes()
    expected = hmac.new(key.encode(), content, 'sha256').hexdigest()
    return hmac.compare_digest(signature, expected)
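To show how the signature check would be used end to end, here is a minimal sketch that signs a config file and then detects tampering. The sign_config helper, the file contents, and the placeholder key are assumptions for illustration; how the key is stored and distributed is out of scope and would need its own design.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sign_config(config_path: Path, key: str) -> str:
    """Hypothetical helper: hex HMAC-SHA256 of the file contents."""
    return hmac.new(key.encode(), config_path.read_bytes(), hashlib.sha256).hexdigest()


def verify_config(config_path: Path, signature: str, key: str) -> bool:
    """Recompute the HMAC and compare it to the signature in constant time."""
    return hmac.compare_digest(signature, sign_config(config_path, key))


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.yaml"
    config.write_text("weights:\n  claude_md_file: 1.0\n", encoding="utf-8")
    signature = sign_config(config, key="example-key")  # placeholder key

    valid_before = verify_config(config, signature, key="example-key")
    # Simulate tampering: the stored signature no longer matches the contents.
    config.write_text("output_dir: ../../../etc\n", encoding="utf-8")
    valid_after = verify_config(config, signature, key="example-key")
```

The check only helps if the key is kept somewhere the attacker cannot read, for example outside the repository that holds the config.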
References
- OWASP Deserialization Cheat Sheet
- CWE-502: Deserialization of Untrusted Data
- PyYAML safe_load documentation
- Billion Laughs Attack
Related Issues
- JSON deserialization in assessment files (less critical, uses json.load())
- Theme custom_theme injection (XSS vector)