PDD-CLI Bug: Generates field references without schema validation
When generating code that accesses object fields/properties, PDD-CLI makes assumptions about field names based on context clues (variable names, class names) rather than validating against the actual data model schema. This results in code that accesses non-existent fields or uses wrong field names.
Why this matters: Generated code compiles but fails at runtime or returns incorrect data because the field names don't match the actual model definition.
Concrete Example
Imagine you have a data model defined like this:

```python
from datetime import datetime

# Model definition
class ContactFunnelStage:
    name: str            # ← The field is called "name"
    priority: int
    created_at: datetime
```

When PDD-CLI generates code to access this model, it might produce:

```python
# PDD generated this (WRONG):
stage_name = contact.get("stage", "Unknown")  # ← Accessing "stage" field

# Should have generated (CORRECT):
stage_name = contact.get("name", "Unknown")   # ← Accessing "name" field
```

What went wrong: The variable was named `stage_name` and the object was a `ContactFunnelStage`, so PDD likely inferred the field should be called `stage`. But the actual model defines the field as `name`, not `stage`.
Impact: In this case, the code always returned "Unknown" instead of the actual stage name because the field "stage" doesn't exist.
Why PDD Makes This Mistake
PDD-CLI currently uses contextual inference to generate field access:
- Looks at variable names (`stage_name` → must access field "stage")
- Looks at class names (`ContactFunnelStage` → fields probably relate to "stage")
- Makes educated guesses based on common patterns
But it doesn't:
- Parse the actual model definition to see what fields exist
- Validate that the referenced field name is actually defined
- Check Pydantic models, TypeScript interfaces, or class definitions
- Cross-reference field access against known schemas
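The missing introspection step is cheap. A minimal sketch using the example model and the stdlib `typing.get_type_hints` (Pydantic models expose the same information via `model_fields`; a real implementation would also parse source files for models it cannot import):

```python
from datetime import datetime
from typing import get_type_hints

# The model from the example above
class ContactFunnelStage:
    name: str
    priority: int
    created_at: datetime

# Read the declared fields from the annotations instead of guessing
# from variable or class names.
declared = set(get_type_hints(ContactFunnelStage))

print("stage" in declared)  # False: the inferred field does not exist
print("name" in declared)   # True: the real field
```

With this field set in hand, the inferred access `contact.get("stage")` can be rejected before any code is emitted.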
How to Prevent This in PDD-CLI
What PDD should do differently:
- Schema introspection before generation: Before generating field access code, PDD should parse the target model definition
  - For Python: Read Pydantic model fields using `model_fields` or parse class definitions
  - For TypeScript: Parse `interface` or `type` definitions from source files
- Validate field references: Only generate field access for fields that actually exist in the model
- Add validation phase: After generation, run type checkers (mypy, TypeScript compiler) to catch field mismatches
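A sketch of the proposed validation phase (the `validate_field_access` helper is hypothetical, not an existing PDD API; a real pass would run mypy or the TypeScript compiler, or do full AST analysis, rather than a regex):

```python
import re

def validate_field_access(generated_code: str, declared_fields: set) -> list:
    """Return fields referenced via .get("...") that the model does not declare.

    Hypothetical helper sketching the validation phase described above.
    """
    referenced = re.findall(r'\.get\(\s*["\'](\w+)["\']', generated_code)
    return [field for field in referenced if field not in declared_fields]

model_fields = {"name", "priority", "created_at"}
bad = validate_field_access('stage_name = contact.get("stage", "Unknown")',
                            model_fields)
print(bad)  # ['stage']: generation should be corrected or retried
```

Any non-empty result would fail the generation step instead of shipping code that silently returns defaults at runtime.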
Example improvement:
User request: "Calculate stage names from ContactFunnelStage objects"
Current PDD flow:
1. Infer field name from context → generate `contact.get("stage")`
Improved PDD flow:
1. Parse ContactFunnelStage model definition
2. See available fields: ["name", "priority", "created_at"]
3. Generate validated field access → `contact.get("name")`
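Steps 1-2 of the improved flow can be sketched with the stdlib `ast` module (the `declared_fields` helper is hypothetical, for illustration only):

```python
import ast

MODEL_SOURCE = """
class ContactFunnelStage:
    name: str
    priority: int
    created_at: datetime
"""

def declared_fields(source: str, class_name: str) -> list:
    """Parse a class definition and list its annotated field names
    without executing the module (steps 1-2 of the improved flow)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [stmt.target.id
                    for stmt in node.body
                    if isinstance(stmt, ast.AnnAssign)
                    and isinstance(stmt.target, ast.Name)]
    return []

fields = declared_fields(MODEL_SOURCE, "ContactFunnelStage")
print(fields)  # ['name', 'priority', 'created_at']
```

Because `ast.parse` never executes the source, this works even when the model's imports (here `datetime`) are unavailable at generation time.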
Severity
P1 - High Priority
- Frequency: High (affects any code accessing nested object fields)
- Impact: Runtime bugs that silently return incorrect data
- Detectability: Low (code runs without errors but produces wrong results)
- Prevention cost: Low (schema validation is straightforward)
Category
schema-validation
Related Issues
- Generated tests use incorrect `sys.modules` paths for mocking #412 - Constructor arguments not validated (same root cause: missing schema validation)
- Add failing tests for #430: auto-fix fingerprint skip bug #432 - Data models created without extraction logic
- Add failing tests for #393: format injection at step 5.5 #435 - Incomplete metric calculations (similar inference-based generation)
For Contributors: This issue was discovered during generation of a CRM application test case. The specific file was backend/functions/admin_crm_actions.py:207 and was fixed in commit 34a651d5. While the example uses Python/Pydantic, the same issue applies to any language where PDD generates field/property access (TypeScript interfaces, Java classes, etc.).