PDD-CLI Bug: Generates field references without schema validation #564

@jiaminc-cmu

When generating code that accesses object fields/properties, PDD-CLI makes assumptions about field names based on context clues (variable names, class names) rather than validating against the actual data model schema. This results in code that accesses non-existent fields or uses wrong field names.

Why this matters: Generated code passes syntax checks and runs, but it either fails at runtime or silently returns incorrect data, because the field names don't match the actual model definition.

Concrete Example

Imagine you have a data model defined like this:

# Model definition
from datetime import datetime

class ContactFunnelStage:
    name: str           # ← The field is called "name"
    priority: int
    created_at: datetime

When PDD-CLI generates code to access this model, it might produce:

# PDD generated this (WRONG):
stage_name = contact.get("stage", "Unknown")  # ← Accessing "stage" field

# Should have generated (CORRECT):
stage_name = contact.get("name", "Unknown")   # ← Accessing "name" field

What went wrong: The variable was named stage_name and the object was a ContactFunnelStage, so PDD likely inferred the field should be called stage. But the actual model defines the field as name, not stage.

Impact: In this case, the code always returned "Unknown" instead of the actual stage name because the field "stage" doesn't exist.
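A minimal reproduction of the silent failure, assuming the stage record reaches the code as a plain dict (the sample values here are made up for illustration):

```python
# A stage record as a plain dict, mirroring the ContactFunnelStage fields.
# The values are illustrative, not taken from the real application.
contact = {"name": "Qualified", "priority": 2}

# dict.get() never raises on a missing key; it falls back to the default.
wrong = contact.get("stage", "Unknown")  # "stage" is not a key
right = contact.get("name", "Unknown")   # "name" matches the model

print(wrong)  # Unknown
print(right)  # Qualified
```

Because `dict.get` swallows the missing key, nothing in a normal run signals that the wrong field was requested.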

Why PDD Makes This Mistake

PDD-CLI currently uses contextual inference to generate field access:

  • Looks at variable names ("stage_name" → must access field "stage")
  • Looks at class names ("ContactFunnelStage" → fields probably relate to "stage")
  • Makes educated guesses based on common patterns

But it doesn't:

  1. Parse the actual model definition to see what fields exist
  2. Validate that the referenced field name is actually defined
  3. Check Pydantic models, TypeScript interfaces, or class definitions
  4. Cross-reference field access against known schemas
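The missing cross-reference could look roughly like the sketch below. It reads the annotated fields of the example class with the standard library's `typing.get_type_hints`. The helper names `known_fields` and `require_field` are hypothetical and are not part of PDD-CLI.

```python
from datetime import datetime
from typing import get_type_hints


class ContactFunnelStage:
    name: str
    priority: int
    created_at: datetime


def known_fields(model: type) -> set:
    """Return the annotated field names declared on a class (hypothetical helper)."""
    return set(get_type_hints(model))


def require_field(model: type, field: str) -> str:
    """Return the field name unchanged, or raise if the model does not declare it."""
    fields = known_fields(model)
    if field not in fields:
        raise ValueError(
            f"{model.__name__} has no field {field!r}; known fields: {sorted(fields)}"
        )
    return field


print(require_field(ContactFunnelStage, "name"))  # name
# require_field(ContactFunnelStage, "stage") raises ValueError
```

A check of this kind turns the guess into a hard failure at generation time instead of a wrong value at runtime.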

How to Prevent This in PDD-CLI

What PDD should do differently:

  1. Schema introspection before generation: Before generating field access code, PDD should parse the target model definition

    • For Python: Read Pydantic model fields using model_fields (Pydantic v2) or parse class definitions
    • For TypeScript: Parse interface or type definitions from source files

  2. Validate field references: Only generate field access for fields that actually exist in the model

  3. Add a validation phase: After generation, run type checkers (mypy, the TypeScript compiler) to catch field mismatches
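For the "parse class definitions" option in step 1, one possible approach is to read the model's source with Python's `ast` module, so the model file never has to be imported. This is a sketch under that assumption. The function name `fields_from_source` is hypothetical, and it only handles plain annotated class attributes.

```python
import ast

# Source text of the model, as it might be read from a project file.
MODEL_SOURCE = '''
class ContactFunnelStage:
    name: str
    priority: int
    created_at: datetime
'''


def fields_from_source(source: str, class_name: str) -> list:
    """Collect annotated attribute names from a class body without importing it."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [
                stmt.target.id
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign)
                and isinstance(stmt.target, ast.Name)
            ]
    raise LookupError(f"class {class_name!r} not found in source")


print(fields_from_source(MODEL_SOURCE, "ContactFunnelStage"))
# ['name', 'priority', 'created_at']
```

Static parsing avoids executing project code during generation, but it does not see inherited or dynamically added fields. For Pydantic models, reading `model_fields` on the imported class would cover those cases.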

Example improvement:

User request: "Calculate stage names from ContactFunnelStage objects"

Current PDD flow:
1. Infer field name from context → generate `contact.get("stage")`

Improved PDD flow:
1. Parse ContactFunnelStage model definition
2. See available fields: ["name", "priority", "created_at"]
3. Generate validated field access → `contact.get("name")`
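The last step of the improved flow could also be enforced after generation. The sketch below scans generated Python for `.get("<literal>")` calls and reports keys that are not in the known field list. The function name `unknown_get_keys` is hypothetical. As a simplification, it inspects every `.get` call in the snippet, not only calls on model instances.

```python
import ast


def unknown_get_keys(code: str, allowed: set) -> list:
    """Return string-literal keys passed to .get() that are not in `allowed`."""
    unknown = []
    for node in ast.walk(ast.parse(code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
            and node.args[0].value not in allowed
        ):
            unknown.append(node.args[0].value)
    return unknown


FIELDS = {"name", "priority", "created_at"}

bad = unknown_get_keys('stage_name = contact.get("stage", "Unknown")', FIELDS)
good = unknown_get_keys('stage_name = contact.get("name", "Unknown")', FIELDS)

print(bad)   # ['stage']
print(good)  # []
```

A non-empty result could be used to reject the generated code or to trigger regeneration with the real field list in the prompt.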

Severity

P1 - High Priority

  • Frequency: High (affects any code accessing nested object fields)
  • Impact: Runtime bugs that silently return incorrect data
  • Detectability: Low (code runs without errors but produces wrong results)
  • Prevention cost: Low (schema validation is straightforward)

Category

schema-validation

Related Issues


For Contributors: This issue was discovered during generation of a CRM application test case. The specific file was backend/functions/admin_crm_actions.py:207 and was fixed in commit 34a651d5. While the example uses Python/Pydantic, the same issue applies to any language where PDD generates field/property access (TypeScript interfaces, Java classes, etc.).
