Description
PDD-CLI Bug: Generates Data Models Without Extraction/Parsing Logic
PDD-CLI creates data model classes with field definitions, but doesn't implement the logic to extract and populate those fields from actual data sources. Models exist but remain empty because data isn't parsed.
Why this matters: Generated models are unusable. Required fields hold placeholder values and optional fields stay None or at their defaults, because nothing populates them from source data.
Concrete Example
For a contact management system:
# PDD generated model (INCOMPLETE):
# models/contact.py
from typing import List, Optional

from pydantic import BaseModel


class Contact(BaseModel):
    email: str
    name: str
    company: Optional[str] = None
    labels: List[str] = []

But no extraction logic:
# handlers/create_contact.py
def create_contact(issue_body: str) -> Contact:
    # PDD generated this - but how to extract fields?
    contact = Contact(email="???", name="???")  # ← No parsing implemented!
    return contact

What went wrong: PDD defined the model structure but didn't implement the parsing logic that extracts email, name, and company from the GitHub issue body.
Impact: All contacts are created with placeholder data; the actual data in the issues is never extracted.
Why PDD Makes This Mistake
PDD-CLI currently:
- Generates data structures (models) separately from data pipelines (extraction)
- Defines "what" without implementing "how"
- Assumes extraction logic will be added later
But it should:
- Generate complete data pipeline: parse → validate → transform → store
- Implement extraction logic for defined fields
- Handle parsing failures gracefully
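The parse → validate → transform → store chain named above can be sketched as four small functions. This is only an illustration under stated assumptions, not what PDD-CLI emits: the stage names, the in-memory dict used as a store, the "Key: value" issue format, and the dataclass stand-in for the pydantic model (used to keep the sketch stdlib-only) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Contact:
    # Dataclass stand-in for the pydantic model (assumption, keeps this stdlib-only).
    email: str
    name: str
    company: Optional[str] = None
    labels: List[str] = field(default_factory=list)


def parse(issue_body: str) -> Dict[str, str]:
    """Parse: collect 'Key: value' lines into a dict with lowercased keys."""
    fields: Dict[str, str] = {}
    for line in issue_body.splitlines():
        match = re.match(r"\s*(\w+)\s*:\s*(.+)", line)
        if match:
            fields[match.group(1).lower()] = match.group(2).strip()
    return fields


def validate(fields: Dict[str, str]) -> Dict[str, str]:
    """Validate: reject input that lacks required fields or has a bad email."""
    missing = [key for key in ("email", "name") if not fields.get(key)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    if "@" not in fields["email"]:
        raise ValueError(f"Invalid email: {fields['email']!r}")
    return fields


def transform(fields: Dict[str, str]) -> Contact:
    """Transform: normalize values and build the model instance."""
    return Contact(
        email=fields["email"].lower(),
        name=fields["name"],
        company=fields.get("company"),
    )


def store(contact: Contact, db: Dict[str, Contact]) -> None:
    """Store: a dict keyed by email stands in for a real database."""
    db[contact.email] = contact


def run_pipeline(issue_body: str, db: Dict[str, Contact]) -> Contact:
    """Run all four stages in order and return the stored contact."""
    contact = transform(validate(parse(issue_body)))
    store(contact, db)
    return contact
```

Keeping each stage as a separate function means a generator could emit and test the stages independently, and a failure can be traced to the stage that raised it.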
How to Prevent This in PDD-CLI
What PDD should do differently:
- Generate complete data pipeline:
    import re

    def parse_contact_from_issue(issue_body: str) -> Contact:
        """Extract contact fields from GitHub issue body."""
        # Extract email
        email_match = re.search(r'Email:\s*(\S+@\S+)', issue_body)
        email = email_match.group(1) if email_match else None

        # Extract name
        name_match = re.search(r'Name:\s*(.+)', issue_body)
        name = name_match.group(1).strip() if name_match else None

        # Extract company
        company_match = re.search(r'Company:\s*(.+)', issue_body)
        company = company_match.group(1).strip() if company_match else None

        if not email or not name:
            raise ValueError("Missing required fields")

        return Contact(email=email, name=name, company=company)
- Generate validation and error handling: Handle malformed input gracefully.
- Generate tests for extraction: Ensure parsing works correctly.
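As a rough illustration of the last two points, the sketch below pairs a non-raising parser with pytest-style tests. Everything here is an assumption made for the example: `try_parse_contact`, `_field`, the "Label: value" issue format, and the dataclass stand-in for the pydantic `Contact` model are hypothetical names, not existing PDD-CLI output.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Contact:
    # Dataclass stand-in for the pydantic model (assumption, keeps this stdlib-only).
    email: str
    name: str
    company: Optional[str] = None


def _field(label: str, body: str) -> Optional[str]:
    """Return the stripped value after 'Label:' on its own line, or None."""
    match = re.search(rf"^{re.escape(label)}:[ \t]*(.+)$", body, re.MULTILINE)
    if not match:
        return None
    value = match.group(1).strip()
    return value or None


def try_parse_contact(issue_body: str) -> Tuple[Optional[Contact], List[str]]:
    """Return (contact, errors) instead of raising on malformed input."""
    email = _field("Email", issue_body)
    name = _field("Name", issue_body)
    errors: List[str] = []
    if email is None:
        errors.append("missing Email")
    elif not re.fullmatch(r"[^@\s]+@[^@\s]+", email):
        errors.append(f"invalid Email: {email!r}")
    if name is None:
        errors.append("missing Name")
    if errors:
        return None, errors
    return Contact(email=email, name=name, company=_field("Company", issue_body)), []


def test_parses_well_formed_issue():
    body = "Name: Ada Lovelace\nEmail: ada@example.com\nCompany: Analytical Engines"
    contact, errors = try_parse_contact(body)
    assert errors == []
    assert contact == Contact("ada@example.com", "Ada Lovelace", "Analytical Engines")


def test_reports_every_problem_without_raising():
    contact, errors = try_parse_contact("Email: not-an-email\nsome free text")
    assert contact is None
    assert errors == ["invalid Email: 'not-an-email'", "missing Name"]
```

Returning a list of errors rather than raising on the first one lets the caller report every problem in a malformed issue at once, for example as a comment back on the issue.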
Example improvement:
Current: "Create Contact model"
→ Generate Contact class
→ No extraction logic
→ Fields never populated
Improved: "Create Contact model"
→ Generate Contact class
→ Generate parse_contact_from_issue()
→ Generate validation logic
→ Generate tests with sample data
→ Complete, working pipeline
Severity
P1 - High Priority
- Frequency: Medium - affects data-driven features
- Impact: High - features non-functional (models never populated)
- Detectability: High - obvious when data remains empty
- Prevention cost: Medium - requires understanding data format and generating parsing logic
Category
incomplete-implementation
Related Issues
- Auto-fix skips fingerprint save causing incomplete metadata (sync_orchestration.py:1350) #430 - Missing environment configuration (different incompleteness)
- Add failing tests for issue #392: pdd change KeyError at Step 5 #433 - Handlers not wired up (similar "forgot implementation step")
- Add failing tests for #393: format injection at step 5.5 #435 - Incomplete metric calculations (similar partial implementation)
For Contributors: Discovered when the Contact model existed but GitHub issue data was never extracted into it; manual parsing logic was added in commit 34a651d5.