
PDD-CLI Bug: Generates Data Models Without Extraction/Parsing Logic #585

@jiaminc-cmu

Description


PDD-CLI generates data model classes with field definitions, but it does not implement the logic that extracts and populates those fields from actual data sources. The models exist, yet they stay empty because the source data is never parsed.

Why this matters: Generated models are unusable: every field remains `None` or a default value because nothing populates them from the source data.

Concrete Example

For a contact management system:

# PDD generated model (INCOMPLETE):
# models/contact.py
from typing import List, Optional

from pydantic import BaseModel

class Contact(BaseModel):
    email: str
    name: str
    company: Optional[str] = None
    labels: List[str] = []

But no extraction logic:

# handlers/create_contact.py
def create_contact(issue_body: str) -> Contact:
    # PDD generated this - but how to extract fields?
    contact = Contact(email="???", name="???")  # ← No parsing implemented!
    return contact

What went wrong: PDD defined the model structure but didn't implement the parsing logic to extract `email`, `name`, and `company` from the GitHub issue body format.

Impact: All contacts are created with placeholder data; the actual data from issues is never extracted.

Why PDD Makes This Mistake

PDD-CLI currently:

  • Generates data structures (models) separately from data pipelines (extraction)
  • Defines "what" without implementing "how"
  • Assumes extraction logic will be added later

But it should:

  1. Generate complete data pipeline: parse → validate → transform → store
  2. Implement extraction logic for defined fields
  3. Handle parsing failures gracefully
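The parse → validate → transform → store pipeline above can be sketched as plain composable functions. This is a minimal illustration, not PDD's actual output: the function names, the `Email:`/`Name:`/`Company:` line format, and the use of a plain dict (instead of the pydantic model) are assumptions made to keep the sketch self-contained.

```python
import re
from typing import Optional

def parse(issue_body: str) -> dict:
    """Parse stage: pull raw field strings out of the issue body.
    Assumes 'Key: value' lines, which is illustrative only."""
    fields = {}
    for key in ("Email", "Name", "Company"):
        m = re.search(rf"{key}:\s*(.+)", issue_body)
        fields[key.lower()] = m.group(1).strip() if m else None
    return fields

def validate(fields: dict) -> dict:
    """Validate stage: fail fast instead of storing placeholder data."""
    if not fields.get("email") or not fields.get("name"):
        raise ValueError("Missing required fields: email and name")
    return fields

def transform(fields: dict) -> dict:
    """Transform stage: normalize values before storage."""
    fields["email"] = fields["email"].lower()
    return fields

def store(fields: dict, db: list) -> dict:
    """Store stage: append to a stand-in datastore (a list here)."""
    db.append(fields)
    return fields
```

Usage would then be a single composition, e.g. `store(transform(validate(parse(body))), db)`, so a model can never reach storage without having been parsed and validated first.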

How to Prevent This in PDD-CLI

What PDD should do differently:

  1. Generate complete data pipeline:

    import re

    def parse_contact_from_issue(issue_body: str) -> Contact:
        """Extract contact fields from a GitHub issue body."""
        
        # Extract email
        email_match = re.search(r'Email:\s*(\S+@\S+)', issue_body)
        email = email_match.group(1) if email_match else None
        
        # Extract name
        name_match = re.search(r'Name:\s*(.+)', issue_body)
        name = name_match.group(1).strip() if name_match else None
        
        # Extract company
        company_match = re.search(r'Company:\s*(.+)', issue_body)
        company = company_match.group(1).strip() if company_match else None
        
        if not email or not name:
            raise ValueError("Missing required fields")
        
        return Contact(email=email, name=name, company=company)
  2. Generate validation and error handling: Handle malformed input gracefully.

  3. Generate tests for extraction: Ensure parsing works correctly.
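Point 3 could look like the tests below. This is a hedged sketch: the parser is a condensed, self-contained copy of the one sketched above (returning a dict so the example runs without pydantic), and the sample issue text is invented for illustration.

```python
import re

def parse_contact_from_issue(issue_body: str) -> dict:
    """Condensed copy of the parser sketched above, kept inline so
    the tests are self-contained."""
    email_m = re.search(r"Email:\s*(\S+@\S+)", issue_body)
    name_m = re.search(r"Name:\s*(.+)", issue_body)
    if not email_m or not name_m:
        raise ValueError("Missing required fields")
    return {"email": email_m.group(1), "name": name_m.group(1).strip()}

def test_parses_well_formed_issue():
    # A well-formed issue body yields fully populated fields.
    body = "Name: Grace Hopper\nEmail: grace@example.com"
    contact = parse_contact_from_issue(body)
    assert contact["email"] == "grace@example.com"
    assert contact["name"] == "Grace Hopper"

def test_rejects_malformed_issue():
    # Malformed input raises instead of silently producing placeholders.
    try:
        parse_contact_from_issue("no structured fields here")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for malformed input")
```

Tests like these catch exactly the failure mode described in this issue: a model class that type-checks but is never populated from real data.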

Example improvement:

Current: "Create Contact model"
       → Generate Contact class
       → No extraction logic
       → Fields never populated

Improved: "Create Contact model"
        → Generate Contact class
        → Generate parse_contact_from_issue()
        → Generate validation logic
        → Generate tests with sample data
        → Complete, working pipeline

Severity

P1 - High Priority

  • Frequency: Medium - affects data-driven features
  • Impact: High - features non-functional (models never populated)
  • Detectability: High - obvious when data remains empty
  • Prevention cost: Medium - requires understanding data format and generating parsing logic

Category

incomplete-implementation

Related Issues


For Contributors: Discovered when the Contact model existed but GitHub issue data was never extracted into it; manual parsing logic was added in commit 34a651d5.
