Description
PDD-CLI Bug: Generates Test Data Without Proper CSV Escaping
PDD-CLI generates CSV test data with unquoted fields containing commas, breaking CSV parsing. When test data includes comma-separated values (like tags or labels), PDD doesn't quote the fields.
Why this matters: Test data fails to parse, causing CSV parsing errors and test failures.
Concrete Example
For a test that creates GitHub issues with labels:
# PDD generated test data (WRONG):
# test_data.csv
email,name,labels
user1@example.com,John Doe,attendee,vip
user2@example.com,Jane Smith,speaker,sponsor
CSV parser reads this as:
# Row 1 has 4 fields instead of 3!
['user1@example.com', 'John Doe', 'attendee', 'vip'] # ← Extra field!
Correct format:
# Should generate (CORRECT):
# test_data.csv
email,name,labels
user1@example.com,John Doe,"attendee,vip"
user2@example.com,Jane Smith,"speaker,sponsor"
What went wrong: PDD generated labels as attendee,vip without quotes. The CSV parser treats the comma as a field delimiter, splitting the row into 4 fields instead of 3.
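Impact: csv.DictReader does not raise on the extra field. By default it stores surplus values in a list under the key None (its restkey), so labels silently becomes attendee and vip ends up in the wrong place. Parsers that enforce a fixed column count reject the row instead.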
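To make the impact concrete, here is a minimal sketch of how Python's csv.DictReader handles the broken row. It assumes the default restkey (None) and reads the data from an in-memory string rather than the real test_data.csv.

```python
import csv
import io

# The unquoted data from the example above, held in memory for illustration.
broken = (
    "email,name,labels\n"
    "user1@example.com,John Doe,attendee,vip\n"
)

rows = list(csv.DictReader(io.StringIO(broken)))
first = rows[0]

# No exception is raised: the fourth value is collected under the key None.
print(first["labels"])  # attendee
print(first[None])      # ['vip']
```

A test that only checks first["labels"] would see a truncated value rather than a parse error, so the failure may surface later than expected.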
Why PDD Makes This Mistake
PDD-CLI currently:
- Generates CSV as plain text
- Doesn't quote fields containing special characters
- Doesn't use proper CSV writing libraries
But it should:
- Use csv.DictWriter or equivalent to handle escaping
- Always quote fields containing commas, quotes, or newlines
- Follow RFC 4180 CSV spec
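The quoting rules in this list can be sketched with Python's csv.writer, whose default dialect quotes only the fields that need it. The rows containing a double quote and a line break are hypothetical values added for illustration; they are not from the original test data.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect: minimal quoting, "\r\n" row endings

writer.writerow(["email", "name", "labels"])
writer.writerow(["user1@example.com", "John Doe", "attendee,vip"])     # comma
writer.writerow(["user2@example.com", 'Jane "JS" Smith', "speaker"])   # quotes (hypothetical)
writer.writerow(["user3@example.com", "Line\nBreak", "sponsor"])       # newline (hypothetical)

output = buf.getvalue()
print(output)
```

Fields with a comma or newline are wrapped in double quotes, and an embedded double quote is doubled, which matches the escaping described in RFC 4180.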
How to Prevent This in PDD-CLI
What PDD should do differently:
- Use CSV libraries for generation:
    import csv

    with open('test_data.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['email', 'name', 'labels'])
        writer.writeheader()
        writer.writerow({
            'email': 'user1@example.com',
            'name': 'John Doe',
            'labels': 'attendee,vip'  # Library handles quoting
        })
- Manual generation - always quote fields with commas:
    email,name,labels
    user1@example.com,John Doe,"attendee,vip"
- Validate generated CSV: Parse it back to ensure it works.
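The validation step could look something like the sketch below. The helper name validate_csv and its return shape are hypothetical, not part of PDD-CLI; the idea is simply to parse the generated text back and check that every row has as many fields as the header.

```python
import csv
import io

FIELDNAMES = ["email", "name", "labels"]


def validate_csv(text, fieldnames):
    """Parse text back and return a list of problems (empty if none found)."""
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header != list(fieldnames):
        problems.append(f"unexpected header: {header!r}")
    for row in reader:
        if len(row) != len(fieldnames):
            problems.append(
                f"line {reader.line_num}: expected {len(fieldnames)} fields, "
                f"got {len(row)}"
            )
    return problems


good = 'email,name,labels\nuser1@example.com,John Doe,"attendee,vip"\n'
bad = "email,name,labels\nuser1@example.com,John Doe,attendee,vip\n"

print(validate_csv(good, FIELDNAMES))  # []
print(validate_csv(bad, FIELDNAMES))   # one problem reported for line 2
```

Counting fields with csv.reader, rather than relying on csv.DictReader, catches the extra column explicitly instead of letting it slip into the restkey entry.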
Example improvement:
Current: Generate CSV as string concatenation
→ labels = "attendee,vip" (no quotes)
→ CSV broken (4 fields instead of 3)
Improved: Generate CSV using csv.DictWriter
→ Automatic quoting for fields with commas
→ Valid CSV produced
Severity
P2 - Medium Priority
- Frequency: Low - only affects CSV test data generation
- Impact: Test data parsing failures
- Detectability: High - immediate CSV parsing errors
- Prevention cost: Low - use CSV libraries
Category
test-environment
Related Issues
- Add failing tests for #419: Unpushed commits in early exit #422 - Module-level imports (different test environment issue)
- Architecture generation: Missing public/ directory for Next.js frontend causes Docker build failure #423 - Async data loading waits (different test issue)
For Contributors: Discovered in backend/tests/test_crm_github.py where GitHub issue labels CSV was malformed, fixed in commit 34a651d5.