Validation Pipeline Design Flaw: Only Tests Cached Dates

## Problem

`validate_pattern_taxonomy.py` has a design flaw where `get_test_date_range()` only returns dates that already exist in the cache, never attempting to fetch missing dates. This causes silent incomplete testing.

**Impact**: Q2 2024 validation tested only 17 days (Jun 3-28) instead of full quarter (60 days Apr-Jun), causing incomplete and misleading results.

## Root Cause

**File**: `scripts/validation/validate_pattern_taxonomy.py` lines 89-106

```python
def get_test_date_range(self, start_date: str, end_date: str) -> List[str]:
    """Get all trading days in range from cache."""
    available_dates = []
    for file_path in sorted(cache_base.glob("*.pickle")):
        date_str = file_path.stem
        if start_date <= date_str <= end_date:
            available_dates.append(date_str)
    
    logger.info(f"Found {len(available_dates)} dates in cache")
    return available_dates  # ❌ Only returns cached dates!
```

**Expected Behavior**: When user specifies `--start-date 2024-04-01 --end-date 2024-06-28`, they expect ~60 trading days to be tested.

**Actual Behavior**: Function scans cache, finds only 17 files (Jun 3-28), returns those without warning, and proceeds with incomplete validation.

## Example: Q2 2024 Validation

```bash
# User runs:
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning --symbol SPY \
  --start-date 2024-04-01 --end-date 2024-06-28

# Expected: 60 trading days
# Actual: 17 days (only Jun 3-28)
# Missing: Apr-May 2024 (43 days)
# Warning: NONE
```

**Result**: Incomplete validation report claiming to test "Q2 2024" but actually only testing late June.

## Architectural Decision Needed

**Option A: Auto-Fetch Missing Dates**
- Pros: User-friendly, ensures complete testing
- Cons: Unexpected API calls, slower, may hit rate limits

**Option B: Fail Fast** (RECOMMENDED)
- Pros: Clear error message, separates concerns, predictable
- Cons: Extra step for users
- Implementation: Raise error with list of missing dates + instructions

**Option C: Warn and Continue**
- Pros: Still works with partial data, fast
- Cons: Easy to ignore warnings, misleading results

## Recommended Solution

**Option B: Fail Fast with Clear Error**

```python
def get_test_date_range(self, start_date: str, end_date: str) -> List[str]:
    """Get all trading days, error if any missing."""
    expected = self._get_trading_days(start_date, end_date)
    available = self._scan_cache(start_date, end_date)
    missing = set(expected) - set(available)
    
    if missing:
        raise ValueError(
            f"❌ Missing {len(missing)}/{len(expected)} dates from cache\n"
            f"   First 10 missing: {sorted(missing)[:10]}\n\n"
            f"📥 Collect missing data first:\n"
            f"   python scripts/data_collection/start_historical_collection.py \\\n"
            f"     --symbol {symbol} --start-date {start_date} --end-date {end_date}\n"
        )
    
    return available
```

## Related Issues

- **Issue #79**: Pattern Taxonomy Validation (affected by this bug)
- **Q3 Corruption**: Validation used wrong prices because database missing Q3 dates
- **Q2 Incomplete**: Only tested 17/60 days due to this design flaw

## Documentation

Full postmortem: `docs/guides/validation-data-pipeline-fix.md`

## Priority

**HIGH** - Affects all future multi-quarter validations and can produce misleading results without user awareness.

## Acceptance Criteria

- [ ] Decide on Option A, B, or C
- [ ] Implement chosen solution in `validate_pattern_taxonomy.py`
- [ ] Add test to verify behavior with missing dates
- [ ] Document workflow in validation guide
- [ ] Update CLAUDE.md with new validation requirements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Validation Pipeline Design Flaw: Only Tests Cached Dates #84

Problem

Root Cause

Example: Q2 2024 Validation

Architectural Decision Needed

Recommended Solution

Related Issues

Documentation

Priority

Acceptance Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Validation Pipeline Design Flaw: Only Tests Cached Dates #84

Description

Problem

Root Cause

Example: Q2 2024 Validation

Architectural Decision Needed

Recommended Solution

Related Issues

Documentation

Priority

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions