fix: Ticker symbol normalization and data availability detection #471

@iAmGiG

Problem

Ticker $OPEN shows "Not enough market data available" error in CLI, despite:

  • ✅ Data stored in users.db with voter fields populated
  • ✅ Data stored in trading_data.db with actual bars
  • ❌ Analysis retrieval fails to find/use this data

This suggests a data retrieval mismatch rather than missing data.

Root Cause Analysis

The issue is NOT about special characters - it's about:

  1. Storage vs Retrieval Key Mismatch: Data stored with one key, retrieved with another
  2. Overly Strict Data Requirements: Minimum bar count threshold too high
  3. Poor Error Detection: "Not enough data" is ambiguous (no data vs. insufficient bars)

Better Solution: Systematic Approach

1. Use Alpaca's Ticker Format as Source of Truth

Don't normalize - use exactly what Alpaca returns:

# WRONG: Normalize/strip characters
ticker = ticker.strip('$').upper()  # Creates inconsistency

# RIGHT: Use Alpaca's format
ticker = alpaca_asset.symbol  # Whatever Alpaca calls it

2. Improve Data Availability Detection

def check_data_availability(ticker: str, required_bars: int) -> DataAvailability:
    """Check if ticker has usable data."""
    cached_data = db.get_cached_bars(ticker)
    bar_count = len(cached_data)

    return DataAvailability(
        exists=bar_count > 0,
        bar_count=bar_count,
        sufficient=bar_count >= required_bars,  # threshold passed in by the caller
        last_update=cached_data[-1].timestamp if cached_data else None,
        reason=_get_reason(cached_data)  # Specific failure reason
    )
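
To make the shape of the result concrete, here is a minimal, self-contained sketch of what `DataAvailability` and `_get_reason` might look like. The `Bar` class, the `summarize` helper, the reason strings, and the two-argument `_get_reason` signature are all assumptions for illustration; the real types in the codebase may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Bar:
    """Hypothetical stand-in for one cached bar."""
    timestamp: datetime
    close: float


@dataclass(frozen=True)
class DataAvailability:
    exists: bool
    bar_count: int
    sufficient: bool
    last_update: Optional[datetime]
    reason: Optional[str]  # None when the data is usable


def _get_reason(bars: Sequence[Bar], required_bars: int) -> Optional[str]:
    """Return a specific failure reason, or None if there are enough bars."""
    if not bars:
        return "no_data"  # assumed label: nothing stored under this key
    if len(bars) < required_bars:
        return "insufficient_bars"  # assumed label: data exists but is too short
    return None


def summarize(bars: Sequence[Bar], required_bars: int) -> DataAvailability:
    """Build the availability record from bars that were already fetched."""
    return DataAvailability(
        exists=bool(bars),
        bar_count=len(bars),
        sufficient=len(bars) >= required_bars,
        last_update=bars[-1].timestamp if bars else None,
        reason=_get_reason(bars, required_bars),
    )
```

Separating "no data" from "insufficient bars" at this level is what lets the CLI report the two cases differently.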

3. Better Error Messages

Instead of generic "Not enough data":

  • "No data found for OPEN (check ticker symbol)"
  • "Only 15 bars available, need 50 for MACD analysis"
  • "Data last updated 2 days ago, may be stale"
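
One possible way to produce these messages is sketched below. The function name `describe_availability`, its parameters, and the one-day staleness cutoff are assumptions, not existing code.

```python
from datetime import datetime, timedelta
from typing import Optional


def describe_availability(
    ticker: str,
    bar_count: int,
    required_bars: int,
    last_update: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # assumed staleness cutoff
    analysis: str = "MACD",
) -> Optional[str]:
    """Return a specific user-facing message, or None if the data looks usable."""
    if bar_count == 0:
        return f"No data found for {ticker} (check ticker symbol)"
    if bar_count < required_bars:
        return (
            f"Only {bar_count} bars available, "
            f"need {required_bars} for {analysis} analysis"
        )
    if last_update is not None and now - last_update > max_age:
        days = (now - last_update).days
        unit = "day" if days == 1 else "days"
        return f"Data last updated {days} {unit} ago, may be stale"
    return None
```

Passing `now` in explicitly keeps the function deterministic, which makes the staleness branch easy to test.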

4. Adaptive Requirements

# Don't hardcode a single minimum - adapt to what's available.
# Check the stricter threshold first so both branches are reachable.
if bars < 20:
    return "Very limited data. Ticker may be illiquid or newly listed."
elif bars < 50:
    return "Insufficient data for full analysis. Try shorter timeframe."
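
A slightly more general take on adaptive requirements is to declare a minimum per analysis and run whatever the available bars support. The sketch below assumes a hypothetical `MIN_BARS` table; the 50-bar figure for MACD comes from the example message in this issue, and the other entries are placeholders.

```python
from typing import Dict, List

# Minimum bars per analysis. Only the MACD value is taken from this issue;
# the remaining names and numbers are illustrative assumptions.
MIN_BARS: Dict[str, int] = {"macd": 50, "rsi": 20, "sma_10": 10}


def runnable_analyses(bar_count: int, min_bars: Dict[str, int] = MIN_BARS) -> List[str]:
    """Return the analyses whose minimum bar requirement is met, sorted by name."""
    return sorted(name for name, need in min_bars.items() if bar_count >= need)
```

With 15 bars this returns only `sma_10`, so the CLI could run a reduced analysis and say which indicators were skipped instead of failing outright.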

Investigation Steps

  1. Trace data flow for $OPEN:

    # Check what's stored
    sqlite3 users.db "SELECT * FROM tickers WHERE symbol LIKE '%OPEN%'"
    sqlite3 trading_data.db "SELECT COUNT(*), MIN(timestamp), MAX(timestamp) FROM bars WHERE symbol='OPEN'"
    
    # Check what analysis tries to fetch
    # Add debug logging to see retrieval key
  2. Compare keys:

    • Storage key in DB
    • Retrieval key in analysis
    • Identify mismatch
  3. Fix the mismatch (not the symptoms)
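
The key comparison in step 2 can also be scripted. This is a rough diagnostic sketch using Python's standard `sqlite3` module; it assumes the `bars` table and `symbol` column from the queries above, and the helper names are hypothetical. The `$`-stripping here is only for finding near-miss keys while debugging, not a proposal to normalize symbols in production lookups.

```python
import sqlite3
from typing import List


def find_key_candidates(conn: sqlite3.Connection, retrieval_key: str) -> List[str]:
    """List stored symbols equal to the retrieval key once '$' and case are ignored."""
    wanted = retrieval_key.lstrip("$").upper()
    rows = conn.execute("SELECT DISTINCT symbol FROM bars").fetchall()
    return sorted(s for (s,) in rows if s.lstrip("$").upper() == wanted)


def diagnose(conn: sqlite3.Connection, retrieval_key: str) -> str:
    """Report whether the retrieval key matches a stored key exactly."""
    candidates = find_key_candidates(conn, retrieval_key)
    if retrieval_key in candidates:
        return f"exact match for {retrieval_key!r}"
    if candidates:
        return f"key mismatch: retrieving {retrieval_key!r}, stored as {candidates}"
    return f"no rows stored for {retrieval_key!r}"
```

Run against `trading_data.db` with the key that the analysis code logs, a "key mismatch" result would confirm the storage vs retrieval hypothesis from the root cause analysis.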

Files to Investigate

  • src/data_sources/sources/market/*.py - How ticker is stored
  • src/trading/instruments/indicators.py - How ticker is retrieved
  • src/data_sources/cache/*.py - Cache key generation

Success Criteria

  • $OPEN analysis works with existing cached data
  • Error messages specify exact problem (no data vs insufficient bars)
  • No special-case handling for specific characters
  • System uses Alpaca's ticker format consistently

Labels: bug (Something isn't working), data (Data integration and management)
