
Getting Started

WormsCanned edited this page Oct 26, 2025 · 1 revision


This guide will help you set up and run the GEX LLM Patterns validation framework.


Prerequisites

System Requirements

  • Python: 3.9 or higher
  • OS: Linux, macOS, or Windows (WSL recommended for Windows)
  • Memory: 4GB RAM minimum
  • Storage: 2GB for code + cache

Required Accounts

  1. OpenAI API Key: For LLM calls (gpt-4o-mini by default, GPT-4 optional)

  2. Options Data Source (Optional):

    • Currently uses yfinance (free, limited historical data)
    • For production: Consider HistoricalOptionData.com, OptionMetrics, etc.

Installation

Step 1: Clone Repository

git clone https://github.com/iAmGiG/gex-llm-patterns.git
cd gex-llm-patterns

Step 2: Install Dependencies

# Using pip
pip install -r requirements.txt

# Or using conda
conda create -n gex-llm python=3.9
conda activate gex-llm
pip install -r requirements.txt

Key Dependencies:

  • openai - LLM API client
  • pandas - Data manipulation
  • numpy - Numerical computing
  • yfinance - Options data fetching (free tier)
  • pyyaml - Writing the YAML validation reports

Step 3: Set Up Environment Variables

# Set Python path (required for imports)
export PYTHONPATH=$(pwd):$PYTHONPATH

# Set OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# Optional: Configure LLM model
export LLM_MODEL="gpt-4o-mini"  # Default: gpt-4o-mini (cheap)
# export LLM_MODEL="gpt-4"      # More accurate but expensive

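Tip: Add these exports to your ~/.bashrc or ~/.zshrc for persistence. In a shell profile, replace $(pwd) with the repository's absolute path, because $(pwd) is evaluated in whatever directory the shell starts in.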

Step 4: Verify Installation

# Check imports work
python -c "from src.agents.market_mechanics_agent import MarketMechanicsAgent; print('✅ Imports OK')"

# Check API key configured
python -c "import os; print('✅ API key set' if os.getenv('OPENAI_API_KEY') else '❌ No API key')"
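The two one-liners above can be combined into a single pre-flight check. The sketch below is illustrative and not part of the repository: the function name `preflight` and its messages are hypothetical, and it assumes the repository root is your current working directory.

```python
import os
import sys

# Illustrative pre-flight check mirroring Steps 3-4 (not part of the repository).


def preflight(env=None, cwd=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd
    problems = []

    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ required")

    # PYTHONPATH must contain the repository root so `import src...` works
    # from scripts that live in subdirectories such as scripts/validation/.
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if os.path.abspath(cwd) not in {os.path.abspath(p) for p in entries}:
        problems.append("PYTHONPATH does not include the repository root")

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("❌", issue)
    if not issues:
        print("✅ Environment looks OK")
```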

Quick Start: Run a Single Pattern Validation

Option 1: Validate Gamma Positioning (Q1 2024)

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --confidence 60.0

Expected Output:

  • Progress bar: Processing dates: 100%|████████████| 53/53
  • Validation report: reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml
  • Summary: Detection rate, predictive accuracy, net alpha

Time: ~5-10 minutes for 53 days (with GPT-4o-mini)
Cost: ~$1-2 in API calls

Option 2: Validate All Patterns (Batch)

python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --skip-completed

Expected Output:

  • 3 YAML reports (one per pattern)
  • Summary table comparing detection rates

Time: ~15-30 minutes
Cost: ~$3-6 in API calls


Understanding the Output

Validation Report Structure (YAML)

# reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml

pattern_name: gamma_positioning
symbol: SPY
date_range: 2024-01-02 to 2024-03-29
total_days: 53

# Aggregate Results
detection_rate_pct: 100.0        # LLM detected constraint on 100% of days
predictive_accuracy_pct: 96.2    # 96.2% of predictions materialized
avg_return_pct: 0.26             # Average daily return
net_alpha_pct: 0.21              # Return above risk-free rate
sample_size: 53                  # Number of test days

# Per-Day Results
results:
  - test_date: 2024-01-02
    obfuscated_date: "Day T+0"
    detected: true                # LLM detected constraint
    confidence: 85.0              # LLM confidence (0-100)
    predicted_direction: "UP"     # LLM prediction
    forward_return_t1: 0.45       # Actual T+1 return (%)
    prediction_correct: true      # Did prediction materialize?
    net_gex_usd: -8950000000.0    # -$8.95B (negative gamma)
    spot_price: 474.60

  - test_date: 2024-01-03
    # ... (one entry per test day, 53 in total)
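To sanity-check a report, the headline rates can be recomputed from the per-day `results` list. This is a minimal sketch, assuming predictive accuracy is measured over detected days only (the report does not state the denominator) and that the YAML has already been parsed into Python dicts, e.g. with pyyaml's `yaml.safe_load`. `summarize` is a hypothetical helper, not part of the framework.

```python
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Recompute detection rate and predictive accuracy from per-day entries.

    Assumption: accuracy = correct predictions / detected days.
    """
    if not results:
        raise ValueError("report has no per-day results")
    detected = [r for r in results if r.get("detected")]
    correct = [r for r in detected if r.get("prediction_correct")]
    accuracy = 100.0 * len(correct) / len(detected) if detected else None
    return {
        "sample_size": len(results),
        "detection_rate_pct": round(100.0 * len(detected) / len(results), 1),
        "predictive_accuracy_pct": round(accuracy, 1) if accuracy is not None else None,
    }

# Usage (requires pyyaml):
#   import yaml
#   path = "reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml"
#   with open(path) as fh:
#       print(summarize(yaml.safe_load(fh)["results"]))
```

Under that assumption, 53 detected days with 96.2% accuracy corresponds to 51 correct predictions, since 51/53 rounds to 96.2%.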

Interpreting Results

Detection Rate:

  • 100%: LLM detected constraint on every test day
  • 60-80%: Strong detection (pattern is mechanical)
  • <60%: Weak detection (pattern may be narrative)

Predictive Accuracy:

  • 96%: LLM predictions materialized 96% of the time
  • High accuracy suggests the LLM captures the causal mechanism
  • Low accuracy means the pattern is detected but does not drive price

Net Alpha:

  • +0.21%: Strategy outperformed the risk-free rate by 21 bps/day
  • Note: Q1 2024 was profitable, but net alpha declined to near zero in Q3/Q4
  • Detection remains stable despite alpha decline (key finding!)

Common Use Cases

1. Reproduce Paper #1 Results

Run the 2024 validation windows used in Paper #1 (Q1, Q3, Q4) for all 3 patterns:

# Q1 2024 (Jan-Mar)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Q3 2024 (Jul-Sep)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-07-01 \
  --end-date 2024-09-30

# Q4 2024 (Oct-Dec)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-10-01 \
  --end-date 2024-12-31

Total: 9 validation reports (3 patterns × 3 quarters) matching Paper #1 results
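If you prefer a single entry point, the three runs can be scripted. This sketch only shells out to the documented `validate_all_patterns.py` command with the flags shown above; `build_commands` and `run_all` are hypothetical helpers, and it assumes you run it from the repository root with PYTHONPATH and OPENAI_API_KEY already set.

```python
import subprocess
import sys

PATTERNS = ["gamma_positioning", "stock_pinning", "0dte_hedging"]

# Date windows copied from the three commands above.
QUARTERS = {
    "2024Q1": ("2024-01-02", "2024-03-29"),
    "2024Q3": ("2024-07-01", "2024-09-30"),
    "2024Q4": ("2024-10-01", "2024-12-31"),
}


def build_commands():
    """Build one validate_all_patterns.py invocation per quarter."""
    return [
        [sys.executable, "scripts/validation/validate_all_patterns.py",
         "--patterns", *PATTERNS,
         "--start-date", start, "--end-date", end]
        for start, end in QUARTERS.values()
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing quarter
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Start with `dry_run=True` to review the commands, then switch it off to launch the runs sequentially.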

2. Test New Pattern

Define a new pattern in src/validation/pattern_taxonomy.py:

PATTERNS = {
    # ... existing patterns ...

    "my_new_pattern": {
        "name": "My New Pattern",
        "status": "MECHANICAL",
        "description": "Clear description of constraint",
        "who": "Market participants",
        "whom": "Who is forced?",
        "what": "What are they forced to do?",
        "constraint_mechanism": "Why can't they avoid it?",
        "academic_basis": "Published research citation"
    }
}
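Before running a long validation, it can help to confirm that the new entry fills in every field of the template above. The field list below is copied from that template; whether the framework actually requires all of them is an assumption, and `missing_fields` is a hypothetical helper rather than part of `pattern_taxonomy.py`.

```python
from typing import Dict, List

# Field names taken from the template entry above (assumed to all be required).
REQUIRED_FIELDS = (
    "name", "status", "description", "who", "whom",
    "what", "constraint_mechanism", "academic_basis",
)


def missing_fields(spec: Dict[str, str]) -> List[str]:
    """Return template fields that are absent or blank in a pattern entry."""
    return [f for f in REQUIRED_FIELDS if not str(spec.get(f, "")).strip()]

# Usage (hypothetical import path, based on the file named above):
#   from src.validation.pattern_taxonomy import PATTERNS
#   print(missing_fields(PATTERNS["my_new_pattern"]))
```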

Run validation:

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern my_new_pattern \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

3. Test on Different Asset

# Validate gamma positioning on QQQ instead of SPY
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol QQQ \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

Note: Requires options data for that ticker (may need premium data source)

4. Compare Biased vs Unbiased Prompts

# Unbiased (default)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Biased (assumes pattern exists)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --biased

Compare detection rates: the biased run should approach 100%, while the unbiased run gives the more realistic estimate.
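The comparison boils down to the difference between two detection rates. A minimal sketch, assuming you have the per-day `results` lists from a biased and an unbiased report covering the same window (the biased run's report filename is not documented here); `detection_rate` and `prompt_bias_gap` are hypothetical helpers.

```python
from typing import Any, Dict, List


def detection_rate(results: List[Dict[str, Any]]) -> float:
    """Percentage of test days on which the LLM reported a detection."""
    if not results:
        raise ValueError("no per-day results")
    return 100.0 * sum(1 for r in results if r.get("detected")) / len(results)


def prompt_bias_gap(biased: List[Dict[str, Any]], unbiased: List[Dict[str, Any]]) -> float:
    """Percentage-point difference between biased and unbiased detection rates."""
    return detection_rate(biased) - detection_rate(unbiased)
```

A large gap indicates that much of the biased detection comes from the prompt's built-in assumption rather than from the data.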


Configuration

LLM Model Selection

Available Models:

  • gpt-4o-mini: Fast, cheap (~$0.03/day), good accuracy
  • gpt-4: Slower, expensive (~$0.15/day), highest accuracy
  • gpt-4-turbo: Balanced performance

How to Switch:

# Via environment variable
export LLM_MODEL="gpt-4"

# Or edit config/config.json
{
  "llm": {
    "model": "gpt-4",
    "temperature": 0.0,
    "max_tokens": 2000
  }
}
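The page does not say which setting wins when both are present. The sketch below assumes the environment variable overrides `config/config.json` and that `gpt-4o-mini` is the fallback (the default named in Step 3); `resolve_model` is a hypothetical helper, not the framework's own config loader.

```python
import json
import os

DEFAULT_MODEL = "gpt-4o-mini"


def resolve_model(env=None, config_path="config/config.json"):
    """Pick the LLM model: LLM_MODEL env var, then config file, then default.

    The precedence order is an assumption for illustration.
    """
    env = os.environ if env is None else env
    if env.get("LLM_MODEL"):
        return env["LLM_MODEL"]
    try:
        with open(config_path, encoding="utf-8") as fh:
            config = json.load(fh)
    except (OSError, ValueError):  # missing file or invalid JSON
        config = {}
    llm = config.get("llm", {}) if isinstance(config, dict) else {}
    model = llm.get("model") if isinstance(llm, dict) else None
    return model or DEFAULT_MODEL
```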

Obfuscation Settings

Default: Obfuscation enabled (recommended for research)

Disable (for debugging only):

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --no-obfuscate

Warning: Disabling obfuscation may allow the LLM to use temporal context (e.g., recalling what happened on a recognizable date), which invalidates the methodology.
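Obfuscation replaces calendar dates with relative labels so the model cannot anchor its answer on what it may remember about a specific day. The report's `obfuscated_date: "Day T+0"` field shows the test day's label; the sketch below assumes earlier sessions are labelled `Day T-1`, `Day T-2`, and so on, which is an illustration rather than the framework's exact scheme. `obfuscate_dates` is hypothetical.

```python
from datetime import date
from typing import Dict, Iterable


def obfuscate_dates(history: Iterable[date], test_date: date) -> Dict[date, str]:
    """Map real trading dates to relative labels anchored at the test date.

    Dates after test_date are dropped so no future information leaks in.
    """
    ordered = sorted(d for d in history if d <= test_date)
    if not ordered or ordered[-1] != test_date:
        raise ValueError("test_date must be one of the history dates")
    last = len(ordered) - 1
    labels = {}
    for i, d in enumerate(ordered):
        offset = i - last  # 0 for the test date, negative for earlier sessions
        labels[d] = "Day T+0" if offset == 0 else f"Day T{offset}"
    return labels
```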

Cache Settings

Default: Options data cached in .cache/

Clear cache (force fresh data fetch):

rm -rf .cache/options_data_cache.db

Rebuild historical GEX database:

python scripts/data/rebuild_historical_gex.py \
  --symbol SPY \
  --start-date 2024-01-01 \
  --end-date 2024-12-31

Troubleshooting

Error: "No module named 'src'"

Cause: PYTHONPATH not set

Fix:

export PYTHONPATH=$(pwd):$PYTHONPATH

Error: "OpenAI API key not found"

Cause: API key not in environment

Fix:

export OPENAI_API_KEY="sk-your-key-here"

Error: "No options data found for date X"

Cause: yfinance doesn't have data for that date (weekends, holidays, or too old)

Fix:

  • Use business days only (skip weekends)
  • Check if date is a market holiday
  • Consider premium data source for complete history
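One way to avoid requesting non-trading days is to generate the date list up front. A minimal sketch using only the standard library; the holiday set covers NYSE full-day closures in Q1 2024 only and must be extended for other windows (a maintained exchange calendar is more robust). `trading_days` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Iterable, List

# NYSE full-day closures in Q1 2024; extend this set for other date ranges.
NYSE_HOLIDAYS_Q1_2024 = {
    date(2024, 1, 1),   # New Year's Day
    date(2024, 1, 15),  # Martin Luther King Jr. Day
    date(2024, 2, 19),  # Washington's Birthday
    date(2024, 3, 29),  # Good Friday
}


def trading_days(start: date, end: date, holidays: Iterable[date] = ()) -> List[date]:
    """Return weekdays between start and end (inclusive), minus holidays."""
    closed = set(holidays)
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in closed:  # Mon=0 ... Fri=4
            days.append(d)
        d += timedelta(days=1)
    return days


q1 = trading_days(date(2024, 1, 2), date(2024, 3, 29), NYSE_HOLIDAYS_Q1_2024)
```

Note that 2024-03-29, the end date used in the examples above, was Good Friday (markets closed), so this list ends on 2024-03-28.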

Slow Performance

Symptoms: Validation takes >1 hour for 50 days

Causes & Fixes:

  1. Using GPT-4 → Switch to gpt-4o-mini (10x faster)
  2. Fresh data fetches → Enable caching (default)
  3. Serial processing → Use batch mode (experimental)

# Faster: Use gpt-4o-mini + ensure caching
export LLM_MODEL="gpt-4o-mini"
python scripts/validation/validate_pattern_taxonomy.py --pattern gamma_positioning ...
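As a generic illustration of moving away from serial processing (not the repository's experimental batch mode, whose interface is not documented here), per-day work that mostly waits on API calls can be spread across a thread pool. `process_days_concurrently` and the `analyze_day` callable are hypothetical; keep `max_workers` small to stay within your OpenAI rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_days_concurrently(
    days: Iterable[T], analyze_day: Callable[[T], R], max_workers: int = 4
) -> List[R]:
    """Apply analyze_day to each day using a thread pool.

    Threads suit I/O-bound work such as HTTP calls to an LLM API.
    pool.map yields results in input order and re-raises the first exception.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_day, days))
```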

High API Costs

Symptoms: Validation costs $10+ for 50 days

Cause: Using expensive model (GPT-4)

Fix:

# Switch to gpt-4o-mini (10x cheaper, similar accuracy)
export LLM_MODEL="gpt-4o-mini"

Cost Comparison (50 days):

  • GPT-4: ~$7.50
  • GPT-4o-mini: ~$0.75

Next Steps


Run Experiments

  • Reproduce Paper #1: Validate all 3 patterns across Q1, Q3, and Q4 2024
  • Test new patterns: Define and validate your own dealer constraints
  • Compare assets: Run on QQQ, IWM, or individual stocks



Support

Issues: https://github.com/iAmGiG/gex-llm-patterns/issues

Documentation: https://github.com/iAmGiG/gex-llm-patterns/tree/development/docs

Contact: See Publications page


Last Updated: October 25, 2025