Getting Started

This guide will help you set up and run the GEX LLM Patterns validation framework.

Prerequisites

System Requirements

Python: 3.9 or higher
OS: Linux, macOS, or Windows (WSL recommended for Windows)
Memory: 4GB RAM minimum
Storage: 2GB for code + cache

Required Accounts

OpenAI API Key: For GPT-4 family LLM calls
- Sign up at: https://platform.openai.com
- Requires credits for API usage (~$0.03 per validation day with gpt-4o-mini; ~$0.15 with gpt-4)
Options Data Source (Optional):
- Currently uses yfinance (free, limited historical data)
- For production: Consider HistoricalOptionData.com, OptionMetrics, etc.

Installation

Step 1: Clone Repository

git clone https://github.com/iAmGiG/gex-llm-patterns.git
cd gex-llm-patterns

Step 2: Install Dependencies

# Using pip
pip install -r requirements.txt

# Or using conda
conda create -n gex-llm python=3.9
conda activate gex-llm
pip install -r requirements.txt

Key Dependencies:

openai - LLM API client
pandas - Data manipulation
numpy - Numerical computing
yfinance - Options data fetching (free tier)
pyyaml - Validation report generation

Step 3: Set Up Environment Variables

# Set Python path (required for imports)
export PYTHONPATH="$(pwd):$PYTHONPATH"

# Set OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# Optional: Configure LLM model
export LLM_MODEL="gpt-4o-mini"  # Default: gpt-4o-mini (cheap)
# export LLM_MODEL="gpt-4"      # More accurate but expensive

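Tip: Add these to your ~/.bashrc or ~/.zshrc for persistence. In a startup file, replace $(pwd) with the absolute path to the repository, since $(pwd) expands to whatever directory the shell starts in.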

Step 4: Verify Installation

# Check imports work
python -c "from src.agents.market_mechanics_agent import MarketMechanicsAgent; print('✅ Imports OK')"

# Check API key configured
python -c "import os; print('✅ API key set' if os.getenv('OPENAI_API_KEY') else '❌ No API key')"
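
The two checks above can also be combined into a single script. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and the module list assumes the import names of the key dependencies from Step 2 (pyyaml is imported as `yaml`).

```python
import importlib.util
import os
import sys

# Import names for the key dependencies listed in Step 2.
# The pyyaml package is imported as "yaml".
REQUIRED_MODULES = ["openai", "pandas", "numpy", "yfinance", "yaml"]


def check_environment(modules=REQUIRED_MODULES, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9 or higher is required")
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing dependency: {name}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    print("Environment OK" if not issues else f"{len(issues)} problem(s) found")
```

This sketch does not test the `src` imports, so the PYTHONPATH check above is still worth running from the repository root.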

Quick Start: Run a Single Pattern Validation

Option 1: Validate Gamma Positioning (Q1 2024)

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --confidence 60.0

Expected Output:

Progress bar: Processing dates: 100%|████████████| 53/53
Validation report: reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml
Summary: Detection rate, predictive accuracy, net alpha

Time: ~5-10 minutes for 53 days (with GPT-4o-mini)
Cost: ~$1-2 in API calls

Option 2: Validate All Patterns (Batch)

python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --skip-completed

Expected Output:

3 YAML reports (one per pattern)
Summary table comparing detection rates

Time: ~15-30 minutes
Cost: ~$3-6 in API calls

Understanding the Output

Validation Report Structure (YAML)

# reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml

pattern_name: gamma_positioning
symbol: SPY
date_range: 2024-01-02 to 2024-03-29
total_days: 53

# Aggregate Results
detection_rate_pct: 100.0        # LLM detected constraint on 100% of days
predictive_accuracy_pct: 96.2    # 96.2% of predictions materialized
avg_return_pct: 0.26             # Average daily return
net_alpha_pct: 0.21              # Return above risk-free rate
sample_size: 53                  # Number of test days

# Per-Day Results
results:
  - test_date: 2024-01-02
    obfuscated_date: "Day T+0"
    detected: true                # LLM detected constraint
    confidence: 85.0              # LLM confidence (0-100)
    predicted_direction: "UP"     # LLM prediction
    forward_return_t1: 0.45       # Actual T+1 return (%)
    prediction_correct: true      # Did prediction materialize?
    net_gex_usd: -8950000000.0   # -$8.95B (negative gamma)
    spot_price: 474.60

  - test_date: 2024-01-03
    # ... (52 more days)
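
To sanity-check a report, the aggregate fields can be recomputed from the per-day `results` list. The sketch below is illustrative only: the helper names `summarize` and `load_report` are hypothetical, and the denominators are assumptions (detection rate over all test days, predictive accuracy over detected days); the framework's own formulas may differ.

```python
def summarize(results):
    """Recompute aggregate metrics from a report's per-day `results` list."""
    total = len(results)
    detected = [r for r in results if r.get("detected")]
    correct = [r for r in detected if r.get("prediction_correct")]
    # Assumed definitions: detection over all days, accuracy over detected days.
    detection = round(100.0 * len(detected) / total, 1) if total else 0.0
    accuracy = round(100.0 * len(correct) / len(detected), 1) if detected else 0.0
    return {
        "sample_size": total,
        "detection_rate_pct": detection,
        "predictive_accuracy_pct": accuracy,
    }


def load_report(path):
    """Load a YAML report; requires pyyaml (listed under Key Dependencies)."""
    import yaml  # imported lazily so summarize() works without pyyaml

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


if __name__ == "__main__":
    # Two made-up records shaped like the per-day entries above.
    sample = [
        {"detected": True, "prediction_correct": True},
        {"detected": True, "prediction_correct": False},
    ]
    print(summarize(sample))
```

Under these assumed definitions, 53 detected days with 51 correct predictions give 96.2, which is consistent with the sample report above.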

Interpreting Results

Detection Rate:

100%: LLM detected constraint on every test day
60-80%: Strong detection (pattern is mechanical)
<60%: Weak detection (pattern may be narrative)

Predictive Accuracy:

96%: LLM predictions materialized 96% of time
High accuracy = LLM understands causal mechanism
Low accuracy = pattern detected but doesn't drive price

Net Alpha:

+0.21%: Strategy outperformed risk-free rate by 21 bps/day
Note: Q1 2024 was profitable, but Q3/Q4 declined to near-zero
Detection remains stable despite alpha decline (key finding!)

Common Use Cases

1. Reproduce Paper #1 Results

Run the 2024 validation for the three quarters covered (Q1, Q3, Q4) across all 3 patterns:

# Q1 2024 (Jan-Mar)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Q3 2024 (Jul-Sep)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-07-01 \
  --end-date 2024-09-30

# Q4 2024 (Oct-Dec)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-10-01 \
  --end-date 2024-12-31

Total: 9 validation reports matching Paper #1 results
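
The three quarterly runs can also be driven from one small script. This is a sketch, not a script shipped with the repository: `build_commands` and the `--run` switch are hypothetical names, and only the flags shown in the commands above are passed to `validate_all_patterns.py`. By default it prints the commands without executing them.

```python
import subprocess
import sys

PATTERNS = ["gamma_positioning", "stock_pinning", "0dte_hedging"]

# Date ranges copied from the three commands above.
QUARTERS = {
    "Q1": ("2024-01-02", "2024-03-29"),
    "Q3": ("2024-07-01", "2024-09-30"),
    "Q4": ("2024-10-01", "2024-12-31"),
}


def build_commands(python=sys.executable):
    """Return one argument list per quarter for validate_all_patterns.py."""
    commands = []
    for start, end in QUARTERS.values():
        commands.append([
            python, "scripts/validation/validate_all_patterns.py",
            "--patterns", *PATTERNS,
            "--start-date", start,
            "--end-date", end,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical switch: pass --run to execute; otherwise only print.
    execute = "--run" in sys.argv[1:]
    for command in build_commands():
        print(" ".join(command))
        if execute:
            # check=True stops at the first quarter that fails.
            subprocess.run(command, check=True)
```

Run it from the repository root so the relative script path resolves.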

2. Test New Pattern

Define a new pattern in src/validation/pattern_taxonomy.py:

PATTERNS = {
    # ... existing patterns ...

    "my_new_pattern": {
        "name": "My New Pattern",
        "status": "MECHANICAL",
        "description": "Clear description of constraint",
        "who": "Market participants",
        "whom": "Who is forced?",
        "what": "What are they forced to do?",
        "constraint_mechanism": "Why can't they avoid it?",
        "academic_basis": "Published research citation"
    }
}
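
Before running validation, it can help to confirm that a new entry fills in every field used in the template above. The check below is a sketch under the assumption that all eight keys are required; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not part of `pattern_taxonomy.py`.

```python
# Keys taken from the template entry above; treating all of them as
# required is an assumption, not something the framework is known to enforce.
REQUIRED_FIELDS = (
    "name", "status", "description", "who", "whom",
    "what", "constraint_mechanism", "academic_basis",
)


def missing_fields(pattern):
    """Return required keys that are absent or blank in one pattern entry."""
    missing = []
    for key in REQUIRED_FIELDS:
        value = pattern.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    draft = {
        "name": "My New Pattern",
        "status": "MECHANICAL",
        "description": "Clear description of constraint",
    }
    print("Missing fields:", missing_fields(draft))
```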

Run validation:

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern my_new_pattern \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

3. Test on Different Asset

# Validate gamma positioning on QQQ instead of SPY
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol QQQ \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

Note: Requires options data for that ticker (may need premium data source)

4. Compare Biased vs Unbiased Prompts

# Unbiased (default)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Biased (assumes pattern exists)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --biased

Then compare the detection rates of the two runs: the biased prompt should yield 100%, while the unbiased prompt gives a more realistic figure.
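
Beyond the headline rates, the days on which the two prompts disagree are often the most informative. The sketch below assumes both runs produce per-day `results` entries with `test_date` and `detected` fields, as in the report structure shown earlier; the function name is hypothetical, and how the two reports are named on disk is not specified here.

```python
def detection_disagreements(unbiased_results, biased_results):
    """Return test dates detected under the biased prompt but not the unbiased one.

    Only dates present in both runs are compared.
    """
    unbiased = {r["test_date"]: bool(r["detected"]) for r in unbiased_results}
    return sorted(
        r["test_date"]
        for r in biased_results
        if r["detected"]
        and r["test_date"] in unbiased
        and not unbiased[r["test_date"]]
    )


if __name__ == "__main__":
    # Made-up records for illustration only.
    unbiased_run = [
        {"test_date": "2024-01-02", "detected": True},
        {"test_date": "2024-01-03", "detected": False},
    ]
    biased_run = [
        {"test_date": "2024-01-02", "detected": True},
        {"test_date": "2024-01-03", "detected": True},
    ]
    print(detection_disagreements(unbiased_run, biased_run))
```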

Configuration

LLM Model Selection

Available Models:

gpt-4o-mini: Fast, cheap (~$0.03/day), good accuracy
gpt-4: Slower, expensive (~$0.15/day), highest accuracy
gpt-4-turbo: Balanced performance

How to Switch:

# Via environment variable
export LLM_MODEL="gpt-4"

# Or edit config/config.json
{
  "llm": {
    "model": "gpt-4",
    "temperature": 0.0,
    "max_tokens": 2000
  }
}
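
When both mechanisms are in play, it helps to know which value wins. The sketch below shows one plausible resolution order and is an assumption, not the framework's confirmed behavior: the LLM_MODEL environment variable first, then the `llm.model` key in config/config.json, then the gpt-4o-mini default. The function name `resolve_model` is hypothetical.

```python
import json
import os
from pathlib import Path

DEFAULT_MODEL = "gpt-4o-mini"


def resolve_model(config_path="config/config.json", env=os.environ):
    """Pick the model name: LLM_MODEL env var, then config file, then default.

    This precedence is assumed for illustration.
    """
    from_env = env.get("LLM_MODEL")
    if from_env:
        return from_env
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        model = config.get("llm", {}).get("model")
        if model:
            return model
    return DEFAULT_MODEL


if __name__ == "__main__":
    print("Model in use:", resolve_model())
```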

Obfuscation Settings

Default: Obfuscation enabled (recommended for research)

Disable (for debugging only):

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --no-obfuscate

Warning: Disabling obfuscation may allow the LLM to use temporal context, which invalidates the methodology.

Cache Settings

Default: Options data cached in .cache/

Clear cache (force fresh data fetch):

rm -f .cache/options_data_cache.db

Rebuild historical GEX database:

python scripts/data/rebuild_historical_gex.py \
  --symbol SPY \
  --start-date 2024-01-01 \
  --end-date 2024-12-31

Troubleshooting

Error: "No module named 'src'"

Cause: PYTHONPATH not set

Fix:

export PYTHONPATH="$(pwd):$PYTHONPATH"

Error: "OpenAI API key not found"

Cause: API key not in environment

Fix:

export OPENAI_API_KEY="sk-your-key-here"

Error: "No options data found for date X"

Cause: yfinance doesn't have data for that date (weekends, holidays, or too old)

Fix:

Use business days only (skip weekends)
Check if date is a market holiday
Consider premium data source for complete history
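
The first two fixes can be checked ahead of time by listing candidate trading days for a range. This is a minimal standard-library sketch: `trading_days` is a hypothetical helper, and the holiday set is an illustrative list of weekday NYSE closures in Q1 2024, not a complete market calendar.

```python
from datetime import date, timedelta

# Illustrative holiday set: weekday NYSE closures in Q1 2024.
# Extend it for other periods; it is not a complete market calendar.
HOLIDAYS_Q1_2024 = {
    date(2024, 1, 1),   # New Year's Day
    date(2024, 1, 15),  # Martin Luther King Jr. Day
    date(2024, 2, 19),  # Presidents' Day (Washington's Birthday)
    date(2024, 3, 29),  # Good Friday
}


def trading_days(start, end, holidays=frozenset()):
    """Return weekdays from start to end (inclusive) that are not holidays."""
    days = []
    current = start
    while current <= end:
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            days.append(current)
        current += timedelta(days=1)
    return days


if __name__ == "__main__":
    days = trading_days(date(2024, 1, 2), date(2024, 3, 29), HOLIDAYS_Q1_2024)
    print(f"{len(days)} candidate trading days, first {days[0]}, last {days[-1]}")
```

Note that 2024-03-29, the end date used in the Q1 examples, was Good Friday, so no options data should be expected for that day.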

Slow Performance

Symptoms: Validation takes >1 hour for 50 days

Causes & Fixes:

Using GPT-4 → Switch to gpt-4o-mini (10x faster)
Fresh data fetches → Enable caching (default)
Serial processing → Use batch mode (experimental)

# Faster: Use gpt-4o-mini + ensure caching
export LLM_MODEL="gpt-4o-mini"
python scripts/validation/validate_pattern_taxonomy.py --pattern gamma_positioning ...

High API Costs

Symptoms: Validation costs $10+ for 50 days

Cause: Using expensive model (GPT-4)

Fix:

# Switch to gpt-4o-mini (10x cheaper, similar accuracy)
export LLM_MODEL="gpt-4o-mini"

Cost Comparison (50 days):

GPT-4: ~$7.50
GPT-4o-mini: ~$0.75

Next Steps

Learn More

Methodology - Understand obfuscation testing framework
Pattern Taxonomy - See all validated patterns
Key Results - Detailed Paper #1 findings
API Reference - Code documentation

Run Experiments

Reproduce Paper #1: Validate all 3 patterns across full 2024
Test new patterns: Define and validate your own dealer constraints
Compare assets: Run on QQQ, IWM, or individual stocks

Contribute

Open GitHub Issues for bugs/questions
Review Research Roadmap for future directions
Contact author for collaboration

Support

Issues: https://github.com/iAmGiG/gex-llm-patterns/issues

Documentation: https://github.com/iAmGiG/gex-llm-patterns/tree/development/docs

Contact: See Publications page

Last Updated: October 25, 2025
