RoadMap

WormsCanned edited this page Oct 26, 2025 · 2 revisions

Research Roadmap: Multi-Paper PhD Trajectory

Status: Paper #1 submitted (Oct 26, 2025), Papers #2-3 planned


Overview

This roadmap outlines the multi-paper research trajectory for validating LLM understanding of market microstructure constraints through obfuscation testing.

Core Methodology: Obfuscation testing framework (strip temporal context, force reasoning from structure)

Test Domain: Options market dealer constraints (gamma exposure hedging)

Key Innovation: Rigorous validation that distinguishes understanding from memorization


Paper Timeline

| Paper | Status | Timeline | Contribution |
|---|---|---|---|
| Paper #1 | ✅ Submitted | Oct 2025 | Baseline obfuscation methodology (single-day, SPY) |
| Paper #2 | 📋 Planned | Q1 2026 | Temporal dynamics (sequential GEX analysis) |
| Paper #3 | 📋 Planned | Q2 2026 | Cross-asset generalization (individual equities) |
| Paper #4+ | 💭 Future | 2026+ | Pattern discovery, comparative LLMs, hybrid systems |

Paper #1: Obfuscation Testing Baseline ✅

Status: Submitted October 26, 2025

Title: "Validating Large Language Model Understanding of Market Microstructure Through Obfuscation Testing"

Target: LLM-Finance 2025 Workshop @ IEEE BigData 2025

Contribution:

  • Novel obfuscation testing framework for LLM validation
  • Proof that LLMs can detect structural dealer constraints without temporal context
  • Multi-pattern validation (3 patterns, 242 days, 726 tests)

Key Results:

  • Detection: 71.5% average (unbiased prompts across 3 patterns)
  • Accuracy: 91.2% (predictions materialize)
  • Validation: Full 2024 (242 trading days per pattern)
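The roadmap does not define these two metrics precisely. One plausible reading, assumed here, is that detection is the per-pattern share of tests in which the model flags the constraint (then averaged across patterns), and accuracy is the share of detected cases whose prediction materializes. The `TrialResult` record and its field names are hypothetical and are not taken from the project code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """One obfuscated test (hypothetical schema)."""
    pattern: str        # which of the tested patterns this trial belongs to
    detected: bool      # did the model flag the dealer constraint?
    materialized: bool  # did the predicted outcome occur? (only used if detected)


def detection_by_pattern(results: list[TrialResult]) -> dict[str, float]:
    """Share of trials with a detection, computed separately per pattern."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.pattern] += 1
        hits[r.pattern] += int(r.detected)
    return {p: hits[p] / totals[p] for p in totals}


def average_detection(results: list[TrialResult]) -> float:
    """Unweighted mean of the per-pattern detection rates."""
    rates = detection_by_pattern(results)
    return sum(rates.values()) / len(rates)


def accuracy(results: list[TrialResult]) -> float:
    """Share of detected trials whose prediction materialized."""
    detected = [r for r in results if r.detected]
    if not detected:
        return float("nan")
    return sum(int(r.materialized) for r in detected) / len(detected)
```

With equal test counts per pattern, as in the 242-day setup, the unweighted mean and the pooled rate coincide; they differ only if patterns have different numbers of tests.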

Critical Finding: Detection-Profitability Divergence

  • LLM detection remained stable (84-100%) while profitability fell toward 0% (Q1→Q4 2024)
  • Indicates that detection is based on structure, not profit optimization
  • Supports the claim that the methodology prevents temporal context leakage

Documentation: Paper #1 Full Content

GitHub Issues: #88 (Status Tracking)


Paper #2: Sequential GEX Analysis 📋

Status: Planned - Q1 2026

Title (Proposed): "Temporal Dynamics of Dealer Constraints: Sequential Gamma Exposure Analysis with LLMs"

Target: Journal submission (6-8 pages)

Motivation (Advisor Input)

"Currently you are looking on single day gamma exposure, will it be worthy look at most recent 5 days to detect the hidden force? I mean the sequential changes of gamma exposure would bring more info on dealers intention. This could be a next more comprehensive paper even before going to individual stocks"

Research Questions

  1. Can LLMs detect constraint trajectories (not just snapshots)?
  2. Does sequential context improve predictive accuracy over single-day?
  3. What temporal patterns emerge in dealer hedging behavior?

Methodology

5-Day Lookback Windows:

  • Day T-4 to Day T+0 (sequential context)
  • Maintain obfuscation: "Day T-4", "Day T-3", etc. (no real dates)
  • New pattern taxonomy: Accumulation, relief, reversal, persistence

Example Sequential Prompt:

Sequential GEX Data (Day T-4 to Day T+0):

Day T-4: Net GEX: -$2.1B (negative gamma)
Day T-3: Net GEX: -$3.2B (negative gamma INCREASING)
Day T-2: Net GEX: -$4.1B (negative gamma INCREASING)
Day T-1: Net GEX: -$4.8B (negative gamma INCREASING)
Day T+0: Net GEX: -$5.2B (negative gamma PEAK)

Trajectory: Escalating short gamma over 5 days (-$2.1B → -$5.2B)

WHO is forcing WHOM to do WHAT?
Consider the TRAJECTORY of constraints, not just current state.
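A minimal sketch of how a prompt like the one above might be assembled while keeping real dates out of the text. The `DailyGex` record, its field names, and the helper functions are hypothetical and not taken from the project code. The sketch also omits the INCREASING/PEAK annotations and the trajectory summary line.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyGex:
    """One day of net gamma exposure (hypothetical schema)."""
    trade_date: date    # real calendar date; used for ordering only, never printed
    net_gex_usd: float  # net dealer gamma exposure in dollars


def relative_label(offset: int) -> str:
    """Map an offset to an obfuscated label: 0 -> 'Day T+0', -4 -> 'Day T-4'."""
    return f"Day T{offset:+d}"


def format_gex(value: float) -> str:
    """Render dollars as billions with one decimal, e.g. -2.1e9 -> '-$2.1B'."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value) / 1e9:.1f}B"


def build_sequential_prompt(window: list[DailyGex]) -> str:
    """Build the prompt text from a window, replacing dates with relative labels."""
    ordered = sorted(window, key=lambda d: d.trade_date)
    n = len(ordered)
    lines = [
        f"Sequential GEX Data ({relative_label(-(n - 1))} to {relative_label(0)}):",
        "",
    ]
    for i, day in enumerate(ordered):
        # Values below zero are labeled negative gamma; everything else positive.
        regime = "negative gamma" if day.net_gex_usd < 0 else "positive gamma"
        label = relative_label(i - (n - 1))
        lines.append(f"{label}: Net GEX: {format_gex(day.net_gex_usd)} ({regime})")
    lines += [
        "",
        "WHO is forcing WHOM to do WHAT?",
        "Consider the TRAJECTORY of constraints, not just current state.",
    ]
    return "\n".join(lines)
```

Because the calendar date is used only as a sort key, a simple check such as asserting that no year string appears in the output can guard against accidental leakage.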

Expected Contributions

  1. Temporal extension of obfuscation testing framework
  2. Evidence that LLMs reason about constraint trajectories
  3. Pattern taxonomy expansion (4 new temporal pattern types)
  4. Predictive accuracy improvement via sequential context

Expected Outcomes

  • Best case: Accuracy improves 91% → 96% (sequential adds value)
  • Neutral: Similar accuracy (single-day sufficient)
  • Worst case: Accuracy decreases (sequential confuses LLM)
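The three outcomes above could be distinguished with a paired comparison on the same test days. The sketch below assumes each day yields a correct/incorrect flag under both prompt styles. The `tolerance` threshold that separates "similar" from a real change is an illustrative assumption, not a value from the roadmap.

```python
def compare_paired(
    single: list[bool],
    sequential: list[bool],
    tolerance: float = 0.01,
) -> dict[str, object]:
    """Compare single-day vs sequential correctness flags for the same test days."""
    if len(single) != len(sequential) or not single:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(single)
    acc_single = sum(single) / n
    acc_sequential = sum(sequential) / n
    # Discordant pairs show where the two prompt styles actually disagree.
    gained = sum(1 for a, b in zip(single, sequential) if not a and b)
    lost = sum(1 for a, b in zip(single, sequential) if a and not b)
    delta = acc_sequential - acc_single
    if delta > tolerance:
        outcome = "improves"
    elif delta < -tolerance:
        outcome = "decreases"
    else:
        outcome = "similar"
    return {
        "acc_single": acc_single,
        "acc_sequential": acc_sequential,
        "delta": delta,
        "gained": gained,
        "lost": lost,
        "outcome": outcome,
    }
```

Reporting the discordant counts alongside the accuracy difference shows whether a small net change hides many offsetting flips.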

Implementation Plan

  1. Phase 1 (Day 1): Database query extension (5-day windows)
  2. Phase 2 (Day 2): Sequential prompt template
  3. Phase 3 (Days 3-4): Validation runs (169 5-day windows on SPY 2024)
  4. Phase 4 (Day 5): Comparative analysis (single vs sequential)
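Phase 1 amounts to slicing the ordered daily records into 5-day windows. A minimal sketch, assuming the records are already sorted by trading day; the function name and the `step` parameter are hypothetical. How the windows used in Phase 3 are selected (overlap, filtering) is not specified here, so the defaults below are assumptions.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def rolling_windows(
    days: Sequence[T], size: int = 5, step: int = 1
) -> Iterator[Sequence[T]]:
    """Yield consecutive windows of `size` items, advancing by `step` each time.

    The last item of each window plays the role of Day T+0.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for end in range(size, len(days) + 1, step):
        yield days[end - size:end]
```

With `step=1`, a series of N days yields N - size + 1 overlapping windows; a larger step or additional filters reduce that count.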

Dataset: SPY 2024 (existing data, no new collection needed)

Estimated Effort: 5 days implementation + 2-3 weeks analysis/writing

GitHub Issue: #89 (Sequential GEX Analysis)

Dependency: Paper #1 acceptance/publication


Paper #3: Cross-Asset Generalization 📋

Status: Planned - Q2 2026

Title (Proposed): "Cross-Asset Validation of LLM Market Microstructure Understanding"

Target: Journal submission (8-10 pages)

Research Questions

  1. Does obfuscation testing generalize beyond SPY index options?
  2. Do dealer constraints differ between index and single-name options?
  3. Can LLMs detect stock-specific vs market-wide patterns?

Methodology

Test on 10-20 Individual Stocks:

  • High liquidity: AAPL, MSFT, NVDA, TSLA, etc.
  • Use sequential analysis if Paper #2 validates it
  • Compare dealer dynamics: Index (SPY) vs single-name

Key Differences (Index vs Single-Name):

  • Index options: Broader dealer base, market-making focus
  • Single-name options: Concentrated positions, hedging focus
  • Gamma dynamics: SPY has constant 0DTE volume, stocks vary
  • Liquidity: SPY ultra-liquid, individual stocks more fragmented

Expected Contributions

  1. Full generalization proof (methodology works beyond single asset)
  2. Cross-asset comparison (index vs single-name dealer dynamics)
  3. Pattern persistence analysis (universal vs asset-specific constraints)
  4. Combined temporal + cross-asset validation (if Paper #2 successful)

Dataset Requirements

  • Individual stock options data (2024)
  • ~10-20 stocks × 242 days = ~2,420-4,840 tests
  • Higher data collection effort than Paper #2

Estimated Effort:

  • 1-2 weeks data collection (individual stocks)
  • 1 week validation runs
  • 2-3 weeks analysis/writing

GitHub Issue: #6 (Cross-asset validation) - relates to Paper #3

Dependencies:

  • Paper #1 acceptance
  • Paper #2 submission (determine if sequential method validated)

Paper #4+ Candidates (Long-Term)

1. Pattern Discovery (18-24 months)

Research Question: Can LLMs discover novel patterns (not just validate known ones)?

Methodology:

  • Unsupervised pattern mining with LLMs
  • Move from validation → discovery
  • Different evaluation framework (data mining risks)

Challenges:

  • Requires different validation methodology
  • Higher risk of false positives
  • Need expert validation for novel patterns

Status: Deferred to Paper #4+ (fundamentally different problem class)

2. Comparative LLM Analysis (12-18 months)

Research Question: How do different LLM architectures perform on constraint detection?

Methodology:

  • Test multiple LLMs: GPT-4, o3-mini, Claude, open-source models
  • Reasoning capabilities comparison
  • Structured output quality assessment

Key Comparison: Reasoning models (o3-mini) vs standard models (GPT-4)

  • Hypothesis: Explicit reasoning improves causal identification

Status: Medium-term (requires o3-mini availability)

3. Confidence Calibration Study

Research Question: Are LLM confidence scores well-calibrated to empirical accuracy?

Methodology:

  • Compare stated confidence to prediction materialization rates
  • Develop post-processing calibration adjustments if needed
  • Test across sequential and cross-asset contexts
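One common way to compare stated confidence with materialization rates is a binned reliability table summarized by expected calibration error (ECE). The roadmap does not say which metric will be used, so this is only an illustrative sketch. It assumes confidences are expressed on a 0-1 scale and outcomes are booleans.

```python
def calibration_table(
    confidences: list[float], outcomes: list[bool], n_bins: int = 5
) -> list[tuple[int, int, float, float]]:
    """Return (bin_index, count, mean_confidence, hit_rate) for each non-empty bin."""
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # A confidence of exactly 1.0 falls into the top bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, o))
    rows = []
    for i, b in enumerate(bins):
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        hit_rate = sum(int(o) for _, o in b) / len(b)
        rows.append((i, len(b), mean_conf, hit_rate))
    return rows


def expected_calibration_error(
    confidences: list[float], outcomes: list[bool], n_bins: int = 5
) -> float:
    """Count-weighted average gap between mean confidence and hit rate per bin."""
    rows = calibration_table(confidences, outcomes, n_bins)
    total = sum(count for _, count, _, _ in rows)
    return sum(count / total * abs(mc - hr) for _, count, mc, hr in rows)
```

An ECE near zero would suggest the stated confidences track empirical accuracy; larger values would motivate the post-processing adjustment mentioned above.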

Status: Analysis component (fold into Paper #2 or #3, not standalone)

4. Hybrid Formal Methods

Research Question: Can we combine formal verification + LLM reasoning?

Methodology:

  • Formal methods: Prove constraint properties mathematically
  • LLM reasoning: Assess practical materialization from context
  • Complementary strengths → robust validation

Status: Long-term vision (2026+)

5. Real-Time Applications

Research Question: Can obfuscation-validated LLMs monitor markets in real-time?

Application:

  • Automated constraint detection
  • Explainable alerts (WHO→WHOM→WHAT)
  • Regulatory reporting (market structure surveillance)

Status: Long-term (requires production infrastructure)


Superseded Ideas

Alpha Decline Investigation (Oct 13, 2025)

Original proposal: Explain why profitability declined Q1→Q4 2024 despite stable detection

Status: SUPERSEDED - Fold into Paper #2 discussion section

Rationale: Interesting, but not a core methodology contribution. Sequential analysis may naturally explain regime changes.

Pattern Discovery as Paper #3 (Oct 22, 2025)

Original proposal: Paper #3 focused on unsupervised pattern mining

Status: DEFERRED to Paper #4+

Rationale: The advisor's suggested sequence ("before going to individual stocks") prioritizes cross-asset generalization. Pattern discovery is a fundamentally different problem that requires a different validation framework.


Decision Points

After Paper #1 Acceptance

Decision: Proceed with Paper #2 (Sequential GEX) implementation

  • Timeline: Start immediately after acceptance notification
  • Effort: 5 days implementation + 2-3 weeks writing
  • Risk: Low (uses existing data)

After Sequential Validation (Paper #2)

Decision 1: Include sequential in Paper #2 or defer?

  • If accuracy improves: Paper #2 focuses on sequential methodology
  • If neutral/worse: Fold into Paper #1 discussion, proceed to Paper #3 without sequential

Decision 2: Timeline for Paper #3

  • If Paper #2 quick: Start Paper #3 data collection in parallel with Paper #2 writing
  • If Paper #2 delayed: Sequential start (finish Paper #2, then start Paper #3)

Paper #4+ Direction

Decision: After Papers #2-3 complete

Assess which long-term direction has the most impact:

  • Pattern discovery (high risk, high reward)
  • Comparative LLMs (medium risk, clear contribution)
  • Hybrid systems (long-term vision)
  • Real-time applications (practical impact)

Publication Strategy

Venues

Paper #1 (Workshop):

  • LLM-Finance 2025 Workshop @ IEEE BigData 2025
  • Deadline: October 26, 2025 ✅
  • Format: 4-6 pages workshop paper

Paper #2 (Journal):

  • Target: Journal of Financial Markets, Journal of Finance, or similar
  • Format: 6-8 pages journal article
  • Timeline: Q1 2026 submission

Paper #3 (Journal):

  • Target: Same tier as Paper #2
  • Format: 8-10 pages (larger scope with cross-asset)
  • Timeline: Q2 2026 submission

Paper #4+ (Journal/Conference):

  • Depends on direction chosen
  • Timeline: 2026+

Conference Presentations

Consider presenting at:

  • AFA (American Finance Association)
  • WFA (Western Finance Association)
  • MFA (Midwest Finance Association)
  • NeurIPS (ML track)
  • ICML (Finance + ML)

Key Principles

Throughout all papers, maintain:

  1. Obfuscation rigor: Always strip temporal context
  2. WHO→WHOM→WHAT: Explicit causal identification
  3. Academic honesty: Report failures and limitations
  4. Reproducibility: All code/data documented
  5. Generalization: Prove methodology scales beyond cherry-picked examples

Timeline Summary

| Date | Milestone |
|---|---|
| ✅ Oct 26, 2025 | Paper #1 submitted |
| Nov-Dec 2025 | Paper #1 review period |
| Jan 2026 | Start Paper #2 (sequential GEX) |
| Q1 2026 | Paper #2 submission |
| Q2 2026 | Paper #3 submission (cross-asset) |
| 2026+ | Paper #4+ (discovery/comparative/hybrid) |

Key Dependency: Paper #1 acceptance gates the Paper #2 timeline. If acceptance is delayed, adjust subsequent timelines accordingly.


Full Details: See docs/papers/research_roadmap.md in repository

Last Updated: October 25, 2025
