OPTEEE - Options Trading Education Expert

title	emoji	colorFrom	colorTo	sdk	app_port	pinned	env
opteee	🔥	blue	red	docker	7860	false	PYTHONPATH=/app

A semantic search application that answers questions across a curated collection of options trading educational content. It combines a FastAPI backend, a React frontend, and FAISS vector search to return context-aware answers with source citations.

Overview

OPTEEE uses advanced natural language processing and vector similarity search to help traders learn from a comprehensive knowledge base of options trading transcripts and educational videos. Ask questions in plain English and get detailed answers with direct links to relevant source material.

Features

Semantic Search: Advanced NLP-powered search that understands meaning, not just keywords
Fast Retrieval: FAISS vector database delivers millisecond search responses
Multi-Source Knowledge Base: Combines video transcripts and academic research papers
Video Integration: Direct links to specific timestamps in source YouTube videos
Research Paper Support: Academic papers with page references and section context
Chat Interface: Modern, responsive chat UI with conversation history
Source Citations: Every answer includes clickable references with timestamps or page numbers
Context-Aware: Maintains conversation history for follow-up questions
Responsive Design: Works seamlessly on desktop and mobile devices
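
To make the Video Integration feature concrete, here is a minimal sketch of how a timestamped YouTube link can be built from a video ID and a start offset. The function name and the `abc123` ID are hypothetical; only the `watch?v=...&t=...s` URL form is standard YouTube behavior, and the project's own link builder may differ.

```python
from urllib.parse import urlencode


def youtube_timestamp_url(video_id, start_seconds):
    """Return a YouTube watch URL that starts playback at start_seconds.

    Hypothetical helper for illustration; fractional seconds are truncated.
    """
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


if __name__ == "__main__":
    # "abc123" is a placeholder video ID, not a real video.
    print(youtube_timestamp_url("abc123", 95.7))
```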

Knowledge Base

OPTEEE draws from two primary sources:

Source Type	Content	Count
Video Transcripts	Options trading tutorials, strategy explanations, market analysis	17,200+ chunks
Research Papers	Academic papers on PEAD, volatility, retail trading behavior	8,900+ chunks

Total: 26,100+ searchable knowledge chunks

Architecture

Backend: FastAPI with RESTful API endpoints
Frontend: React with modern UI components
Search Engine: Sentence-transformers with FAISS vector database
NLP Model: all-MiniLM-L6-v2 for semantic embeddings
Deployment: Docker containerization for easy deployment
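
The retrieval step behind this architecture can be illustrated with a small brute-force sketch: embeddings are compared by cosine similarity and the top-scoring chunks are returned. This is a pure-Python stand-in under stated assumptions, not the project's code. The real pipeline uses sentence-transformers to produce the vectors and FAISS to search them efficiently; the function names and the toy 2-dimensional vectors here are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query_vector, chunk_vectors, k=5):
    """Return (chunk_index, cosine_similarity) pairs for the k best matches."""
    query = normalize(query_vector)
    scored = []
    for index, vector in enumerate(chunk_vectors):
        unit = normalize(vector)
        # The dot product of two unit vectors equals their cosine similarity.
        score = sum(q * c for q, c in zip(query, unit))
        scored.append((score, index))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(index, score) for score, index in scored[:k]]


if __name__ == "__main__":
    # Toy vectors standing in for real sentence embeddings.
    chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for index, score in top_k([1.0, 0.1], chunks, k=2):
        print(index, round(score, 3))
```

A linear scan like this is fine for a handful of vectors; a dedicated index is what keeps lookups fast across tens of thousands of chunks.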

Quick Start

Prerequisites

Python 3.9 or higher
Docker (optional, for containerized deployment)
Git

Local Development Setup

Clone the repository:

git clone https://github.com/yourusername/opteee.git
cd opteee

Install dependencies:

pip install -r requirements.txt

Run the development server:

python main.py

The application will be available at http://localhost:7860

Docker Deployment

Build and run with Docker:

# Build the Docker image
docker build -t opteee .

# Run the container
docker run -p 7860:7860 opteee

Or use Docker Compose:

docker-compose up

API Documentation

Endpoints

GET /api/health - Health check endpoint
- Returns service status and version information

POST /api/chat - Main chat endpoint

Request body:

{
  "query": "What is a covered call?",
  "provider": "huggingface",
  "num_results": 5,
  "format": "detailed",
  "conversation_history": []
}

Returns the answer with its sources (video timestamps or PDF page references)

GET / - Serves the React frontend application
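
As a sketch of how a client might call the chat endpoint, the snippet below builds a POST request for /api/chat using only the standard library. The field names come from the request body shown above; the helper name, the default base URL, and the choice of `urllib` are assumptions, and the response schema is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_chat_request(query, base_url="http://localhost:7860",
                       num_results=5, history=None):
    """Build (but do not send) a POST request for the /api/chat endpoint."""
    payload = {
        "query": query,
        "provider": "huggingface",
        "num_results": num_results,
        "format": "detailed",
        "conversation_history": history or [],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("What is a covered call?")
    print(request.get_method(), request.full_url)
    # With the server running, send it with:
    #   with urllib.request.urlopen(request) as response:
    #       print(response.read().decode("utf-8"))
```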

Project Structure

opteee/
├── main.py                      # FastAPI application entry point
├── config.py                    # Configuration and settings
├── rag_pipeline.py              # RAG implementation
├── vector_search.py             # Vector similarity search
├── create_vector_store.py       # Vector store creation (transcripts + PDFs)
├── rebuild_vector_store.py      # Vector store rebuilding
├── process_pdfs.py              # PDF semantic chunking utility
├── app/
│   ├── models/                  # Pydantic models
│   │   └── chat_models.py       # Chat request/response models (supports video + PDF)
│   └── services/                # Business logic services
│       ├── rag_service.py       # RAG service implementation
│       └── formatters.py        # Response formatting (HTML + Discord)
├── frontend/
│   └── build/                   # React production build
├── vector_store/                # FAISS vector database files
├── processed_transcripts/       # Processed video transcript chunks (JSON)
├── processed_pdfs/              # Processed PDF document chunks (JSON)
├── transcripts/                 # Raw transcript data
├── static/                      # Static assets (CSS, JS)
├── templates/                   # HTML templates
├── discord/                     # Discord bot integration
│   ├── discord_bot.py           # Discord bot implementation
│   └── ...                      # Bot configuration files
├── docs/                        # Documentation
├── archive/                     # Archived utilities and scripts
├── Dockerfile                   # Docker configuration
├── docker-compose.yml           # Docker Compose configuration
└── requirements.txt             # Python dependencies

Key Technologies

FastAPI - High-performance Python web framework
React - Modern frontend JavaScript library
Sentence Transformers - State-of-the-art sentence embeddings
FAISS - Efficient similarity search and clustering
Docker - Containerization platform
HuggingFace - Model hosting and deployment

Development Workflow

Backend Changes: Modify FastAPI endpoints in main.py or services in app/services/
Frontend Changes: Update React components in frontend/src/ (requires separate build)
Testing: Run locally with python main.py
Vector Store Updates: Rebuild with python rebuild_vector_store.py
Deploy: Build and push Docker image

Configuration

Key configuration options in config.py:

MODEL_NAME: Sentence transformer model (default: "all-MiniLM-L6-v2")
TOP_K: Number of top results to retrieve (default: 5)
CHUNK_SIZE: Size of text chunks for processing (default: 500)
CHUNK_OVERLAP: Overlap between chunks (default: 50)

Updating the Knowledge Base

OPTEEE uses an automated GitHub Actions workflow to keep the knowledge base up-to-date with the latest educational content. The system automatically discovers new videos, generates transcripts, and deploys updates.

Automated Weekly Updates

The knowledge base is automatically updated every Sunday at 8:00 PM UTC (3:00 PM CDT, 2:00 PM CST) through the Process Video Transcripts Weekly workflow:

What happens automatically:

Video Discovery - Scans YouTube channels for new educational content
Transcript Generation - Creates text transcripts from videos using the YouTube API and Whisper
Text Processing - Chunks transcripts into searchable segments (250 words with 50-word overlap)
Repository Update - Commits new transcripts and processed data to the repository
Deployment Trigger - Automatically triggers HuggingFace Space deployment
Vector Store Rebuild - HuggingFace rebuilds the FAISS vector database during Docker build
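
The Text Processing step above (250 words with a 50-word overlap) can be sketched as a sliding window over the words of a transcript. This is an illustrative sketch, not the pipeline's actual implementation; the function name is hypothetical and splitting on whitespace is an assumption about how words are counted.

```python
def chunk_words(text, size=250, overlap=50):
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap  # each window starts 200 words after the last one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(600))
    for chunk in chunk_words(sample):
        print(len(chunk.split()))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.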

Processing Pipeline:

GitHub Actions:                           HuggingFace Spaces:
┌─────────────────────┐                  ┌──────────────────────┐
│ 1. Video Discovery  │                  │ 5. Docker Build      │
│ 2. Transcripts      │  ───(push)───>   │ 6. Vector Store      │
│ 3. Text Processing  │                  │ 7. Deploy App        │
│ 4. Commit & Push    │                  └──────────────────────┘
└─────────────────────┘

Manual Workflow Triggering

You can manually trigger the knowledge base update at any time:

Via GitHub Web Interface:

Navigate to the Actions tab in the GitHub repository
Select "Process Video Transcripts Weekly" workflow
Click "Run workflow" button
Choose the branch (usually main)
Click "Run workflow" to start

Via GitHub CLI:

gh workflow run "Process Video Transcripts Weekly"

Local Knowledge Base Rebuild

To rebuild the vector store locally (for development or testing):

# Rebuild the entire vector store from processed transcripts
python rebuild_vector_store.py

# Or use the create script directly
python create_vector_store.py

Note: The vector store files (vector_store/) are large and should not be committed to the repository. They are rebuilt automatically during deployment.

Workflow Configuration

The automated pipeline is configured in .github/workflows/process-transcripts.yml:

Key Settings:

Schedule: Weekly on Sunday at 20:00 UTC
Timeout: 180 minutes (3 hours) for large processing jobs
Python Version: 3.10
Dependencies: FFmpeg (for audio processing), PyTorch, Sentence-Transformers
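
To illustrate the schedule setting, the sketch below computes the next Sunday 20:00 UTC run from a given moment and shows the corresponding US Central wall-clock time. The cron expression is the usual way such a schedule is written for GitHub Actions, but the contents of the actual workflow file are not shown here, so treat it as an assumption; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed cron form of "Sunday at 20:00 UTC":
# minute hour day-of-month month day-of-week (0 = Sunday).
CRON_EXPRESSION = "0 20 * * 0"

CST = timezone(timedelta(hours=-6))  # Central Standard Time
CDT = timezone(timedelta(hours=-5))  # Central Daylight Time


def next_weekly_run(now):
    """Return the next Sunday 20:00 UTC strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (6 - now.weekday()) % 7  # Monday is 0, Sunday is 6
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=20, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    run = next_weekly_run(datetime.now(timezone.utc))
    print("UTC:", run.isoformat())
    print("CST:", run.astimezone(CST).strftime("%H:%M"))
    print("CDT:", run.astimezone(CDT).strftime("%H:%M"))
```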

Required Secrets:

YOUTUBE_API_KEY - For accessing the YouTube API to fetch video metadata and transcripts
HF_TOKEN - For deploying to HuggingFace Spaces

Deployment Pipeline

After transcripts are processed, the Deploy to Hugging Face Space workflow runs automatically.

Triggers On:
- Push to main branch
- Manual trigger via workflow dispatch
- Automatic trigger after transcript updates

Deployment Steps:
- Checks out the latest code
- Creates Docker startup script
- Pushes to HuggingFace Space repository
- HuggingFace rebuilds Docker image
- Vector store is created during image build
- Application is automatically redeployed

Result:
- New transcripts are searchable within minutes
- Zero-downtime deployment
- Automatic rollback on failure

Monitoring Updates

Check Processing Status:

View workflow runs in the GitHub Actions tab
Each run generates a processing report showing:
- Number of videos discovered
- Transcripts generated
- Processed chunks created
- Deployment status

Verify Deployment:

Check HuggingFace Space build logs
Test the /api/health endpoint
Run a sample query to verify new content is searchable
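
The health check in the list above can be scripted. The following is a minimal sketch that treats any HTTP 200 response from /api/health as healthy; the helper names are hypothetical, and because the response fields are not documented here, the body is deliberately not parsed.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Return the health endpoint URL for a deployment's base URL."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(base_url, timeout=5.0):
    """Return True if GET /api/health answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Assumes a local instance on the default port.
    print(is_healthy("http://localhost:7860"))
```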

Adding New Video Sources

To add new YouTube channels or playlists to the discovery process:

Update the scraper configuration in the pipeline scripts
The next automated run will discover videos from the new sources
Or manually trigger the workflow to process immediately

Adding Research Papers (PDFs)

To add academic papers or PDF documents to the knowledge base:

Prepare PDFs: Place PDF files in a local directory (e.g., ~/research-papers/)

Process PDFs locally:

# Process PDFs with semantic chunking
python process_pdfs.py ~/research-papers/

# Analyze first without processing (preview)
python process_pdfs.py ~/research-papers/ --analyze-only

Commit processed chunks:

git add processed_pdfs/
git commit -m "Add research papers: [description]"
git push

Automatic deployment: The push triggers HuggingFace rebuild with new papers

PDF Processing Features:

Semantic chunking: Preserves paragraph boundaries and section context
Section detection: Identifies headers and includes section names in metadata
Page tracking: Each chunk includes page number and range
Author extraction: Extracts author metadata when available
Lightweight storage: Raw PDFs stay local, only JSON chunks are committed (~95% smaller)
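
The semantic chunking and page tracking features can be sketched as follows: paragraphs are kept whole, packed into chunks up to a word budget, and each chunk records the page range it covers. This is an illustration only, not the logic in process_pdfs.py. It assumes page text has already been extracted into one string per page with blank lines between paragraphs, and the function name, the `max_words` budget, and the output keys are hypothetical.

```python
def chunk_pages(pages, max_words=250):
    """Pack whole paragraphs into chunks and track each chunk's page range.

    A paragraph is never split, so a single paragraph longer than
    max_words becomes its own oversized chunk.
    """
    chunks = []
    current, count, first_page, last_page = [], 0, None, None

    def flush():
        chunks.append({
            "text": "\n\n".join(current),
            "page_start": first_page,
            "page_end": last_page,
        })

    for page_number, page_text in enumerate(pages, start=1):
        for paragraph in page_text.split("\n\n"):
            paragraph = " ".join(paragraph.split())  # collapse line breaks
            if not paragraph:
                continue
            words = len(paragraph.split())
            if current and count + words > max_words:
                flush()
                current, count, first_page = [], 0, None
            if first_page is None:
                first_page = page_number
            current.append(paragraph)
            count += words
            last_page = page_number
    if current:
        flush()
    return chunks


if __name__ == "__main__":
    demo = ["Intro paragraph.\n\nSecond paragraph.", "Text on page two."]
    for chunk in chunk_pages(demo, max_words=5):
        print(chunk["page_start"], chunk["page_end"], repr(chunk["text"]))
```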

Note: Raw PDF files are not committed to the repository (see .gitignore). Only the processed JSON chunks in processed_pdfs/ are stored in Git.

Troubleshooting

If automated updates fail:

Check GitHub Actions logs - View detailed error messages in the workflow run
Verify secrets - Ensure YOUTUBE_API_KEY and HF_TOKEN are valid
Check API quotas - YouTube API has daily limits
Manual rebuild - Trigger the workflow manually if the scheduled run was missed
Local testing - Run the pipeline locally to debug issues

Common Issues:

YouTube API quota exceeded - Wait for quota reset (midnight Pacific Time)
Transcripts not available - Some videos may not have captions enabled
Long processing times - Large batches may take 1-2 hours

Docker Integration

The project includes comprehensive Docker support:

Dockerfile: Production-ready Docker image
docker-compose.yml: Multi-service orchestration
docker-compose.dev.yml: Development configuration

Environment Variables

PORT=7860                    # Application port
PYTHONPATH=/app              # Python module path
TEST_MODE=false              # Enable test mode (no RAG initialization)
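
A minimal sketch of how these variables might be read at startup, falling back to the defaults listed above. The function name and the accepted spellings for a true TEST_MODE value are assumptions; the application's own parsing may differ.

```python
import os


def load_settings(env=None):
    """Read PORT, PYTHONPATH, and TEST_MODE, using the documented defaults."""
    env = os.environ if env is None else env
    test_mode = env.get("TEST_MODE", "false").strip().lower()
    return {
        "port": int(env.get("PORT", "7860")),
        "pythonpath": env.get("PYTHONPATH", "/app"),
        # Assumed truthy spellings; anything else counts as false.
        "test_mode": test_mode in {"1", "true", "yes"},
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.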

Discord Bot

The project includes a Discord bot integration in the discord/ directory. The bot provides the same semantic search capabilities directly in Discord channels.

See discord/README.md for setup instructions.

Additional Documentation

docs/BEGINNER_GUIDE.md - Getting started guide
docs/DEPLOYMENT_STEPS.md - Deployment instructions
docs/HUGGINGFACE_SETUP.md - HuggingFace Spaces setup
discord/README.md - Discord bot setup

Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests to ensure everything works
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Please ensure your code follows the existing style and includes appropriate documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Thanks to all contributors who have helped build this project
Built with open-source technologies and libraries
Educational content from various options trading educators

Contact & Support

Issues: Please use GitHub Issues for bug reports and feature requests
Discussions: Join the conversation in GitHub Discussions

Note: This is an educational tool. Always do your own research and consult with financial professionals before making trading decisions.
