---
title: opteee
emoji: 🔥
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---
A powerful semantic search application providing intelligent Q&A across a curated collection of options trading educational content. Built with modern technologies for fast, accurate, and context-aware responses.
OPTEEE uses advanced natural language processing and vector similarity search to help traders learn from a comprehensive knowledge base of options trading transcripts and educational videos. Ask questions in plain English and get detailed answers with direct links to relevant source material.
- Semantic Search: Advanced NLP-powered search that understands meaning, not just keywords
- Fast Retrieval: FAISS vector database delivers millisecond search responses
- Multi-Source Knowledge Base: Combines video transcripts and academic research papers
- Video Integration: Direct links to specific timestamps in source YouTube videos
- Research Paper Support: Academic papers with page references and section context
- Chat Interface: Modern, responsive chat UI with conversation history
- Source Citations: Every answer includes clickable references with timestamps or page numbers
- Context-Aware: Maintains conversation history for follow-up questions
- Responsive Design: Works seamlessly on desktop and mobile devices
OPTEEE draws from two primary sources:
| Source Type | Content | Count |
|---|---|---|
| Video Transcripts | Options trading tutorials, strategy explanations, market analysis | 17,200+ chunks |
| Research Papers | Academic papers on PEAD, volatility, retail trading behavior | 8,900+ chunks |
Total: 26,100+ searchable knowledge chunks
- Backend: FastAPI with RESTful API endpoints
- Frontend: React with modern UI components
- Search Engine: Sentence-transformers with FAISS vector database
- NLP Model: all-MiniLM-L6-v2 for semantic embeddings
- Deployment: Docker containerization for easy deployment
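To make the retrieval layer above concrete, here is a minimal sketch of embedding a query with all-MiniLM-L6-v2 and matching it against a FAISS index. It is illustrative only; the project's actual logic lives in `vector_search.py` and `create_vector_store.py` and may differ in index type, normalization, and metadata handling.

```python
# Minimal retrieval sketch (not the project's actual implementation)
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Build a tiny index from a couple of example chunks
chunks = [
    "A covered call combines a long stock position with a short call option.",
    "Implied volatility reflects the market's expectation of future price movement.",
]
embeddings = model.encode(chunks, normalize_embeddings=True)
dim = int(embeddings.shape[1])
index = faiss.IndexFlatIP(dim)  # inner product ~ cosine similarity on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Embed a query and retrieve the closest chunks
query = model.encode(["What is a covered call?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
print([(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])])
```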
- Python 3.9 or higher
- Docker (optional, for containerized deployment)
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/opteee.git
  cd opteee
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the development server:

  ```bash
  python main.py
  ```

  The application will be available at http://localhost:7860
Build and run with Docker:

```bash
# Build the Docker image
docker build -t opteee .

# Run the container
docker run -p 7860:7860 opteee
```

Or use Docker Compose:

```bash
docker-compose up
```
The backend exposes the following API endpoints:

- `GET /api/health` - Health check endpoint
  - Returns service status and version information
- `POST /api/chat` - Main chat endpoint
  - Request body:

    ```json
    {
      "query": "What is a covered call?",
      "provider": "huggingface",
      "num_results": 5,
      "format": "detailed",
      "conversation_history": []
    }
    ```

  - Returns answer with sources and timestamps
- `GET /` - Serves the React frontend application
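For example, the chat endpoint can be called as follows. This is a sketch using the request fields shown above; adjust the base URL for your deployment, and note that the exact response shape is defined in `app/models/chat_models.py`.

```python
import requests

payload = {
    "query": "What is a covered call?",
    "provider": "huggingface",
    "num_results": 5,
    "format": "detailed",
    "conversation_history": [],
}

# Assumes the app is running locally on the default port from this README
resp = requests.post("http://localhost:7860/api/chat", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # answer plus source citations (timestamps or page numbers)
```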
```
opteee/
├── main.py                    # FastAPI application entry point
├── config.py                  # Configuration and settings
├── rag_pipeline.py            # RAG implementation
├── vector_search.py           # Vector similarity search
├── create_vector_store.py     # Vector store creation (transcripts + PDFs)
├── rebuild_vector_store.py    # Vector store rebuilding
├── process_pdfs.py            # PDF semantic chunking utility
├── app/
│   ├── models/                # Pydantic models
│   │   └── chat_models.py     # Chat request/response models (supports video + PDF)
│   └── services/              # Business logic services
│       ├── rag_service.py     # RAG service implementation
│       └── formatters.py      # Response formatting (HTML + Discord)
├── frontend/
│   └── build/                 # React production build
├── vector_store/              # FAISS vector database files
├── processed_transcripts/     # Processed video transcript chunks (JSON)
├── processed_pdfs/            # Processed PDF document chunks (JSON)
├── transcripts/               # Raw transcript data
├── static/                    # Static assets (CSS, JS)
├── templates/                 # HTML templates
├── discord/                   # Discord bot integration
│   ├── discord_bot.py         # Discord bot implementation
│   └── ...                    # Bot configuration files
├── docs/                      # Documentation
├── archive/                   # Archived utilities and scripts
├── Dockerfile                 # Docker configuration
├── docker-compose.yml         # Docker Compose configuration
└── requirements.txt           # Python dependencies
```
- FastAPI - High-performance Python web framework
- React - Modern frontend JavaScript library
- Sentence Transformers - State-of-the-art sentence embeddings
- FAISS - Efficient similarity search and clustering
- Docker - Containerization platform
- HuggingFace - Model hosting and deployment
- Backend Changes: Modify FastAPI endpoints in `main.py` or services in `app/services/`
- Frontend Changes: Update React components in `frontend/src/` (requires separate build)
- Testing: Run locally with `python main.py`
- Vector Store Updates: Rebuild with `python rebuild_vector_store.py`
- Deploy: Build and push Docker image
Key configuration options in `config.py`:

- `MODEL_NAME`: Sentence transformer model (default: "all-MiniLM-L6-v2")
- `TOP_K`: Number of top results to retrieve (default: 5)
- `CHUNK_SIZE`: Size of text chunks for processing (default: 500)
- `CHUNK_OVERLAP`: Overlap between chunks (default: 50)
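These options might be declared roughly as follows; the exact structure of `config.py` is an assumption (the real file may use a settings class or environment overrides):

```python
# config.py (illustrative sketch -- the real file may organize these differently)

# Sentence-transformer model used to embed queries and knowledge chunks
MODEL_NAME = "all-MiniLM-L6-v2"

# Number of top-scoring chunks returned per query
TOP_K = 5

# Text chunking parameters used when processing source documents
CHUNK_SIZE = 500     # default chunk size
CHUNK_OVERLAP = 50   # overlap between consecutive chunks
```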
OPTEEE uses an automated GitHub Actions workflow to keep the knowledge base up-to-date with the latest educational content. The system automatically discovers new videos, generates transcripts, and deploys updates.
The knowledge base is automatically updated every Sunday at 8:00 PM UTC (3:00 PM CT) through the Process Video Transcripts Weekly workflow:
What happens automatically:
1. Video Discovery - Scans YouTube channels for new educational content
2. Transcript Generation - Creates text transcripts from videos using YouTube API and Whisper
3. Text Processing - Chunks transcripts into searchable segments (250 words with 50-word overlap)
4. Repository Update - Commits new transcripts and processed data to the repository
5. Deployment Trigger - Automatically triggers HuggingFace Space deployment
6. Vector Store Rebuild - HuggingFace rebuilds the FAISS vector database during Docker build
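The chunking in step 3 boils down to a sliding word window. A minimal sketch of the idea follows; the real processing scripts also attach metadata such as video IDs and timestamps, so treat this as illustrative only.

```python
def chunk_words(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split a transcript into overlapping word windows (illustrative sketch)."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```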
Processing Pipeline:
```
GitHub Actions:                          HuggingFace Spaces:
┌─────────────────────┐                 ┌──────────────────────┐
│ 1. Video Discovery  │                 │ 5. Docker Build      │
│ 2. Transcripts      │  ───(push)───>  │ 6. Vector Store      │
│ 3. Text Processing  │                 │ 7. Deploy App        │
│ 4. Commit & Push    │                 └──────────────────────┘
└─────────────────────┘
```
You can manually trigger the knowledge base update at any time:
Via GitHub Web Interface:
- Navigate to the Actions tab in the GitHub repository
- Select "Process Video Transcripts Weekly" workflow
- Click "Run workflow" button
- Choose the branch (usually `main`)
- Click "Run workflow" to start
Via GitHub CLI:
```bash
gh workflow run "Process Video Transcripts Weekly"
```

To rebuild the vector store locally (for development or testing):

```bash
# Rebuild the entire vector store from processed transcripts
python rebuild_vector_store.py

# Or use the create script directly
python create_vector_store.py
```

Note: The vector store files (`vector_store/`) are large and should not be committed to the repository. They are rebuilt automatically during deployment.
The automated pipeline is configured in .github/workflows/process-transcripts.yml:
Key Settings:
- Schedule: Weekly on Sunday at 20:00 UTC
- Timeout: 180 minutes (3 hours) for large processing jobs
- Python Version: 3.10
- Dependencies: FFmpeg (for audio processing), PyTorch, Sentence-Transformers
Required Secrets:
- `YOUTUBE_API_KEY` - For accessing the YouTube API to fetch video metadata and transcripts
- `HF_TOKEN` - For deploying to HuggingFace Spaces
After transcripts are processed, the Deploy to Hugging Face Space workflow automatically:
- Triggers On:
  - Push to `main` branch
  - Manual trigger via workflow dispatch
  - Automatic trigger after transcript updates

- Deployment Steps:
  - Checks out the latest code
  - Creates Docker startup script
  - Pushes to HuggingFace Space repository
  - HuggingFace rebuilds Docker image
  - Vector store is created during image build
  - Application is automatically redeployed

- Result:
  - New transcripts are searchable within minutes
  - Zero-downtime deployment
  - Automatic rollback on failure
Check Processing Status:
- View workflow runs in the GitHub Actions tab
- Each run generates a processing report showing:
  - Number of videos discovered
  - Transcripts generated
  - Processed chunks created
  - Deployment status
Verify Deployment:
- Check HuggingFace Space build logs
- Test the `/api/health` endpoint
- Run a sample query to verify new content is searchable
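A quick way to script that check (a sketch; substitute your Space's URL for the local default):

```python
import requests

BASE_URL = "http://localhost:7860"  # or your HuggingFace Space URL

# /api/health should return service status and version information
resp = requests.get(f"{BASE_URL}/api/health", timeout=10)
resp.raise_for_status()
print(resp.json())
```

A sample query can then be issued against `/api/chat` as shown earlier to confirm that newly added content is retrievable.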
To add new YouTube channels or playlists to the discovery process:
- Update the scraper configuration in the pipeline scripts
- The next automated run will discover videos from the new sources
- Or manually trigger the workflow to process immediately
To add academic papers or PDF documents to the knowledge base:
- Prepare PDFs: Place PDF files in a local directory (e.g., `~/research-papers/`)

- Process PDFs locally:

  ```bash
  # Process PDFs with semantic chunking
  python process_pdfs.py ~/research-papers/

  # Analyze first without processing (preview)
  python process_pdfs.py ~/research-papers/ --analyze-only
  ```

- Commit processed chunks:

  ```bash
  git add processed_pdfs/
  git commit -m "Add research papers: [description]"
  git push
  ```
- Automatic deployment: The push triggers HuggingFace rebuild with new papers
PDF Processing Features:
- Semantic chunking: Preserves paragraph boundaries and section context
- Section detection: Identifies headers and includes section names in metadata
- Page tracking: Each chunk includes page number and range
- Author extraction: Extracts author metadata when available
- Lightweight storage: Raw PDFs stay local, only JSON chunks are committed (~95% smaller)
Note: Raw PDF files are not committed to the repository (see .gitignore). Only the processed JSON chunks in processed_pdfs/ are stored in Git.
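For orientation, a processed chunk record in `processed_pdfs/` bundles the features listed above (text, section, page range, author metadata). The field names below are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical shape of one processed PDF chunk (field names are assumptions)
example_chunk = {
    "text": "Post-earnings announcement drift (PEAD) refers to ...",
    "source_type": "pdf",
    "title": "Example paper title",
    "authors": ["A. Author", "B. Author"],  # extracted when available
    "section": "2. Related Work",           # detected section header
    "page_start": 3,
    "page_end": 4,
}
```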
If automated updates fail:
- Check GitHub Actions logs - View detailed error messages in the workflow run
- Verify secrets - Ensure `YOUTUBE_API_KEY` and `HF_TOKEN` are valid
- Check API quotas - YouTube API has daily limits
- Manual rebuild - Trigger the workflow manually if the scheduled run missed
- Local testing - Run the pipeline locally to debug issues
Common Issues:
- YouTube API quota exceeded - Wait for quota reset (midnight Pacific Time)
- Transcripts not available - Some videos may not have captions enabled
- Long processing times - Large batches may take 1-2 hours
The project includes comprehensive Docker support:
- Dockerfile: Production-ready Docker image
- docker-compose.yml: Multi-service orchestration
- docker-compose.dev.yml: Development configuration
Key environment variables:

```bash
PORT=7860          # Application port
PYTHONPATH=/app    # Python module path
TEST_MODE=false    # Enable test mode (no RAG initialization)
```

The project includes a Discord bot integration in the `discord/` directory. The bot provides the same semantic search capabilities directly in Discord channels.
See discord/README.md for setup instructions.
- `docs/BEGINNER_GUIDE.md` - Getting started guide
- `docs/DEPLOYMENT_STEPS.md` - Deployment instructions
- `docs/HUGGINGFACE_SETUP.md` - HuggingFace Spaces setup
- `discord/README.md` - Discord bot setup
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests to ensure everything works
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Please ensure your code follows the existing style and includes appropriate documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to all contributors who have helped build this project
- Built with open-source technologies and libraries
- Educational content from various options trading educators
- Issues: Please use GitHub Issues for bug reports and feature requests
- Discussions: Join the conversation in GitHub Discussions
Note: This is an educational tool. Always do your own research and consult with financial professionals before making trading decisions.