A comprehensive Multimodal Retrieval-Augmented Generation (RAG) application that combines a FastAPI backend with a Streamlit frontend, supporting multiple large language models, several OCR engines, and hybrid keyword/semantic retrieval over documents.
Supported language models:

- Azure OpenAI (GPT-4, GPT-3.5-turbo)
- Google Gemini (gemini-1.5-flash, gemini-pro)
- Claude (claude-3-sonnet via AWS Bedrock)
- Qwen and Nvidia models (run locally via Ollama)
OCR methods:

- Tesseract OCR (free, runs locally)
- Florence-2 (Microsoft's vision model)
- Google Vision API (cloud-based)
- OpenAI GPT-4 Vision (LLM-based text extraction and understanding)
- Claude Vision (LLM-based document analysis)
Retrieval methods:

- BM25 (traditional keyword search)
- Qdrant embeddings (semantic vector search)
- Reciprocal Rank Fusion (hybrid of keyword and semantic results)
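Reciprocal Rank Fusion merges the BM25 and embedding result lists by rank rather than by raw score, so the two incompatible score scales never need to be normalized. The following is a minimal sketch of the standard formula, score(d) = sum over lists of 1 / (k + rank(d)), assuming each retriever returns an ordered list of document IDs and using the commonly cited constant k = 60. The function name `reciprocal_rank_fusion` is illustrative and not taken from this repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical keyword results
    vector_hits = ["doc1", "doc4", "doc3"]  # hypothetical semantic results
    for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
        print(f"{doc_id}: {score:.4f}")
```

In this example `doc1` comes out on top because it ranks highly in both lists, even though BM25 alone placed `doc3` first.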
User interface:

- Streamlit web app with multiple pages
- Document chat interface
- Image analysis powered by Florence-2
- Knowledge Object generation
- Real-time OCR processing
Architecture:

```
┌─────────────────────┐    ┌──────────────────────┐    ┌─────────────────────┐
│    Streamlit UI     │ ←→ │   FastAPI Backend    │ ←→ │   AI Models & OCR   │
│     (Port 8501)     │    │  (Ports 8000/8001)   │    │   (External APIs)   │
└─────────────────────┘    └──────────────────────┘    └─────────────────────┘
```
- Port 8000: Main API for Knowledge Object generation
- Port 8001: Chat API for document interaction and OCR
- Port 8501: Streamlit web interface
Prerequisites:

- Python 3.11+
- Windows (the scripts and commands below assume Windows and PowerShell)
- Git
Clone the repository:

```powershell
git clone https://github.com/selvatharrun/Multimodal-RAG-Application.git
cd Multimodal-RAG-Application
```
Create a virtual environment and install dependencies:

```powershell
# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt
```
Install Tesseract OCR:

```powershell
# Using Windows Package Manager (recommended)
winget install --id UB-Mannheim.TesseractOCR

# Or download the installer from: https://github.com/UB-Mannheim/tesseract/releases
```
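If Python code cannot find the Tesseract executable after installation, a common fix is to locate it and point the OCR wrapper at it explicitly. The sketch below assumes the project uses `pytesseract` (not confirmed in this README) and that winget installed Tesseract to the UB-Mannheim default location `C:\Program Files\Tesseract-OCR\tesseract.exe`; adjust `DEFAULT_WINDOWS_PATH` if you chose a different directory. The helper name `find_tesseract` is hypothetical.

```python
import os
import shutil
from typing import Optional

# Assumed default install location of the UB-Mannheim Windows build.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract() -> Optional[str]:
    """Return the path to a Tesseract executable, or None if none is found."""
    # 1. Anything already on PATH (works on Windows, Linux, and macOS).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Fall back to the assumed default Windows install directory.
    if os.path.isfile(DEFAULT_WINDOWS_PATH):
        return DEFAULT_WINDOWS_PATH
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or add it to PATH.")
    else:
        print(f"Tesseract found at: {path}")
        # With pytesseract installed, you would then set:
        #   import pytesseract
        #   pytesseract.pytesseract.tesseract_cmd = path
```

Running `tesseract --version` in a newly opened terminal is a quicker way to confirm that the installer updated PATH.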
Configure API keys:

```powershell
# Copy the template, then add your API keys
cp florence2/config.properties.template florence2/config.properties
# Edit florence2/config.properties and fill in your actual API keys
```
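The exact keys in `config.properties.template` are not listed in this README, so the names below (`AZURE_OPENAI_API_KEY`, `GEMINI_API_KEY`) are hypothetical placeholders. As a sketch, assuming the file uses simple `key=value` lines with `#` comments, it can be loaded into a dictionary and checked for leftover template values like this. How the application itself reads the file may differ.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and '#'/'!' comments are skipped.

    This is a deliberately small subset of the Java .properties format:
    no line continuations, escapes, or ':' separators.
    """
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        props[key.strip()] = value.strip()
    return props


def load_properties(path: str) -> Dict[str, str]:
    """Read and parse a properties file from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_properties(fh.read())


if __name__ == "__main__":
    sample = """
    # Hypothetical keys for illustration only
    AZURE_OPENAI_API_KEY = your-azure-key
    GEMINI_API_KEY=your-gemini-key
    """
    config = parse_properties(sample)
    # Flag values that still look like unedited template placeholders.
    placeholders = [k for k, v in config.items() if v.startswith("your-")]
    print("Placeholder values still present:", placeholders)
```

Because this file holds secrets, it is worth confirming that `florence2/config.properties` is excluded from version control.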
Start the servers:

```powershell
# Quick start (all servers)
.\run_streamlit.bat

# Or start each one manually in its own terminal:
# Terminal 1: python florence2/main.py
# Terminal 2: python florence2/chatapi.py
# Terminal 3: streamlit run florence2/mainpage.py
```
Once the servers are running, open:

- Web UI: http://localhost:8501
- API docs: http://localhost:8000/docs and http://localhost:8001/docs
Documentation:

| Document | Description |
|---|---|
| `COMPLETE_DOCUMENTATION.md` | Complete setup, API docs, troubleshooting |
| `TESSERACT_SETUP.md` | Tesseract OCR installation guide |
| `config.properties.template` | Configuration template for API keys |
Obtaining API keys:

Azure OpenAI:

- Visit the Azure Portal
- Create or open an Azure OpenAI resource
- Copy the key, endpoint, and deployment name

Google Gemini:

- Go to Google AI Studio
- Create a new API key

Claude (AWS Bedrock):

- Open the AWS Console
- Create IAM access keys
- Enable the Bedrock service
Key API endpoints:

- `POST /upload-file/`: Generate Knowledge Objects from documents
- `POST /extract_text/`: Extract text using any of the supported OCR methods
- `POST /search_and_respond/`: Chat with documents using RAG
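The request fields for these endpoints are not documented in this README. Since both backends are FastAPI apps, each one serves its machine-readable OpenAPI schema at `/openapi.json` by default (the same schema behind `/docs`), so a small script can list every route a server exposes. A minimal sketch, assuming the default FastAPI `openapi_url` has not been changed and the servers run on `localhost`:

```python
import json
import urllib.request
from typing import List, Tuple

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema dict, sorted by path."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda item: (item[1], item[0]))


def fetch_schema(base_url: str) -> dict:
    """Download the OpenAPI schema that FastAPI serves at /openapi.json."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for base in ("http://localhost:8000", "http://localhost:8001"):
        try:
            for method, path in list_endpoints(fetch_schema(base)):
                print(f"{base}  {method:6} {path}")
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(f"{base}: not reachable ({exc})")
```

Each operation's `requestBody` entry in the fetched schema then describes the exact fields that `/upload-file/`, `/extract_text/`, and `/search_and_respond/` expect.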
Use cases:

- Document Analysis: Extract insights from PDF, DOCX, and PPTX files
- Knowledge Management: Generate structured knowledge articles
- Visual Understanding: Analyze images and charts with AI
- Interactive Chat: Q&A with document content
- Multi-format Processing: Handle text, images, and mixed content
Tech stack:

- Backend: FastAPI, Python 3.11+
- Frontend: Streamlit
- AI/ML: LangChain, Transformers, PyTorch
- OCR: Tesseract, Florence-2, Cloud APIs
- Search: Qdrant, BM25S, Embeddings
- Models: Azure OpenAI, Google Gemini, Claude, Qwen
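Troubleshooting:

- Connection errors: ensure all three servers are running on the correct ports (8000, 8001, 8501)
- API key errors: verify the keys in `florence2/config.properties`
- Import errors: check that the virtual environment is activated and all dependencies are installed
- Tesseract errors: verify the installation and that the executable is on PATH or its location is configured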
See COMPLETE_DOCUMENTATION.md for detailed troubleshooting.
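For connection errors, a quick way to see which of the three servers is actually listening is to attempt a TCP connection to each port. A minimal sketch using only the standard library, assuming everything runs on `localhost` with the default ports 8000, 8001, and 8501; the helper names are hypothetical.

```python
import socket
from typing import Dict

# Default ports from this README; change them if you reconfigured the servers.
SERVICES = {
    "Main API (FastAPI)": 8000,
    "Chat API (FastAPI)": 8001,
    "Web UI (Streamlit)": 8501,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:22} {'running' if up else 'NOT reachable'}")
```

A port that is reachable but still misbehaving usually points to an application error rather than a networking problem, so the server's own terminal output is the next place to look.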
Project structure:

```
project-root/
├── florence2/                   # Main application
│   ├── main.py                  # FastAPI server (port 8000)
│   ├── chatapi.py               # FastAPI server (port 8001)
│   ├── mainpage.py              # Streamlit main page
│   ├── config.properties        # API configuration
│   ├── pages/                   # Streamlit pages
│   └── API/                     # Backend modules
├── venv/                        # Virtual environment
├── requirements.txt             # Dependencies
├── COMPLETE_DOCUMENTATION.md    # Full documentation
└── README.md                    # This file
```
Contributing:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:

- Microsoft Florence-2 for vision capabilities
- OpenAI for language models
- Google for Gemini models
- Anthropic for Claude
- Tesseract for OCR functionality
- LangChain for RAG framework
Support:

- Check `COMPLETE_DOCUMENTATION.md` for detailed guides
- Report issues on GitHub
- Join discussions in the repository
Star this repository if you find it useful!