This project implements a Retrieval-Augmented Generation (RAG) system that allows you to process documents, store them in a vector database, and query them using natural language. The system uses Qdrant for vector storage, Hugging Face's RoSBERTa for embeddings, and a local GGUF model (T-lite-it-1.0) for generating responses. It also features LangGraph workflows for structured RAG pipelines and LangSmith tracing for monitoring and debugging.
## Features
- Document processing for various file formats (PDF, DOCX, HTML, plain text)
- Vector embeddings using Hugging Face's RoSBERTa model
- Vector storage and retrieval with Qdrant
- Interactive chat interface with Gradio
- Local LLM inference with llama-cpp-python
- Hybrid search combining semantic and keyword-based retrieval
- LangGraph integration for structured RAG workflows
- LangSmith tracing for monitoring and debugging performance
- Support for multiple document formats in vector storage (flexible field mapping)
## Prerequisites
- Python 3.8+
- Qdrant vector database (running on localhost:6333)
- GGUF Model (T-lite-it-1.0-Q4_K_M-GGUF included)
- Poetry (recommended) or pip
- LangChain and LangGraph (included in requirements.txt)
- LangSmith API key (optional, for tracing and monitoring)
## Installation

1. Clone the repository:

   ```bash
   git clone <your-repository-url>
   cd rag
   ```
2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies:

   ```bash
   # For Ubuntu/Debian:
   sudo apt-get update
   sudo apt-get install -y build-essential cmake python3-dev python3-venv

   # For CentOS/RHEL:
   sudo yum groupinstall -y "Development Tools"
   sudo yum install -y cmake python3-devel

   # Install Python dependencies
   pip install -r requirements.txt
   ```
4. Download and run Qdrant (using Docker):

   ```bash
   docker pull qdrant/qdrant
   docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
   ```
5. Download the GGUF model (if not included):

   - The project includes a pre-configured model in `model/T-lite-it-1.0-Q4_K_M-GGUF/`
   - To use a different model, update the path in `gradio_app.py`
## Project Structure

- `data/` - Directory for storing documents to be processed
- `model/` - Contains the GGUF model files
- `data_processing.py` - Script for processing and indexing documents
- `rag_app.py` - Main RAG application with LangGraph workflows and Gradio interface
- `prompts.py` - System prompts and templates
- `requirements.txt` - Python dependencies
## Usage

### Adding Documents

Place the documents you want to process in the `data/` directory. Supported formats include:
- Text files (.txt)
- PDF documents (.pdf)
- Word documents (.docx)
- HTML files (.html, .htm)
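Dispatching on file extension can be sketched as follows. This is an illustration only; the actual `data_processing.py` may use dedicated document loaders (e.g. from LangChain) rather than this hypothetical mapping:

```python
from pathlib import Path

# Map file extensions to loader keys (illustrative; the real script may
# use LangChain loaders such as PyPDFLoader or Docx2txtLoader instead).
LOADERS = {
    ".txt": "text",
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
}

def pick_loader(path: str) -> str:
    """Return the loader key for a file, or raise for unsupported types."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file format: {suffix}")
    return LOADERS[suffix]

print(pick_loader("data/report.PDF"))  # pdf
print(pick_loader("data/index.htm"))   # html
```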
### Processing Documents

Run the document processing script to create vector embeddings and store them in Qdrant:

```bash
python data_processing.py
```

This will:

- Load documents from the `data/` directory
- Split them into chunks
- Generate embeddings using RoSBERTa
- Store them in the Qdrant database
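The chunking step can be sketched roughly as below. The chunk size and overlap values here are illustrative; the actual values are set in `data_processing.py`:

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks (a simplified version
    of what splitters like RecursiveCharacterTextSplitter do)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = split_into_chunks("a" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # 3 chunks: chars 0-500, 450-950, 900-1200
```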
### Running the Application

Start the Gradio chat interface:

```bash
python rag_app.py
```

This will start a local web server (usually at http://localhost:7860) with a chat interface where you can ask questions about your documents.
## Configuration

You can customize the following aspects of the system:

- Model Parameters: Adjust `n_ctx`, `n_threads`, and other parameters in the `LangChainAssistant` class
- Search Settings: Modify the hybrid search parameters in the `HybridSearch` class
- UI Settings: Customize the Gradio interface in `create_demo()`
- LangGraph Workflow: Modify the RAG workflow in `RAGAssistant._create_rag_graph()`
- LangSmith Tracing: Configure with the environment variables `LANGCHAIN_API_KEY` and `LANGCHAIN_PROJECT`
- Chunk size and overlap: Adjust in `data_processing.py`
- Qdrant connection: Modify the connection parameters in the `Config` class
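Conceptually, hybrid search merges a semantic ranking with a keyword ranking. A minimal sketch of score fusion follows; the weighted-sum scheme and the `alpha` weight are illustrative, not necessarily what the `HybridSearch` class implements:

```python
# Sketch of score fusion for hybrid search. A weighted sum is one common
# scheme; the actual HybridSearch class may combine rankings differently.
def hybrid_score(semantic, keyword, alpha=0.7):
    """Blend a semantic similarity score with a keyword (e.g. BM25) score;
    both are assumed normalized to [0, 1]. alpha weights the semantic part."""
    return alpha * semantic + (1 - alpha) * keyword

# doc -> (semantic score, keyword score)
docs = {"doc_a": (0.9, 0.2), "doc_b": (0.6, 0.95)}
ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d]), reverse=True)
print(ranked)  # ['doc_b', 'doc_a']
```

Raising `alpha` favors documents that are semantically close to the query; lowering it favors exact keyword matches.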
## Troubleshooting

1. Qdrant connection issues:
   - Make sure Qdrant is running (`docker ps` should show the container)
   - Check that ports 6333 and 6334 are available
2. OCR issues:
   - Verify Tesseract is installed and in your PATH
   - For non-English text, you may need to install additional language packs
3. Vector dimension mismatch:
   - If you see a "Vector dimension error" from Qdrant, check the embedding dimensions in `SimpleEmbeddings`
   - The system automatically pads vectors to the expected 1024 dimensions
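The padding fix amounts to something like the sketch below; the 1024 target corresponds to the vector size configured on the Qdrant collection:

```python
def pad_vector(vec, target_dim=1024):
    """Zero-pad an embedding so its length matches the collection's vector
    size. Raises if the embedding is already longer than the target."""
    if len(vec) > target_dim:
        raise ValueError(f"Embedding has {len(vec)} dims, expected at most {target_dim}")
    return vec + [0.0] * (target_dim - len(vec))

padded = pad_vector([0.1, 0.2, 0.3])
print(len(padded))  # 1024
```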
4. LangGraph and LangSmith integration:
   - Do not use `@trace` decorators on functions passed to LangGraph's `add_node`
   - Instead, use `with trace(name="function_name"):` inside the node functions
   - This prevents the "trace object not callable" error
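The recommended pattern can be illustrated with a stand-in `trace` context manager. The real one comes from the `langsmith` package; the stub below only shows the structure (node name and state shape are hypothetical):

```python
from contextlib import contextmanager

spans = []  # records opened span names so the example runs without LangSmith

@contextmanager
def trace(name):
    """Stand-in for LangSmith's trace context manager, used here only to
    illustrate the pattern of opening a span inside the node function."""
    spans.append(name)
    yield

# Pattern that works with LangGraph's add_node: open the trace *inside*
# the node function rather than decorating the function itself.
def retrieve_node(state):
    with trace(name="retrieve_node"):
        # ... retrieval work would happen here ...
        return {**state, "docs": ["chunk_1"]}

result = retrieve_node({"query": "hello"})
print(result["docs"], spans)  # ['chunk_1'] ['retrieve_node']
```

Because `retrieve_node` stays a plain function, LangGraph receives a callable rather than a trace object, which is what avoids the "trace object not callable" error.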
For large document collections, you may need to increase the memory available to Python and the local LLM (llama-cpp-python).
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Contact

For any questions or issues, please open an issue in the repository.