A comprehensive Retrieval-Augmented Generation (RAG) system with multiple AI capabilities including vector search, hybrid search, step-back prompting, knowledge graph construction, and contract extraction.
- Vector Similarity Search - Semantic search using sentence transformers
- Full-Text Keyword Search - Traditional keyword-based search
- Hybrid Search - Combines vector and keyword search for optimal results
- Step-Back Prompting - Generates broader questions for better retrieval
- Parent-Child Chunking - Hierarchical document structure for better context
- Knowledge Graph Construction - Extract structured information from legal documents
- Contract Analysis - Parse and analyze contract terms, parties, and relationships
- Text2Cypher - Convert natural language to Neo4j Cypher queries
- Entity Extraction - Extract entities and relationships from text
- Graph RAG - Global and local graph-based retrieval
- Agentic RAG - Multi-tool agent-based retrieval system
- Interactive web interface for all features
- Real-time testing and visualization
- Knowledge graph visualization
- Contract extraction playground
- FastAPI - Modern Python web framework
- Neo4j - Graph database for knowledge storage
- Sentence Transformers - Embedding generation
- Google Gemini - Large Language Model
- PDF Processing - Document parsing and chunking
- React - Modern JavaScript framework
- Tailwind CSS - Utility-first CSS framework
- React Router - Client-side routing
Before setting up the project, ensure you have the following installed:
- Python 3.9+
- Node.js 16+ and npm
- Neo4j Database (local or cloud instance)
- Git
git clone https://github.com/codergoel/advanced-rag-system-v2.git
cd advanced-rag-system-v2
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the backend directory:
# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
# Google Gemini API
GOOGLE_API_KEY=your_gemini_api_key
# Optional: OpenAI API (if using OpenAI embeddings)
OPENAI_API_KEY=your_openai_api_key
Make sure Neo4j is running on your system:
- Local: Start Neo4j Desktop or Docker container
- Cloud: Use Neo4j AuraDB or other cloud providers
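Before starting the backend, it can help to confirm that the .env file defines every value listed above. The following is a minimal sketch using only the standard library; `parse_env`, `missing_keys`, and the choice of which keys count as required are assumptions for illustration and are not part of the project's code.

```python
# Hypothetical helper: checks that a .env file defines the settings shown above.
# Treating these four keys as required is an assumption; OPENAI_API_KEY is optional.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GOOGLE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open("backend/.env").read()))` returns an empty list when all four values are set.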
cd frontend
npm install
cd backend
source venv/bin/activate
python main.py
The backend will be available at http://localhost:8000
cd frontend
npm start
The frontend will be available at http://localhost:3000
Navigate to /knowledge-graph-construction to:
- Extract structured information from legal documents
- Import contracts into the knowledge graph
- Query the graph with natural language
- Visualize relationships and statistics
Use /rag-chat to:
- Upload PDF documents
- Ask questions about your documents
- Use different search strategies (vector, keyword, hybrid)
- Test step-back prompting
Visit /text2cypher to:
- Convert natural language to Cypher queries
- Load sample datasets
- Test query generation
Use /entity-extraction to:
- Extract entities from text
- Build knowledge graphs
- Visualize entity relationships
- POST /api/rag/query - Perform RAG query
- POST /api/rag/stepback - Step-back RAG pipeline
- POST /api/rag/test - Test all RAG functionality
- GET /api/rag/documents/count - Get document statistics
- POST /api/knowledge-graph/extract - Extract contract information
- POST /api/knowledge-graph/import - Import to knowledge graph
- GET /api/knowledge-graph/data - Get graph data
- POST /api/knowledge-graph/query - Query the graph
- POST /api/text2cypher/query - Generate Cypher from natural language
- GET /api/text2cypher/schema - Get database schema
- POST /api/text2cypher/load-movies - Load sample dataset
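The POST endpoints listed above accept JSON bodies, so they can also be called from Python without extra dependencies. This is a small sketch using the standard library; `build_post` and `post_json` are hypothetical helper names, and the response is assumed to be JSON, which the source does not document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the API paths."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `post_json("/api/rag/test", {"question": "What is the main topic of the documents?"})` sends a request with the `question` field. Request fields for other endpoints are listed in the interactive docs at http://localhost:8000/docs.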
curl -X POST http://localhost:8000/api/rag/test \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?"}'
curl -X POST http://localhost:8000/api/knowledge-graph/extract \
  -H "Content-Type: application/json" \
  -d '{"document": "Your contract text here..."}'
advanced-rag-system/
├── backend/
│   ├── services/
│   │   ├── rag_service.py              # Core RAG functionality
│   │   ├── knowledge_graph_construction_service.py
│   │   ├── text2cypher_service.py
│   │   ├── entity_extraction_service.py
│   │   ├── neo4j_service.py            # Database operations
│   │   ├── embedding_service.py        # Vector embeddings
│   │   ├── gemini_service.py           # LLM integration
│   │   └── pdf_service.py              # Document processing
│   ├── main.py                         # FastAPI application
│   ├── config.py                       # Configuration
│   └── requirements.txt                # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── pages/                      # React components
│   │   ├── services/                   # API services
│   │   └── components/                 # Reusable components
│   ├── package.json                    # Node dependencies
│   └── tailwind.config.js              # Styling configuration
├── README.md                           # This file
└── requirements.txt                    # Python dependencies
- Uses sentence-transformers for embedding generation
- Cosine similarity for document ranking
- Configurable result limits
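The vector search behaviour described above (embed, rank by cosine similarity, cap the number of results) can be sketched in a few lines. This is only an illustration with toy vectors standing in for sentence-transformers embeddings; `cosine_similarity` and `rank_by_similarity` are hypothetical names, and the project itself may delegate ranking to a Neo4j vector index instead.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    limit: int = 5,
) -> list[tuple[str, float]]:
    """Score every document against the query and keep the top `limit`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```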
- Combines vector and keyword search results
- Score normalization for fair comparison
- Deduplication of results
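One way to combine the two result sets, shown here as a sketch rather than the project's actual implementation, is to min-max normalize each score set to the range 0 to 1 and keep the best score per document. The function names and the choice of `max` as the merge rule are assumptions; a weighted sum would be an equally reasonable alternative.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so vector and keyword scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_merge(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    limit: int = 5,
) -> list[tuple[str, float]]:
    """Merge both result sets, keeping one entry (the best score) per document."""
    merged: dict[str, float] = {}
    for source in (min_max_normalize(vector_scores), min_max_normalize(keyword_scores)):
        for doc_id, score in source.items():
            merged[doc_id] = max(merged.get(doc_id, 0.0), score)
    # Sort by score descending, then by id so ties are deterministic.
    ranked = sorted(merged.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]
```

Normalizing first matters because raw full-text scores are unbounded while cosine similarities are not, so unnormalized keyword hits would otherwise dominate the merged ranking.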
- Generates broader questions for better retrieval
- Uses LLM to create step-back questions
- Improves answer quality through better context
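The step-back flow described above can be sketched as three calls: ask the LLM for a broader question, retrieve context for both the original and the broader question, then answer with the combined context. In this sketch `generate` and `retrieve` are hypothetical callables standing in for the LLM and the search backend, and the prompt wording is an assumption.

```python
from typing import Callable, List

# Assumed prompt wording; the project's actual prompt is not shown in this README.
STEP_BACK_PROMPT = (
    "Rewrite the following question as a more general question about the "
    "underlying concept.\nQuestion: {question}\nBroader question:"
)


def step_back_answer(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Answer `question` using context retrieved for it and for a broader variant."""
    broader = generate(STEP_BACK_PROMPT.format(question=question)).strip()

    # Retrieve for both questions, dropping duplicate chunks but keeping order.
    context: List[str] = []
    for query in (question, broader):
        for chunk in retrieve(query):
            if chunk not in context:
                context.append(chunk)

    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```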
- Hierarchical document structure
- Child chunks for detailed retrieval
- Parent chunks for broader context
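A minimal sketch of the parent-child idea: split the document into large parent chunks, then split each parent into small child chunks that remain linked to it. Retrieval would match on the children and return the parent for context. The word-based splitting, the default sizes, and the names `ParentChunk` and `build_parent_child_chunks` are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ParentChunk:
    index: int
    text: str
    children: list[str] = field(default_factory=list)


def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_parent_child_chunks(
    text: str, parent_size: int = 200, child_size: int = 50
) -> list[ParentChunk]:
    """Create parent chunks, each holding the child chunks cut from it."""
    return [
        ParentChunk(index=i, text=parent_text, children=split_words(parent_text, child_size))
        for i, parent_text in enumerate(split_words(text, parent_size))
    ]
```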
- Neo4j Connection Error
  - Ensure Neo4j is running
  - Check connection credentials in .env
  - Verify network connectivity
- Embedding Generation Fails
  - Check internet connection (for model download)
  - Verify sentence-transformers installation
  - Check available memory
- Frontend Build Errors
  - Clear node_modules and reinstall: rm -rf node_modules && npm install
  - Check Node.js version compatibility
  - Verify all dependencies are installed
- API Endpoints Not Found
  - Ensure backend server is running
  - Check for any startup errors
  - Verify port 8000 is available
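Several of the checks above come down to whether a service is listening on its port. The following sketch tests the three ports mentioned in this README (7687 for Neo4j Bolt, 8000 for the backend, 3000 for the frontend); `port_open` and `check_services` are hypothetical helpers, and a successful TCP connection only shows that something is listening, not that it is the expected service.

```python
import socket

# Ports taken from the configuration and run instructions in this README.
SERVICES = {"Neo4j (Bolt)": 7687, "Backend API": 8000, "Frontend": 3000}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```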
- Neo4j Indexes
  - Vector indexes are created automatically
  - Full-text indexes for keyword search
  - Monitor query performance
- Embedding Caching
  - Consider caching embeddings for repeated queries
  - Use batch processing for large documents
- Memory Management
  - Monitor memory usage during large document processing
  - Consider chunking large documents
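The caching and batching suggestions above can be combined in one small wrapper: look up each text by a hash, then send only the uncached texts to the embedding model in a single batch. This is a sketch under stated assumptions; `EmbeddingCache` is a hypothetical class, and `embed_batch` stands in for whatever function the embedding service exposes. The cache is in-memory and unbounded, so a long-running process would need an eviction policy.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache that embeds only texts it has not seen before."""

    def __init__(self, embed_batch: Callable[[List[str]], List[List[float]]]):
        self._embed_batch = embed_batch  # hypothetical batch embedding function
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Collect each uncached text once, preserving first-seen order.
        missing: List[str] = []
        seen = set()
        for text in texts:
            key = self._key(text)
            if key not in self._store and key not in seen:
                seen.add(key)
                missing.append(text)

        # One batched call for everything that is not cached yet.
        if missing:
            for text, vector in zip(missing, self._embed_batch(missing)):
                self._store[self._key(text)] = vector

        return [self._store[self._key(text)] for text in texts]
```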
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Neo4j for graph database capabilities
- Google Gemini for LLM integration
- Sentence Transformers for embedding generation
- FastAPI for the robust backend framework
- React for the modern frontend interface
For support and questions:
- Create an issue in the GitHub repository
- Check the troubleshooting section above
- Review the API documentation at http://localhost:8000/docs
Happy RAG-ing!