This PathRAG Demo App is a comprehensive knowledge graph-based Retrieval-Augmented Generation (RAG) system that combines document processing, chat functionality, and knowledge graph visualization in a single application.
PathRAG (Path-based Retrieval Augmented Generation) is an advanced approach to knowledge retrieval and generation that combines the power of knowledge graphs with large language models (LLMs).
PathRAG builds and maintains a knowledge graph from your documents, where:
- Nodes represent entities (people, organizations, concepts, locations, etc.)
- Edges represent relationships between these entities
- Properties store additional information about entities and relationships
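As a rough illustration of the structure described above, the sketch below models entities, relationships, and their properties with plain Python dataclasses. The class and method names (`Entity`, `Relationship`, `KnowledgeGraph`) are hypothetical and are not PathRAG's internal types; the demo app itself stores its graph in NetworkX.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "person", "organization", "concept"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class KnowledgeGraph:
    """Hypothetical minimal graph: nodes are entities, edges are relationships."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add_entity(self, name, kind, **properties):
        self.entities[name] = Entity(name, kind, properties)

    def add_relationship(self, source, target, label, **properties):
        if source not in self.entities or target not in self.entities:
            raise KeyError("both endpoints must be added as entities first")
        self.relationships.append(Relationship(source, target, label, properties))

    def neighbors(self, name):
        # Outgoing edges only, returned as (relationship label, target entity).
        return [(r.label, r.target) for r in self.relationships if r.source == name]


graph = KnowledgeGraph()
graph.add_entity("Marie Curie", "person", field_of_study="physics")
graph.add_entity("Sorbonne", "organization", city="Paris")
graph.add_relationship("Marie Curie", "Sorbonne", "taught_at", since=1906)
```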
Unlike traditional RAG systems that rely solely on vector similarity:
- PathRAG identifies relevant paths through the knowledge graph
- These paths provide contextual connections between entities
- The system can follow logical relationships to find information not directly mentioned
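To make the idea of following paths concrete, here is a minimal sketch that enumerates short paths between two entities with a breadth-first search. This is a generic illustration, not PathRAG's actual path selection algorithm; the function name `find_paths`, the `max_hops` parameter, and the sample edges are all assumptions made for the example.

```python
from collections import deque


def find_paths(edges, start, goal, max_hops=3):
    """Return simple paths from start to goal that use at most max_hops edges.

    Each path alternates entity, relationship, entity, ...
    """
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))

    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        hops_used = len(path) // 2
        if hops_used >= max_hops:
            continue
        for label, nxt in adjacency.get(node, []):
            if nxt in path[::2]:  # skip entities already on this path
                continue
            queue.append((nxt, path + [label, nxt]))
    return paths


# Hypothetical sample data, not taken from any real document set.
edges = [
    ("Aspirin", "treats", "Headache"),
    ("Headache", "symptom_of", "Migraine"),
    ("Aspirin", "studied_in", "Trial A"),
    ("Trial A", "focuses_on", "Migraine"),
]
paths = find_paths(edges, "Aspirin", "Migraine")
```

Neither sample path is a direct edge from "Aspirin" to "Migraine"; the connection only appears once intermediate entities are followed, which is the kind of information a pure similarity search can miss.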
PathRAG combines multiple search strategies:
- Vector search for semantic similarity
- Graph traversal for relationship-based connections
- Entity-centric retrieval for focused information about specific entities
- Relational understanding: Captures relationships between concepts, not just similarity
- Explainability: Provides clear paths showing how information is connected
- Reduced hallucinations: Grounds responses in explicit knowledge connections
- Complex reasoning: Can answer multi-hop questions requiring several logical steps
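The combination of vector search and graph traversal listed above could be sketched roughly as follows: rank entities by cosine similarity, then boost the graph neighbors of the top hits. The scoring scheme, the `graph_bonus` weight, and the toy two-dimensional embeddings are assumptions for illustration and do not reflect how PathRAG actually weights its results.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, embeddings, adjacency, top_k=2, graph_bonus=0.5):
    # Step 1: vector search picks the most similar entities as seeds.
    scores = {name: cosine(query_vec, vec) for name, vec in embeddings.items()}
    seeds = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # Step 2: graph traversal boosts entities directly linked to a seed.
    combined = dict(scores)
    for seed in seeds:
        for neighbor in adjacency.get(seed, []):
            combined[neighbor] = combined.get(neighbor, 0.0) + graph_bonus * scores[seed]

    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy data: "pancreas" is dissimilar to the query but linked to "insulin".
embeddings = {
    "insulin": [1.0, 0.0],
    "diabetes": [0.8, 0.6],
    "pancreas": [0.0, 1.0],
}
adjacency = {"insulin": ["pancreas"]}

ranked = hybrid_retrieve([1.0, 0.0], embeddings, adjacency, top_k=1, graph_bonus=0.9)
ranked_names = [name for name, _ in ranked]
```

With the graph bonus set to zero, "pancreas" ranks last because its embedding is orthogonal to the query; with the bonus applied, its link to the top hit moves it ahead of "diabetes".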
Document Processing:
- Documents are chunked into manageable pieces
- Entities and relationships are extracted using NLP techniques
- A knowledge graph is constructed connecting these entities
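The chunking step can be sketched as a sliding window with overlap. The version below splits on characters for simplicity; whether the real pipeline counts characters or tokens is not stated here, so treat that as an assumption. The defaults mirror the `CHUNK_SIZE=1200` and `CHUNK_OVERLAP=100` values from the sample configuration.

```python
def chunk_text(text, chunk_size=1200, overlap=100):
    """Split text into overlapping windows of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Each chunk repeats the last `overlap` characters of the previous one, so an entity mention that straddles a boundary still appears whole in at least one chunk.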
Query Processing:
- User queries are analyzed to identify key entities and intents
- The system identifies relevant paths in the knowledge graph
- Both vector similarity and graph structure are used to retrieve information
Response Generation:
- Retrieved context from multiple paths is synthesized
- The LLM generates responses grounded in this structured knowledge
- Responses include information from across the knowledge graph
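One way to picture the synthesis step is as prompt assembly: retrieved paths and passages are serialized into a context block that the LLM is asked to stay within. The prompt wording and the helper names `format_path` and `build_prompt` below are hypothetical; the prompts PathRAG actually sends are not described in this README.

```python
def format_path(path):
    """Render [entity, relation, entity, ...] as a readable arrow chain."""
    parts = [path[0]]
    for i in range(1, len(path) - 1, 2):
        parts.append(f"--[{path[i]}]--> {path[i + 1]}")
    return " ".join(parts)


def build_prompt(question, paths, chunks):
    lines = ["Answer the question using only the context below.", "", "Relationship paths:"]
    lines += [f"- {format_path(path)}" for path in paths]
    lines += ["", "Source passages:"]
    lines += [f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical inputs for illustration only.
prompt = build_prompt(
    "How is Aspirin connected to Migraine?",
    [["Aspirin", "treats", "Headache", "symptom_of", "Migraine"]],
    ["Headache is a common symptom of migraine."],
)
```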
- Document Management: Upload, process, and manage documents (PDF, DOCX, MD, TXT, HTML, etc.)
- Chat Interface: Thread-based chat system with context-aware responses
- Knowledge Graph: Visualize and query the knowledge graph built from your documents
- User Management: User authentication and personalization
- React Frontend: Modern UI built with React and RSuite components
- FastAPI Backend: High-performance Python API with async support
- SQLite Database: Lightweight database for storing user data, chat threads, and document metadata
- Thread-Based Chat: Persistent chat threads with unique IDs
- Document Processing: Automatic extraction of text and entities from various document formats
- Knowledge Graph Visualization: Interactive visualization using D3.js
- Theme Customization: Customizable UI themes (blue, red, violet)
- Automatic Document Reloading: System checks document status every 15 seconds and automatically reloads when processing completes
- FastAPI: Modern, fast web framework for building APIs
- SQLite: Lightweight database for storing users, chats, and documents
- JWT: JSON Web Tokens for authentication
- PathRAG: Path-based Retrieval Augmented Generation for knowledge graph and chat functionality
- NetworkX: Graph data structure and algorithms (for development/demo)
- NanoVectorDB: Local file-based vector storage (for development/demo)
- React: JavaScript library for building user interfaces
- RSuite: UI component library with responsive design
- D3.js: Data visualization library for knowledge graph
- React Router: Navigation and routing
- Axios: HTTP client for API requests
- React Dropzone: Drag-and-drop file upload
- Font Awesome: Icon library
/api
/auth - Authentication module
- jwt_handler.py - JWT token handling
- routes.py - Authentication endpoints
- schemas.py - Authentication data models
/features - Feature modules
/users - User management
/chats - Chat functionality
/documents - Document management
/knowledge_graph - Knowledge graph functionality
/models - Database models
- database.py - SQLite database setup and models
main.py - Main application entry point
/pathrag-ui
/public - Static files
/src
/components - Reusable components
/auth - Authentication components
/chat - Chat components
/documents - Document components
/knowledge-graph - Knowledge graph components
/context - React context providers
/pages - Application pages
/services - API services
/utils - Utility functions
App.js - Main application component
index.js - Application entry point
- Python 3.8+
- Node.js 14+
- npm or yarn
Use our start script to set up and run the API:
For Unix/Linux/macOS:
# Make the script executable (first time only)
chmod +x start-api.sh
# Run the API
./start-api.sh
For Windows:
# Run the API
start-api.bat
These scripts will:
- Create a Python virtual environment named .venv if it doesn't exist
- Install all backend dependencies
- Start the backend API on port 8000
The API will be available at:
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Navigate to the UI directory and start the React application:
# Navigate to the UI directory
cd ui
# Install dependencies (first time only)
npm install
# Start the UI
npm start
The UI will be available at:
- Frontend UI: http://localhost:3000
If you prefer to set up and run the components separately, follow these instructions:
Create and activate a virtual environment:
# Create virtual environment
python -m venv .venv
# Activate on Windows
.venv\Scripts\activate
# Activate on macOS/Linux
source .venv/bin/activate
Install Python dependencies:
pip install -r requirements.txt
Configure environment variables: Copy the sample environment file and modify it with your settings:
cp sample.env .env
# Edit .env with your preferred text editor
Key environment variables include:
# JWT Authentication
SECRET_KEY=your_secret_key_here # Generate with: openssl rand -hex 32
ACCESS_TOKEN_EXPIRE_MINUTES=30
# Application Directories
WORKING_DIR=./data
UPLOAD_DIR=./uploads
# Database Configuration
DATABASE_URL=sqlite:///./pathrag.db
# Server Configuration
HOST=0.0.0.0
PORT=8000
DEBUG=False
LOG_LEVEL=info
CORS_ORIGINS=http://localhost:3000
# AI Model Settings (choose one option)
# Option 1: Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o
AZURE_OPENAI_API_VERSION=2023-05-15
AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
# Option 2: OpenAI direct
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://api.openai.com/v1
# PathRAG Configuration
CHUNK_SIZE=1200
CHUNK_OVERLAP=100
MAX_TOKENS=32768
TEMPERATURE=0.7
TOP_K=40
See INSTALLATION.md for detailed environment variable configuration and sample.env for a complete example.
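For orientation, the sketch below shows one way such variables could be read and type-converted in Python. The `load_settings` helper is hypothetical and is not the backend's actual configuration code; the defaults simply repeat the sample values shown above. The `generate_secret_key` helper uses the standard library's `secrets.token_hex(32)`, which yields 64 hexadecimal characters, the same shape of value as `openssl rand -hex 32`.

```python
import os
import secrets


def load_settings(env=None):
    """Read a few of the documented variables, falling back to the sample values."""
    env = os.environ if env is None else env
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return {
        "chunk_size": int(env.get("CHUNK_SIZE", "1200")),
        "chunk_overlap": int(env.get("CHUNK_OVERLAP", "100")),
        "top_k": int(env.get("TOP_K", "40")),
        "temperature": float(env.get("TEMPERATURE", "0.7")),
        # Environment values are strings, so "False" must be parsed explicitly.
        "debug": env.get("DEBUG", "False").strip().lower() in ("1", "true", "yes"),
        # Assumption: multiple origins would be comma-separated.
        "cors_origins": [o.strip() for o in origins.split(",") if o.strip()],
    }


def generate_secret_key():
    return secrets.token_hex(32)


settings = load_settings({})
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.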
Start the backend server:
python main.py
The API will be available at http://localhost:8000
For more advanced options, see INSTALLATION.md.
API Documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Navigate to the frontend directory:
cd ui
Install dependencies:
npm install
Start the development server:
npm start
The application will be available at http://localhost:3000
- Use the default credentials to log in:
- Username: user1, Password: Pass@123
- Username: user2, Password: Pass@123
- Username: user3, Password: Pass@123
- Or register a new account using the registration form
- Navigate to the Chat page
- Click "New Chat" to start a new thread
- Type your message in the input field
- Press Enter or click the send button
- View the AI response
- Your chat threads are saved and can be accessed from the sidebar
- Each thread has a unique ID and maintains its own conversation history
- Thread titles are automatically updated based on the first message
- Navigate to the Knowledge Graph page
- Enter a query in the search field to filter the graph
- Interact with the graph by dragging nodes
- Zoom in/out using the mouse wheel
- Click on nodes to see entity details
- Navigate to the Documents page
- Click "Upload Document" button
- Drag and drop a file or click to select a file
- Monitor the upload progress
- The system automatically checks document status every 15 seconds
- When processing completes, the system automatically reloads the PathRAG instance
- You can also manually reload the PathRAG instance by clicking the "Reload Documents" button
- View the uploaded documents in the list with their processing status
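The 15-second status check described above amounts to a polling loop. The sketch below shows that pattern in isolation; it is not the UI's actual implementation (which runs in the React frontend), and the status strings "completed" and "failed" are assumptions, since the README does not list the possible values.

```python
import time


def wait_until_processed(fetch_status, interval_seconds=15, max_checks=20, sleep=time.sleep):
    """Call fetch_status() until it reports a final state or max_checks is reached."""
    for attempt in range(max_checks):
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed final states
            return status
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    return "timeout"


# Simulated status source so the example runs without a server or real waiting.
simulated = iter(["processing", "processing", "completed"])
waits = []
final_status = wait_until_processed(lambda: next(simulated), sleep=waits.append)
```

In a real client, `fetch_status` would wrap a call to the document status endpoint, and a reload of the PathRAG instance would follow once the final state is reached.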
User:
- id: Integer (Primary Key)
- username: String (Unique)
- email: String (Unique)
- hashed_password: String
- created_at: DateTime
- theme: String (Default: "blue")
Thread:
- id: Integer (Primary Key)
- uuid: String (Unique)
- user_id: Integer (Foreign Key to User)
- title: String
- created_at: DateTime
- updated_at: DateTime
- is_deleted: Boolean (Default: False)
Chat:
- id: Integer (Primary Key)
- user_id: Integer (Foreign Key to User)
- thread_id: Integer (Foreign Key to Thread)
- role: String ('user' or 'system')
- message: Text
- created_at: DateTime
Document:
- id: Integer (Primary Key)
- user_id: Integer (Foreign Key to User)
- filename: String
- content_type: String
- file_path: String
- file_size: Integer
- uploaded_at: DateTime
- status: String
- processed_at: DateTime (Nullable)
- error_message: Text (Nullable)
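The four field lists above map naturally onto SQLite tables. The sketch below creates them in an in-memory database using Python's built-in `sqlite3` module. The table names (`users`, `threads`, `chats`, `documents`) and the choice of `TEXT` for DateTime columns are assumptions; the backend's real models in database.py may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE,
    email TEXT UNIQUE,
    hashed_password TEXT,
    created_at TEXT,
    theme TEXT DEFAULT 'blue'
);
CREATE TABLE threads (
    id INTEGER PRIMARY KEY,
    uuid TEXT UNIQUE,
    user_id INTEGER REFERENCES users(id),
    title TEXT,
    created_at TEXT,
    updated_at TEXT,
    is_deleted INTEGER DEFAULT 0
);
CREATE TABLE chats (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    thread_id INTEGER REFERENCES threads(id),
    role TEXT CHECK (role IN ('user', 'system')),
    message TEXT,
    created_at TEXT
);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    filename TEXT,
    content_type TEXT,
    file_path TEXT,
    file_size INTEGER,
    uploaded_at TEXT,
    status TEXT,
    processed_at TEXT,
    error_message TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert sample rows that rely on the documented defaults.
conn.execute(
    "INSERT INTO users (username, email, hashed_password) VALUES (?, ?, ?)",
    ("user1", "user1@example.com", "not-a-real-hash"),
)
conn.execute(
    "INSERT INTO threads (uuid, user_id, title) VALUES (?, ?, ?)",
    ("00000000-0000-0000-0000-000000000001", 1, "First thread"),
)
theme = conn.execute("SELECT theme FROM users WHERE username = 'user1'").fetchone()[0]
is_deleted = conn.execute("SELECT is_deleted FROM threads WHERE user_id = 1").fetchone()[0]
```

Because threads carry an `is_deleted` flag, deleting a thread can be a soft delete: the row stays in place and is filtered out of listings.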
- POST /token: Login and get access token
- POST /register: Register a new user
- GET /users/me: Get current user information
- GET /users/: Get all users
- POST /users/theme: Update user theme
- GET /chats/threads: Get all chat threads
- POST /chats/threads: Create a new chat thread
- GET /chats/threads/{thread_uuid}: Get a specific thread with all its chats
- PUT /chats/threads/{thread_uuid}: Update a thread's title
- DELETE /chats/threads/{thread_uuid}: Mark a thread as deleted
- GET /chats/: Get all chats
- GET /chats/recent: Get the 5 most recent chat threads
- POST /chats/chat/{thread_uuid}: Create a new chat message in a thread
- GET /documents/: Get all documents
- POST /documents/upload: Upload a document
- GET /documents/{document_id}: Get a specific document
- GET /documents/{document_id}/status: Get document processing status
- POST /documents/reload: Reload the PathRAG instance to recognize new documents
- GET /knowledge-graph/: Get the knowledge graph
- POST /knowledge-graph/query: Query the knowledge graph
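As a hedged sketch of how a client might talk to these endpoints, the helpers below build (but do not send) HTTP requests with Python's standard library. Two assumptions are worth flagging: that POST /token accepts form-encoded `username` and `password` fields, as is common for FastAPI password logins, and that protected endpoints expect a `Bearer` token in the `Authorization` header. Check the Swagger UI at http://localhost:8000/docs for the actual request shapes.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def login_request(username, password):
    # Assumption: the token endpoint takes form-encoded credentials.
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def authed_request(path, token, method="GET", payload=None):
    # Assumption: protected endpoints expect "Authorization: Bearer <token>".
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{BASE_URL}{path}", data=data, method=method, headers=headers)


login = login_request("user1", "Pass@123")
list_documents = authed_request("/documents/", token="example-token")
```

With the backend running, each request can be sent with `urllib.request.urlopen(request)` and the JSON body decoded with `json.load`.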
- Vector Storage: NanoVectorDB (local file-based vector store)
- Graph Storage: NetworkX (local in-memory graph)
- Key-Value Storage: JsonKVStorage (local file-based storage)
Note: These storage options are suitable for demonstration and development purposes only. They are not recommended for production use with large datasets or high traffic.
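To see why file-based storage suits demos better than production, consider the toy key-value store below. `TinyJsonKV` is a hypothetical class written for this example and is not PathRAG's JsonKVStorage; it only illustrates the general pattern of keeping everything in one JSON file.

```python
import json
import os
import tempfile


class TinyJsonKV:
    """Toy file-backed store: the whole dataset lives in a single JSON file."""

    def __init__(self, path):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self._data = json.load(handle)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        # Every write re-serializes the entire file, so cost grows with data size.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)
        os.replace(tmp_path, self.path)


with tempfile.TemporaryDirectory() as folder:
    store_path = os.path.join(folder, "kv.json")
    TinyJsonKV(store_path).set("doc-1", {"status": "processed"})
    reloaded = TinyJsonKV(store_path).get("doc-1")
```

A store like this needs no setup, which is convenient for development, but it loads all data into memory and offers no protection against concurrent writers.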
For production environments, consider using these alternatives:
- Vector Databases: PostgreSQL (pgvector), Pinecone, DataStax, Azure Cognitive Search, Azure SQL(Preview)
- Graph Databases: Neo4j, ArangoDB, Apache AGE (PostgreSQL extension), CosmosDB, Azure SQL
- Document Databases: MongoDB, Cassandra, CosmosDB
This project is licensed under the MIT License - see the LICENSE file for details.
PathRAG is particularly effective for:
- Research assistance: Connecting findings across multiple papers and sources
- Legal document analysis: Identifying relationships between cases, statutes, and legal concepts
- Medical knowledge systems: Connecting symptoms, conditions, treatments, and research
- Multi-hop question answering: "What treatments were developed based on research by scientists who studied under Marie Curie?"
- Contextual understanding: Understanding how different parts of a document relate to each other
- Exploratory research: Discovering unexpected connections between concepts
- Corporate knowledge bases: Connecting information across departments and documents
- Compliance and regulation: Tracking relationships between policies, regulations, and procedures
- Institutional memory: Preserving and accessing organizational knowledge
- Knowledge graph quality: The system's effectiveness depends on the quality of entity and relationship extraction
- Computational complexity: Graph operations can be more resource-intensive than simple vector searches
- Domain specificity: May require domain-specific entity extraction for specialized fields
- Storage limitations: The default storage options (NanoVectorDB, NetworkX) are not suitable for large-scale production use
Boyu Chen¹, Zirui Guo¹,², Zidan Yang¹,³, Yuluo Chen¹, Junze Chen¹, Zhenghao Liu³, Chuan Shi¹, Cheng Yang¹
¹ Beijing University of Posts and Telecommunications
² University of Hong Kong
³ Northeastern University
Contact: chenbys4@bupt.edu.cn, yangcheng@bupt.edu.cn
- Robert Dennyson, Solution Architect, UK
- Contact: robertdennyson@live.in
- PathRAG for the knowledge graph and retrieval augmented generation capabilities
- RSuite for the UI components
- D3.js for the knowledge graph visualization