This application is a full-stack document indexing and retrieval system. Users can upload documents, index their content, and run natural language queries against the index. It uses LlamaIndex and OpenAI embeddings for semantic search and retrieval, with a React frontend for interactive use.
- Document Upload: Upload text files to be indexed and stored
- Document Management: View a list of all uploaded documents
- Semantic Search: Ask questions in natural language and get AI-generated answers
- Source References: View the source documents and passages used to generate answers
- Real-time Responses: Asynchronous processing with live feedback
The application follows a multi-service architecture:
- Index Server (index_server.py)
  - Core document processing and indexing logic
  - Uses LlamaIndex and OpenAI embeddings for semantic understanding
  - Maintains a vector store index for document retrieval
  - Exposes services via a BaseManager server on port 5602
- API Server (flask_demo.py)
  - Flask-based REST API
  - Handles document upload, query requests, and document listing
  - Communicates with the index server using a BaseManager client
  - Exposes endpoints on port 5601
- React Application (react_frontend/)
  - TypeScript React application
  - Responsive UI for document management and querying
  - Components for uploading, viewing documents, and querying the index
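The link between the API server and the index server described above can be sketched with Python's `multiprocessing.managers.BaseManager`. This is a minimal illustration, not the project's actual code: the `query_index` stub, the function names, the manager class names, and the `change-me` authkey are all assumptions made for the example. Only the use of BaseManager and port 5602 come from the description above.

```python
import threading
from multiprocessing.managers import BaseManager


def query_index(query_text: str) -> str:
    """Hypothetical stand-in for the real query logic in index_server.py."""
    return f"(stub answer for: {query_text})"


class IndexServerManager(BaseManager):
    """Server-side manager; registrations are stored on this subclass."""


class IndexClientManager(BaseManager):
    """Client-side manager; it only needs to know the registered name."""


def start_index_server(address=("127.0.0.1", 5602), authkey=b"change-me"):
    """Expose query_index over a BaseManager server running in a daemon thread."""
    IndexServerManager.register("query_index", callable=query_index)
    manager = IndexServerManager(address=address, authkey=authkey)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # server.address holds the bound (host, port)


def connect_to_index_server(address=("127.0.0.1", 5602), authkey=b"change-me"):
    """Connect the way an API process would; address and authkey must match."""
    IndexClientManager.register("query_index")
    client = IndexClientManager(address=address, authkey=authkey)
    client.connect()
    return client
```

A call through the client returns a proxy object rather than the value itself, so the caller unwraps it with `_getvalue()`, for example `connect_to_index_server().query_index("hello")._getvalue()`. In a real deployment the server and client would run in separate processes; the thread here only keeps the sketch self-contained.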
- Python 3.11
- Flask: Web framework for API endpoints
- LlamaIndex: Document indexing and retrieval library
- OpenAI API: For embeddings and LLM capabilities
- Multiprocessing (BaseManager): For inter-process communication between the API server and the index server
- React 18: UI framework
- TypeScript: Type-safe JavaScript
- SCSS: Styling
- React Spinners: Loading indicators
- ClassNames: Conditional class application
- Python 3.11+
- Node.js 16+
- OpenAI API key
- Clone the repository
- Set up Python environment
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
- Set up React frontend
    cd react_frontend
    npm install
- Set OpenAI API key
    export OPENAI_API_KEY="your-api-key-here"
- Start the application (from the repository root)
    ./launch_app.sh
- Build the Docker image
    docker build -t document-indexer .
- Run the container
    docker run -p 5601:5601 -p 3000:3000 -e OPENAI_API_KEY="your-api-key-here" document-indexer
- Access the application
  - Open a web browser and navigate to http://localhost:3000
- Upload documents
  - Use the upload area to select and upload text files
  - Check the document list to verify successful uploads
- Query the index
  - Type a natural language question in the query box
  - Press Enter to submit the query
  - View the AI-generated answer and source references
- GET /query?text=<query_text>
  - Submit a query to the index
  - Returns the answer text and source references
- POST /uploadFile
  - Upload a document for indexing
  - Form data: file (document file), filename_as_doc_id (optional)
- GET /getDocuments
  - Retrieve the list of indexed documents
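The two GET endpoints above can be called from a small standard-library client. The sketch below is illustrative only: it assumes the API server is reachable at `http://localhost:5601` and that both endpoints return JSON, which the endpoint list does not state. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Port 5601 is the API server port; the host is an assumption for local use.
API_BASE = "http://localhost:5601"


def build_query_url(text: str, base: str = API_BASE) -> str:
    """Return the GET /query URL with the query text percent-encoded."""
    return f"{base}/query?{urlencode({'text': text})}"


def query_index(text: str, base: str = API_BASE):
    """Submit a query; assumes the response body is JSON."""
    with urlopen(build_query_url(text, base)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_documents(base: str = API_BASE):
    """Fetch the list of indexed documents; assumes the response body is JSON."""
    with urlopen(f"{base}/getDocuments") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Encoding the query with `urlencode` matters because natural language questions usually contain spaces and punctuation such as `?`, which would otherwise corrupt the query string.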
.
├── documents/              # Document storage directory
├── react_frontend/         # React frontend application
│   ├── src/
│   │   ├── apis/           # API client code
│   │   ├── components/     # React components
│   │   └── ...
│   └── ...
├── saved_index/            # Persisted vector index storage
├── flask_demo.py           # Flask API server
├── flask_simple_demo.py    # Simplified Flask demo
├── index_server.py         # LlamaIndex processing server
├── launch_app.sh           # Application startup script
├── requirements.txt        # Python dependencies
└── Dockerfile              # Docker configuration
- Start the index server:
    python index_server.py
- Start the Flask API server:
    python flask_demo.py
- Start the React development server:
    cd react_frontend
    npm start
- The application uses a hardcoded password for the BaseManager server
- No user authentication is implemented in this version
- Data is stored locally in the file system
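One possible way to move away from the hardcoded BaseManager password noted above is to read it from the environment at startup. This is a minimal sketch under stated assumptions: the `INDEX_SERVER_AUTHKEY` variable and the `load_authkey` helper are hypothetical and are not part of the current project.

```python
import os


def load_authkey(env_var: str = "INDEX_SERVER_AUTHKEY") -> bytes:
    """Read the manager authkey from the environment instead of the source code.

    The variable name is hypothetical; BaseManager expects the key as bytes.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; refusing to use a default key")
    return value.encode("utf-8")
```

Both the index server and the API server would need to load the same value, for example by passing `authkey=load_authkey()` when constructing their manager objects.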