A Retrieval-Augmented Generation (RAG) system using LLaMA and FAISS, with a modern React frontend.
- `src/`: Backend components
  - `retriever.py`: FAISS-based document retriever
  - `generator.py`: LLaMA-based text generator using llama-cpp-python
  - `rag_pipeline.py`: RAG pipeline implementation
  - `api.py`: FastAPI backend server
- `project/`: Frontend React application
  - `src/`: React components and logic
  - `public/`: Static assets
- `data/`: Document storage
- `models/`: Model storage
  - `llama-2-7b/`: LLaMA model files
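These pieces fit together in the usual RAG loop: the retriever pulls the chunks most similar to the query out of the FAISS index, and the pipeline stitches them into a prompt for the generator. A minimal, self-contained sketch of that flow — bag-of-words cosine similarity stands in for FAISS and a real embedding model here, and the prompt is returned instead of being fed to LLaMA:

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class ToyRetriever:
    """Stand-in for the FAISS-backed retriever in src/retriever.py."""

    def __init__(self, docs):
        self.docs = docs
        # Bag-of-words "embeddings"; the real retriever uses a sentence encoder.
        self.vectors = [self._embed(d) for d in docs]

    def _embed(self, text):
        counts = {}
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
        norm = sum(c * c for c in counts.values()) ** 0.5
        return {t: c / norm for t, c in counts.items()} if norm else counts

    def search(self, query, k=2):
        q = self._embed(query)
        # Cosine similarity plays the role of FAISS's inner-product search.
        scores = [sum(q.get(t, 0.0) * w for t, w in v.items()) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        return [self.docs[i] for i in ranked[:k]]

def build_prompt(query, retriever):
    """Mirrors the retrieve-then-generate step in src/rag_pipeline.py."""
    context = "\n".join(retriever.search(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In the real pipeline the returned prompt is passed to the llama-cpp-python generator; the class and function names above are illustrative, not the actual API of `src/`.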
- Create a virtual environment:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Prepare the LLaMA model:
  - Convert your LLaMA model to GGML/GGUF format using llama.cpp
  - Place the converted model file (e.g., `ggml-model.bin`) in `models/llama-2-7b/`
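Once the model file is in place, a quick smoke test with llama-cpp-python confirms it loads. This is a sketch: the path below matches the example filename above, and the context size is an arbitrary choice, not a project setting:

```python
from pathlib import Path

MODEL_PATH = Path("models/llama-2-7b/ggml-model.bin")

if MODEL_PATH.exists():
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=str(MODEL_PATH), n_ctx=2048)
    out = llm("Q: What does FAISS do?\nA:", max_tokens=32)
    print(out["choices"][0]["text"])
else:
    print(f"Model not found at {MODEL_PATH} -- finish the conversion step first.")
```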
- Run the setup script:
  ```bash
  python setup.py
  ```
- Navigate to the project directory:
  ```bash
  cd project
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Ensure you have Docker and Docker Compose installed on your system.
- Prepare the LLaMA model:
  - Convert your LLaMA model to GGML/GGUF format using llama.cpp
  - Place the converted model file (e.g., `ggml-model.bin`) in `models/llama-2-7b/`
- Create necessary directories:
  ```bash
  mkdir -p models/llama-2-7b uploads
  touch models/llama-2-7b/.gitkeep uploads/.gitkeep
  ```
- Build and start the containers:
  ```bash
  docker-compose up --build
  ```
- Start the FastAPI server:
  ```bash
  python api_main.py
  ```
  The API will be available at http://localhost:8000
- In a new terminal, start the Vite development server:
  ```bash
  cd project
  npm run dev
  ```
  The UI will be available at http://localhost:5173
- Start the application:
  ```bash
  docker-compose up
  ```
- Access the application:
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000
- To stop the application:
  ```bash
  docker-compose down
  ```
- Open your browser and navigate to http://localhost:5173
- Upload a PDF document using the file upload interface
- Once the document has been processed, you can start asking questions about its content
- The AI responds with answers based on the document's content
- `POST /upload`: Upload a PDF document for processing
- `POST /query`: Query the processed document
- `GET /health`: Check API health status
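With the backend running, these endpoints can be exercised from Python using only the standard library. The JSON body shape for `/query` below is an assumption — check the request models in `src/api.py` for the actual field names:

```python
import json
from urllib import request

API = "http://localhost:8000"

def build_query(question):
    # Assumed request body; verify the field name against src/api.py.
    return {"question": question}

def post_json(path, payload):
    req = request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # urlopen sends POST when data is set
        return json.loads(resp.read())

def health():
    with request.urlopen(API + "/health") as resp:  # GET /health
        return json.loads(resp.read())

# With the server up:
#   health()
#   post_json("/query", build_query("What is this document about?"))
```

Uploading the PDF is easiest through the UI; from the command line the multipart field name must match whatever `src/api.py` declares for its `UploadFile` parameter.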
- Modify the retriever's model in `src/retriever.py`
- Adjust generation parameters in `src/generator.py`
- Customize the prompt template in `src/rag_pipeline.py`
- Use a different model by updating the `model_path` in `src/generator.py`
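For the generation-parameter tweaks, llama-cpp-python exposes the usual sampling knobs through `Llama.__call__` / `create_completion`. The values below are illustrative defaults, not what `src/generator.py` currently ships with:

```python
# Sampling settings that can be passed through to the Llama call in
# src/generator.py, e.g. llm(prompt, **GENERATION_PARAMS).
# Parameter names follow llama-cpp-python; the values are illustrative.
GENERATION_PARAMS = {
    "max_tokens": 256,      # cap on tokens generated per answer
    "temperature": 0.7,     # lower -> more deterministic answers
    "top_p": 0.95,          # nucleus sampling cutoff
    "repeat_penalty": 1.1,  # discourage verbatim repetition
    "stop": ["Question:"],  # stop before the model invents a new question
}
```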
- Customize the UI components in `project/src/components/`
- Modify the API integration in `project/src/App.tsx`
- Update styles in `project/src/index.css`
To convert your LLaMA model to GGML/GGUF format:

- Clone llama.cpp:
  ```bash
  git clone https://github.com/ggerganov/llama.cpp.git
  cd llama.cpp
  ```
- Convert your model:
  ```bash
  python convert.py --outfile models/llama-2-7b/ggml-model.bin --outtype f16 /path/to/your/llama/model
  ```

For more details, refer to the llama.cpp documentation.