A full-stack Question-Answering (QA) system that uses Retrieval-Augmented Generation (RAG) to provide accurate, contextually relevant answers from unstructured documents.
- Problem Interpretation
- Proposed Solution & Rationale
- Tools/Libraries
- Prerequisites
- Quickstart
- Setup Instructions
- Troubleshooting
- Questions to Ask the QA System
- Future Improvements
## Problem Interpretation

Many organizations struggle to efficiently retrieve precise answers from large volumes of unstructured textual data. Traditional keyword-based search methods often return irrelevant results, requiring significant manual effort to sift through them. The goal of this Proof of Concept (PoC) is to develop a Question-Answering (QA) system that uses Retrieval-Augmented Generation (RAG) to provide accurate and contextually relevant answers by leveraging a combination of document embeddings and machine learning models.
## Proposed Solution & Rationale

This project implements a Retrieval-Augmented Generation (RAG) pipeline to answer user queries by combining document retrieval and language model-based answer generation (see the sketch after this list):

- Document Embedding and Retrieval:
  - Documents are loaded from a directory and converted into vector embeddings using the `all-MiniLM-L6-v2` model from Sentence Transformers.
- Answer Generation:
  - The top relevant documents are processed to extract the most relevant sentences, which are combined into a context.
  - The context is then truncated to fit within a token limit (`max_tokens`), ensuring compatibility with the language model.
  - The context and query are passed to the `google/flan-t5-large` model from Hugging Face Transformers to generate a detailed and contextually relevant answer.
- Confidence and Source Tracking:
  - Confidence scores for documents and sentences are computed through:
    - Cosine Similarity: used to compare the query embedding with document and sentence embeddings, ensuring efficient and meaningful retrieval of information.
    - Keyword Boost: applied to prioritize documents containing query-specific keywords.
  - Retrieve Relevant Documents: extract the highest-scoring documents from the data.
  - Retrieve Relevant Sentences to Create the Context: extract the highest-scoring sentences from the top documents to create the context for the LLM.
  - Document scores, document names (sources), and the generated answer are returned as the result.
- Frontend-Backend Integration:
  - A FastAPI backend handles document embedding, retrieval, and query processing.
  - A React frontend allows users to input questions and view answers in an intuitive interface.
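The following is a minimal sketch of this retrieval-and-generation core. The function and variable names, the keyword-boost weight, and the word-level truncation are illustrative assumptions rather than the exact backend implementation:

```python
# Minimal sketch of the retrieval + generation core described above.
# Names, the 0.1 boost weight, and word-level truncation are assumptions.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
generator = pipeline("text2text-generation", model="google/flan-t5-large")

def answer(query: str, documents: dict[str, str], top_k: int = 3, max_tokens: int = 512) -> dict:
    names = list(documents)
    doc_embs = embedder.encode([documents[n] for n in names], convert_to_tensor=True)
    query_emb = embedder.encode(query, convert_to_tensor=True)

    # Cosine similarity between the query and every document.
    sims = util.cos_sim(query_emb, doc_embs)[0]

    # Keyword boost: bump documents that contain query terms verbatim.
    keywords = [w.lower() for w in query.split() if len(w) > 3]
    scored = [
        (float(sim) + 0.1 * sum(kw in documents[n].lower() for kw in keywords), n)
        for sim, n in zip(sims, names)
    ]
    top = sorted(scored, reverse=True)[:top_k]

    # Build the context from the top documents and truncate it to max_tokens
    # (the real pipeline also scores individual sentences at this point).
    context = " ".join(documents[n] for _, n in top)
    context = " ".join(context.split()[:max_tokens])

    prompt = f"Answer the question based on the context.\ncontext: {context}\nquestion: {query}"
    generated = generator(prompt, max_new_tokens=128)[0]["generated_text"]
    return {"answer": generated, "scores": top, "sources": [n for _, n in top]}
```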
The full RAG workflow consists of the following steps:

- Data Loading: prepare text documents for processing.
- Data Indexing*: store document embeddings in a vector database (e.g., FAISS or Pinecone) for efficient retrieval (see the sketch below).
- Generate Embeddings: convert documents into vector embeddings.
- Retrieve Relevant Information: retrieve top-scoring sources (relevant sentences are extracted from top-scoring documents) to provide context for the LLM.
- Augment LLM Prompt: use prompt engineering techniques to communicate effectively with the LLM and generate an accurate answer.
- Update External Data*: keep information current for retrieval by asynchronously updating the documents and their embedding representations.

*Not implemented in this simplified version.
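For illustration, a hypothetical version of the unimplemented Data Indexing step might persist embeddings in a FAISS index instead of memory. The index type and normalization below are assumptions, not part of this PoC:

```python
# Hypothetical sketch of the unimplemented Data Indexing step using FAISS.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["first document...", "second document..."]

# Normalized embeddings let an inner-product index behave as cosine similarity.
embs = embedder.encode(texts, normalize_embeddings=True)
index = faiss.IndexFlatIP(embs.shape[1])
index.add(np.asarray(embs, dtype="float32"))

query = embedder.encode(["What is Project Phoenix?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
print(ids[0], scores[0])  # indices and similarity scores of the top-2 documents
```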
This approach leverages semantic understanding via Sentence Transformers and grounded answer generation using Flan-T5, an open-source LLM published by Google; conditioning the model on retrieved context reduces (but does not eliminate) hallucinations. Its modular architecture prioritises scalability, while also ensuring transparent results by listing sources.
Limitations:

- Embedding Model Size: The `all-MiniLM-L6-v2` model is fast but less accurate than larger models.
- Answer Generation Quality: Generated answers depend on the relevance of the retrieved documents and the `google/flan-t5-large` model's capabilities.
- In-Memory Storage: Embeddings are stored in memory, limiting scalability for large datasets; a vector database like FAISS or Pinecone could resolve this.
- Response Time: Retrieval and model processing speed affect the pipeline's response time.
- Context Truncation: Context is truncated to fit token limits, which may omit relevant details.
- Document Format Support: Only plain text files are supported; other formats like PDFs require preprocessing.
## Tools/Libraries

Backend:

- FastAPI: For building a lightweight, high-performance backend API (see the sketch below).
- Pydantic: Used as part of FastAPI for data validation and serialization.
- Uvicorn: ASGI server for running the FastAPI backend.
- Sentence Transformers: For generating high-quality document embeddings.
- Hugging Face Transformers: For leveraging the `google/flan-t5-large` model to generate answers.
- Python: Core programming language for backend development.

Frontend:

- Node.js: For managing frontend dependencies and running the development server.
- React (Vite/TypeScript): For building a fast, interactive and type-safe frontend.
- Material-UI (MUI): For creating a consistent and responsive frontend design.
- npm: For managing JavaScript packages and scripts.
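To show how the backend pieces fit together, here is a minimal FastAPI + Pydantic sketch. The `/ask` route and the request/response models are hypothetical; consult the Swagger UI at `http://127.0.0.1:8000/docs` for the actual schema:

```python
# Minimal sketch of the FastAPI + Pydantic layer; the route name and
# models are hypothetical, not the project's actual schema.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

class Answer(BaseModel):
    answer: str
    sources: list[str]

@app.post("/ask", response_model=Answer)
def ask(query: Query) -> Answer:
    # The real backend would call the RAG pipeline here; stubbed for brevity.
    return Answer(answer=f"You asked: {query.question}", sources=[])

# Run with: uvicorn app:app --reload
```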
## Prerequisites

Before setting up the project, ensure you have the following installed:

- Python: Install `pyenv` to manage Python versions. Follow the pyenv installation guide.
- Node.js: Required for the frontend. Install it from the official Node.js website.
- Git: For cloning the repository. Install it from the Git website.
## Quickstart

For users who want to get started quickly, follow these minimal steps:

- Clone the Repository:

  ```bash
  git clone https://github.com/JadeChan03/rag-qa-fullstack.git
  cd rag-qa-fullstack
  ```

- Start the Backend:

  ```bash
  cd server
  pyenv install 3.10.0 # only if not already installed
  pyenv local 3.10.0   # ensures compatibility with dependencies
  # note: refer to "Backend Setup: Environment Configurations" if version errors occur
  python3 -m venv venv
  source venv/bin/activate # use `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
  uvicorn app:app --reload # starts the server
  ```

- Start the Frontend (in a separate terminal):

  ```bash
  cd client
  npm install
  npm run dev
  ```

- Access the Application:

  - Backend: visit `http://127.0.0.1:8000/docs` for the Swagger UI.
  - Frontend: open `http://localhost:5173` in your browser or press `o + enter` in the Vite terminal.

- Ask a Question:

  - Type your question into the input field in the frontend and click "Submit."
  - The system will process the query and retrieve an answer from the documents. You can also query the backend directly, as shown below.
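For scripted access, a sketch using `requests` against the hypothetical `/ask` route from earlier (verify the actual path and payload in the Swagger UI at `/docs`):

```python
import requests

# Hypothetical route and payload; check http://127.0.0.1:8000/docs for the real API.
resp = requests.post(
    "http://127.0.0.1:8000/ask",
    json={"question": "What is the primary goal of Project Phoenix?"},
)
resp.raise_for_status()
print(resp.json())  # e.g., {"answer": "...", "sources": [...]}
```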
For a detailed setup, refer to the following section (Setup Instructions).
## Setup Instructions

- Clone the Repository:

  ```bash
  git clone https://github.com/JadeChan03/rag-qa-fullstack.git
  ```

- Open two terminal sessions:

  - Terminal 1: for running the backend server.
  - Terminal 2: for running the frontend client.

- Follow the setup instructions for both the backend and frontend in their respective terminal sessions.
### Backend Setup

- Navigate to the Server Directory:

  ```bash
  cd rag-qa-fullstack/server
  ```

- Environment Configurations

  1. Install Python 3.10.0 (if not already installed):

     ```bash
     pyenv install 3.10.0
     ```

  2. In the `server` directory, set the local Python version:

     ```bash
     pyenv local 3.10.0
     ```

  3. Verify the Python version `pyenv` is using:

     ```bash
     pyenv version
     ```

     Ensure the output matches the required version (e.g., `3.10.0`).

  4. Verify the Python version `python3` is using:

     ```bash
     which python3     # expected to point to a pyenv shim
     python3 --version # expected `Python 3.10.0`
     ```

     If `which python3` does not point to a pyenv shim (e.g., `~/.pyenv/shims/python3`), continue with the following steps to correct this issue:

  5. Edit your shell configuration (e.g., `~/.zshrc` or `~/.bashrc`):

     ```bash
     nano ~/.zshrc # use a text editor to open your shell configuration file
     ```

  6. Add the following to your shell configuration file:

     ```bash
     export PATH="$(pyenv root)/shims:$PATH"
     ```

  7. Reload the shell configuration:

     ```bash
     source ~/.zshrc
     ```

  Additional Troubleshooting:

  - If you encounter SSL errors during Python installation via `pyenv`, follow the steps in Troubleshooting: SSL Certificate Errors.
- Create and Activate the Virtual Environment

  To isolate dependencies and avoid conflicts, create a virtual environment.

  On macOS/Linux:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

  On Windows:

  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```

- Install Dependencies

  Once the virtual environment is activated, install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Select the Correct Python Interpreter in Visual Studio Code

  To avoid dependency import errors, ensure that VS Code is using the correct Python interpreter:

  - Press `Cmd+Shift+P` (macOS) or `Ctrl+Shift+P` (Windows/Linux).
  - Search for and select Python: Select Interpreter.
  - Choose the interpreter located in the `venv` folder. For example, `server/venv/bin/python` or `server/venv/Scripts/python.exe`.

- Start the Backend Server

  ```bash
  uvicorn app:app --reload
  ```

  Access the backend at `http://127.0.0.1:8000` or `http://127.0.0.1:8000/docs`.
### Frontend Setup

- Navigate to the Client Directory:

  ```bash
  cd ../client
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

- Start the Frontend Development Server:

  ```bash
  npm run dev
  ```

  Access the frontend at `http://localhost:5173` or press `o + enter` in the Vite terminal.
## Troubleshooting

### Dependency Version Errors

If you are installing `requirements.txt` and encounter an error like:

```
ERROR: Could not find a version that satisfies the requirement torch<2.0.0,>=1.11.0 (from versions: 2.6.0, 2.7.0)
```

The error indicates that the specified version range for the dependency (in this case, `torch>=1.11.0,<2.0.0`) is not available for your current Python environment or platform. This could be due to two reasons:

- The local Python version was set (`pyenv local 3.10.0`) after the virtual environment was created. In this case, delete the virtual environment and create a new one in the correct environment.
- The Python version managed by `pyenv` (3.10.0) is not properly linked to the `python3` command used to create the virtual environment. Even though you set `pyenv local 3.10.0`, the `python3` command might still point to a system-installed Python version (e.g., 3.13.0) instead of the pyenv-managed version. To fix this, refer to steps 5-7 in Environment Configurations.
### SSL Certificate Errors

If you encounter SSL errors during Python installation via `pyenv`, follow these steps before reactivating your Python version (`pyenv local 3.10.0`) and virtual environment (`source venv/bin/activate`):

macOS:

- Install SSL Dependencies:

  ```bash
  brew install openssl readline xz zlib
  ```

- Reinstall Python 3.10.0 with SSL Support:

  ```bash
  export LDFLAGS="-L$(brew --prefix openssl)/lib"
  export CPPFLAGS="-I$(brew --prefix openssl)/include"
  export PKG_CONFIG_PATH="$(brew --prefix openssl)/lib/pkgconfig"
  env PYTHON_CONFIGURE_OPTS="--enable-shared --with-openssl=$(brew --prefix openssl)" pyenv install 3.10.0
  ```

- Verify SSL Installation:

  Test that Python has SSL support by running:

  ```bash
  python3
  >>> import ssl
  >>> print(ssl.OPENSSL_VERSION)
  ```
Windows:

- Install SSL Dependencies:

  Download and install the necessary SSL libraries, such as OpenSSL, from https://slproweb.com/products/Win32OpenSSL.html. Ensure that the correct version for your system architecture (32-bit or 64-bit) is installed.

- Set Environment Variables:

  Configure the paths to the OpenSSL libraries in your environment variables:

  - Add the `bin` directory of OpenSSL (e.g., `C:\Program Files\OpenSSL-Win64\bin`) to your `PATH`.
  - Set the `INCLUDE` and `LIB` environment variables to point to the OpenSSL installation directories.

- Reinstall Python with OpenSSL Support:

  Use the following commands to reinstall Python via `pyenv-win` with OpenSSL:

  ```bash
  set "PYTHON_CONFIGURE_OPTS=--with-openssl=C:\Program Files\OpenSSL-Win64"
  pyenv install 3.10.0
  ```

- Verify SSL Installation:

  Test that Python has SSL support by running:

  ```bash
  python
  >>> import ssl
  >>> print(ssl.OPENSSL_VERSION)
  ```
### General Tips

- Make sure pyenv (or pyenv-win for Windows) is initialized in your shell configuration file (e.g., `.bashrc` or `.zshrc` for macOS/Linux, or your PowerShell profile for Windows).

- Keep your tools up to date to avoid compatibility issues:

  ```bash
  # macOS/Linux
  brew update && brew upgrade
  pyenv update

  # Windows (PowerShell)
  pyenv update
  ```

- If global Python dependencies are causing conflicts or clutter, consider cleaning them up:

  - View all globally installed Python dependencies:

    ```bash
    pip list
    ```

  - Uninstall all Python packages:

    ```bash
    pip freeze | xargs pip uninstall -y
    ```

  - WARNING: This will remove all global packages, which may affect other projects or system tools.
  - These commands apply only to the current Python environment (global or virtual) and do not affect `pyenv`.

- Always create and activate a virtual environment for each project to isolate dependencies. If an existing virtual environment was created with a Python version lacking SSL support or compatible dependencies, delete and recreate it:

  ```bash
  rm -rf venv
  python -m venv venv
  source venv/bin/activate # use `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
  ```
## Questions to Ask the QA System

Document: HR_Remote_Work_Policy.txt
- "Who is eligible for remote work under the updated policy?"
- "What are the standard work hours for remote employees, and are flexible arrangements allowed?"
Document: Internal_Announcement_Q3Goals.txt
- "When is Project Phoenix (Customer Portal Upgrade) scheduled to launch?"
- "What is the deadline for completing cybersecurity training for all staff?"
Document: Product_Spec.txt
- "What new features are introduced in Widget Alpha v2.1?"
- "What is the average processing latency and throughput capacity of Widget Alpha v2.1?"
Document: Project_Summary.txt
- "What is the primary goal of Project Phoenix?"
- "What are the identified risks that could impact the completion of Project Phoenix?"
Some questions produce inaccurate or incomplete responses due to their wording. To improve accuracy, it may help to simplify questions and/or reword them into direct requests:
- "Who is working on Project Phoenix?"
  - Answer: John Smith * Tech Lead: Alice Green (Incomplete)
- "Name everyone on Project Phoenix?"
  - Answer: Jane Doe (VP, Customer Success) (Incomplete)
- "Name everyone on Project phoenix" ('Phoenix' is lowercase)
  - Answer: "Jane Doe (VP, Customer Success) _ Project Manager: John Smith _ Tech Lead: Alice Green * Primary Users: Customer Support Team, End Customers" (Better)
- "What is the remote work policy?"
  - Answer: All remote work must comply with the company's data security and confidentiality policies (Incomplete)
- "Summarise the remote work policy"
  - Answer: 1. Security: All remote work must comply with the company's data security and confidentiality policies. Eligibility: Full-time employees with manager approval and a role suitable for remote work are eligible. Communication: Remote employees are expected to be reachable via company-approved communication channels (Slack, Email, Video Conferencing) during work hours. (Better)
- "What is the deadline for completing mandatory cybersecurity training for all staff?"
  - Answer: August 1, 2024 (Incorrect)
- "What is the deadline for completing cybersecurity training for all staff?"
  - Answer: September 30th (Correct but incomplete)
- "What are the key initiatives planned to enhance customer retention in Q3 2025?"
  - Answer: Let's work together to make Q3 a successful quarter (Irrelevant)
- "How will customer retention be improved?"
  - Answer: Reduce average support ticket resolution time by 15% (Better but incomplete)
## Future Improvements

These improvements target the RAG system rather than the application as a whole, focusing on development of the actual product rather than the PoC.
- Scalability:
  - Implement a vector database like Pinecone, Weaviate, or FAISS to handle larger datasets and improve retrieval efficiency.
- Support for More Document Types:
  - Add preprocessing pipelines to handle formats like PDFs, Word documents, and spreadsheets.
- Improved Answer Generation:
  - Fine-tune the language model for domain-specific queries and improve context handling to reduce truncation issues.
- Optimized Response Time:
  - Implement caching, asynchronous processing, and batching of LLM requests to reduce latency (see the sketch at the end of this section).
- Embedding Model Upgrades:
  - Explore larger or more accurate embedding models to improve retrieval precision without compromising performance (a lightweight model was chosen largely because this is a PoC).
- Enhance Prompt Transparency:
  - Explicitly provide the source segment that supports the generated answer, along with a link to the full source document(s).
- Leverage LangChain for Pipeline Simplification:
  - Integrate the LangChain framework to streamline document loading, embedding generation, vector database integration, and LLM interaction.
  - Use LangChain's modular components to simplify the addition of new features, such as prompt templates, memory for chatbots, and tool chaining.
  - This would reduce development overhead and improve maintainability as the system scales.
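As one illustration of the response-time work, a sketch of caching query embeddings so repeated questions skip the encoder entirely (an assumed optimization, not part of the PoC; the function name is hypothetical):

```python
# Sketch: cache query embeddings so repeated questions skip re-encoding.
from functools import lru_cache

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

@lru_cache(maxsize=1024)
def embed_query(query: str):
    """Encode a query once; identical later queries hit the in-memory cache."""
    return embedder.encode(query, convert_to_tensor=True)
```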