A full-stack Question-Answering (QA) system that uses Retrieval-Augmented Generation (RAG) to provide accurate, contextually relevant answers from unstructured documents.
- Problem Interpretation
- Proposed Solution & Rationale
- Tools/Libraries
- Prerequisites
- Quickstart
- Setup Instructions
- Troubleshooting
- Questions to Ask the QA System
- Future Improvements
## Problem Interpretation

Many organizations struggle to efficiently retrieve precise answers from large volumes of unstructured textual data. Traditional keyword-based search methods often return irrelevant results, requiring significant manual effort to sift through them. The goal of this Proof of Concept (PoC) is to develop a Question-Answering (QA) system that uses Retrieval-Augmented Generation (RAG) to provide accurate and contextually relevant answers by leveraging a combination of document embeddings and machine learning models.
## Proposed Solution & Rationale

This project implements a Retrieval-Augmented Generation (RAG) pipeline to answer user queries by combining document retrieval and language model-based answer generation (see the sketch after this list):

- Document Embedding and Retrieval:
  - Documents are loaded from a directory and converted into vector embeddings using the `all-MiniLM-L6-v2` model from Sentence Transformers.
- Answer Generation:
  - The top relevant documents are processed to extract the most relevant sentences, which are combined into a context.
  - The context is then truncated to fit within a token limit (`max_tokens`), ensuring compatibility with the language model.
  - The context and query are passed to the `google/flan-t5-large` model from Hugging Face Transformers to generate a detailed and contextually relevant answer.
- Confidence and Source Tracking:
  - Confidence scores for documents and sentences are computed through:
    - Cosine Similarity: used to compare the query embedding with document and sentence embeddings, ensuring efficient and meaningful retrieval of information.
    - Keyword Boost: applied to prioritize documents containing query-specific keywords.
  - Retrieve Relevant Documents: extract the highest-scoring documents from the data.
  - Retrieve Relevant Sentences to Create the Context: extract the highest-scoring sentences from the top documents to create the context for the LLM.
  - Document scores, document names (sources), and the generated answer are returned as the result.
- Frontend-Backend Integration:
  - A FastAPI backend handles document embedding, retrieval, and query processing.
  - A React frontend allows users to input questions and view answers in an intuitive interface.
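The following is a minimal sketch of this retrieval-and-generation core. The function and variable names, the keyword-boost weight, and the word-level truncation are illustrative assumptions rather than the exact backend implementation:

```python
# Minimal sketch of the retrieval + generation core described above.
# Names, the 0.1 boost weight, and word-level truncation are assumptions.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
generator = pipeline("text2text-generation", model="google/flan-t5-large")

def answer(query: str, documents: dict[str, str], top_k: int = 3, max_tokens: int = 512) -> dict:
    names = list(documents)
    doc_embs = embedder.encode([documents[n] for n in names], convert_to_tensor=True)
    query_emb = embedder.encode(query, convert_to_tensor=True)

    # Cosine similarity between the query and every document.
    sims = util.cos_sim(query_emb, doc_embs)[0]

    # Keyword boost: bump documents that contain query terms verbatim.
    keywords = [w.lower() for w in query.split() if len(w) > 3]
    scored = [
        (float(sim) + 0.1 * sum(kw in documents[n].lower() for kw in keywords), n)
        for sim, n in zip(sims, names)
    ]
    top = sorted(scored, reverse=True)[:top_k]

    # Build the context from the top documents and truncate it to max_tokens
    # (the real pipeline also scores individual sentences at this point).
    context = " ".join(documents[n] for _, n in top)
    context = " ".join(context.split()[:max_tokens])

    prompt = f"Answer the question based on the context.\ncontext: {context}\nquestion: {query}"
    generated = generator(prompt, max_new_tokens=128)[0]["generated_text"]
    return {"answer": generated, "scores": top, "sources": [n for _, n in top]}
```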
The full RAG workflow consists of the following steps:

- Data Loading: prepare text documents for processing.
- Data Indexing*: store document embeddings in a vector database (e.g., FAISS or Pinecone) for efficient retrieval (see the sketch below).
- Generate Embeddings: convert documents into vector embeddings.
- Retrieve Relevant Information: retrieve top-scoring sources (relevant sentences are extracted from top-scoring documents) to provide context for the LLM.
- Augment LLM Prompt: use prompt engineering techniques to communicate effectively with the LLM and generate an accurate answer.
- Update External Data*: keep information current for retrieval by asynchronously updating the documents and their embedding representations.

*Not implemented in this simplified version.
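For illustration, a hypothetical version of the unimplemented Data Indexing step might persist embeddings in a FAISS index instead of memory. The index type and normalization below are assumptions, not part of this PoC:

```python
# Hypothetical sketch of the unimplemented Data Indexing step using FAISS.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["first document...", "second document..."]

# Normalized embeddings let an inner-product index behave as cosine similarity.
embs = embedder.encode(texts, normalize_embeddings=True)
index = faiss.IndexFlatIP(embs.shape[1])
index.add(np.asarray(embs, dtype="float32"))

query = embedder.encode(["What is Project Phoenix?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
print(ids[0], scores[0])  # indices and similarity scores of the top-2 documents
```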
This approach leverages semantic understanding via Sentence Transformers and grounded answer generation using Flan-T5, an open-source LLM published by Google; conditioning the model on retrieved context reduces (but does not eliminate) hallucinations. Its modular architecture prioritises scalability, while also ensuring transparent results by listing sources.
Limitations:

- Embedding Model Size: The `all-MiniLM-L6-v2` model is fast but less accurate than larger models.
- Answer Generation Quality: Generated answers depend on the relevance of the retrieved documents and the `google/flan-t5-large` model's capabilities.
- In-Memory Storage: Embeddings are stored in memory, limiting scalability for large datasets; a vector database like FAISS or Pinecone could resolve this.
- Response Time: Retrieval and model processing speed affect the pipeline's response time.
- Context Truncation: Context is truncated to fit token limits, which may omit relevant details.
- Document Format Support: Only plain text files are supported; other formats like PDFs require preprocessing.
## Tools/Libraries

Backend:

- FastAPI: For building a lightweight, high-performance backend API (see the sketch below).
- Pydantic: Used as part of FastAPI for data validation and serialization.
- Uvicorn: ASGI server for running the FastAPI backend.
- Sentence Transformers: For generating high-quality document embeddings.
- Hugging Face Transformers: For leveraging the `google/flan-t5-large` model to generate answers.
- Python: Core programming language for backend development.

Frontend:

- Node.js: For managing frontend dependencies and running the development server.
- React (Vite/TypeScript): For building a fast, interactive and type-safe frontend.
- Material-UI (MUI): For creating a consistent and responsive frontend design.
- npm: For managing JavaScript packages and scripts.
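To show how the backend pieces fit together, here is a minimal FastAPI + Pydantic sketch. The `/ask` route and the request/response models are hypothetical; consult the Swagger UI at `http://127.0.0.1:8000/docs` for the actual schema:

```python
# Minimal sketch of the FastAPI + Pydantic layer; the route name and
# models are hypothetical, not the project's actual schema.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

class Answer(BaseModel):
    answer: str
    sources: list[str]

@app.post("/ask", response_model=Answer)
def ask(query: Query) -> Answer:
    # The real backend would call the RAG pipeline here; stubbed for brevity.
    return Answer(answer=f"You asked: {query.question}", sources=[])

# Run with: uvicorn app:app --reload
```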
## Prerequisites

Before setting up the project, ensure you have the following installed:

- Python: Install `pyenv` to manage Python versions. Follow the pyenv installation guide.
- Node.js: Required for the frontend. Install it from the official Node.js website.
- Git: For cloning the repository. Install it from the Git website.
## Quickstart

For users who want to get started quickly, follow these minimal steps:

- Clone the Repository:

  ```bash
  git clone https://github.com/JadeChan03/rag-qa-fullstack.git
  cd rag-qa-fullstack
  ```

- Start the Backend:

  ```bash
  cd server
  pyenv install 3.10.0 # only if not already installed
  pyenv local 3.10.0   # ensures compatibility with dependencies
  # note: refer to "Backend Setup: Environment Configurations" if version errors occur
  python3 -m venv venv
  source venv/bin/activate # use `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
  uvicorn app:app --reload # starts the server
  ```

- Start the Frontend (in a separate terminal):

  ```bash
  cd client
  npm install
  npm run dev
  ```

- Access the Application:

  - Backend: visit `http://127.0.0.1:8000/docs` for the Swagger UI.
  - Frontend: open `http://localhost:5173` in your browser or press `o + enter` in the Vite terminal.

- Ask a Question:

  - Type your question into the input field in the frontend and click "Submit."
  - The system will process the query and retrieve an answer from the documents. You can also query the backend directly, as shown below.
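For scripted access, a sketch using `requests` against the hypothetical `/ask` route from earlier (verify the actual path and payload in the Swagger UI at `/docs`):

```python
import requests

# Hypothetical route and payload; check http://127.0.0.1:8000/docs for the real API.
resp = requests.post(
    "http://127.0.0.1:8000/ask",
    json={"question": "What is the primary goal of Project Phoenix?"},
)
resp.raise_for_status()
print(resp.json())  # e.g., {"answer": "...", "sources": [...]}
```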
For a detailed setup, refer to the following section (Setup Instructions).
## Setup Instructions

- Clone the Repository:

  ```bash
  git clone https://github.com/JadeChan03/rag-qa-fullstack.git
  ```

- Open two terminal sessions:

  - Terminal 1: for running the backend server.
  - Terminal 2: for running the frontend client.

- Follow the setup instructions for both the backend and frontend in their respective terminal sessions.
### Backend Setup

- Navigate to the Server Directory:

  ```bash
  cd rag-qa-fullstack/server
  ```

- Environment Configurations

  1. Install Python 3.10.0 (if not already installed):

     ```bash
     pyenv install 3.10.0
     ```

  2. In the `server` directory, set the local Python version:

     ```bash
     pyenv local 3.10.0
     ```

  3. Verify the Python version `pyenv` is using:

     ```bash
     pyenv version
     ```

     Ensure the output matches the required version (e.g., `3.10.0`).

  4. Verify the Python version `python3` is using:

     ```bash
     which python3     # expected to point to a pyenv shim
     python3 --version # expected `Python 3.10.0`
     ```

     If `which python3` does not point to a pyenv shim (e.g., `~/.pyenv/shims/python3`), continue with the following steps to correct this issue:

  5. Edit your shell configuration (e.g., `~/.zshrc` or `~/.bashrc`):

     ```bash
     nano ~/.zshrc # use a text editor to open your shell configuration file
     ```

  6. Add the following to your shell configuration file:

     ```bash
     export PATH="$(pyenv root)/shims:$PATH"
     ```

  7. Reload the shell configuration:

     ```bash
     source ~/.zshrc
     ```

  Additional Troubleshooting:

  - If you encounter SSL errors during Python installation via `pyenv`, follow the steps in Troubleshooting: SSL Certificate Errors.
- Create and Activate the Virtual Environment

  To isolate dependencies and avoid conflicts, create a virtual environment.

  On macOS/Linux:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

  On Windows:

  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```

- Install Dependencies

  Once the virtual environment is activated, install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Select the Correct Python Interpreter in Visual Studio Code

  To avoid dependency import errors, ensure that VS Code is using the correct Python interpreter:

  - Press `Cmd+Shift+P` (macOS) or `Ctrl+Shift+P` (Windows/Linux).
  - Search for and select Python: Select Interpreter.
  - Choose the interpreter located in the `venv` folder. For example, `server/venv/bin/python` or `server/venv/Scripts/python.exe`.

- Start the Backend Server

  ```bash
  uvicorn app:app --reload
  ```

  Access the backend at `http://127.0.0.1:8000` or `http://127.0.0.1:8000/docs`.
### Frontend Setup

- Navigate to the Client Directory:

  ```bash
  cd ../client
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

- Start the Frontend Development Server:

  ```bash
  npm run dev
  ```

  Access the frontend at `http://localhost:5173` or press `o + enter` in the Vite terminal.
## Troubleshooting

### Dependency Version Errors

If you are installing `requirements.txt` and encounter an error like:

```
ERROR: Could not find a version that satisfies the requirement torch<2.0.0,>=1.11.0 (from versions: 2.6.0, 2.7.0)
```

The error indicates that the specified version range for the dependency (in this case, `torch>=1.11.0,<2.0.0`) is not available for your current Python environment or platform. This could be due to two reasons:

- The local Python version was set (`pyenv local 3.10.0`) after the virtual environment was created. In this case, delete the virtual environment and create a new one in the correct environment.
- The Python version managed by `pyenv` (3.10.0) is not properly linked to the `python3` command used to create the virtual environment. Even though you set `pyenv local 3.10.0`, the `python3` command might still point to a system-installed Python version (e.g., 3.13.0) instead of the pyenv-managed version. To fix this, refer to steps 5-7 in Environment Configurations.
### SSL Certificate Errors

If you encounter SSL errors during Python installation via `pyenv`, follow these steps before reactivating your Python version (`pyenv local 3.10.0`) and virtual environment (`source venv/bin/activate`):

macOS:

- Install SSL Dependencies:

  ```bash
  brew install openssl readline xz zlib
  ```

- Reinstall Python 3.10.0 with SSL Support:

  ```bash
  export LDFLAGS="-L$(brew --prefix openssl)/lib"
  export CPPFLAGS="-I$(brew --prefix openssl)/include"
  export PKG_CONFIG_PATH="$(brew --prefix openssl)/lib/pkgconfig"
  env PYTHON_CONFIGURE_OPTS="--enable-shared --with-openssl=$(brew --prefix openssl)" pyenv install 3.10.0
  ```

- Verify SSL Installation:

  Test that Python has SSL support by running:

  ```bash
  python3
  >>> import ssl
  >>> print(ssl.OPENSSL_VERSION)
  ```
Windows:

- Install SSL Dependencies:

  Download and install the necessary SSL libraries, such as OpenSSL, from https://slproweb.com/products/Win32OpenSSL.html. Ensure that the correct version for your system architecture (32-bit or 64-bit) is installed.

- Set Environment Variables:

  Configure the paths to the OpenSSL libraries in your environment variables:

  - Add the `bin` directory of OpenSSL (e.g., `C:\Program Files\OpenSSL-Win64\bin`) to your `PATH`.
  - Set the `INCLUDE` and `LIB` environment variables to point to the OpenSSL installation directories.

- Reinstall Python with OpenSSL Support:

  Use the following commands to reinstall Python via `pyenv-win` with OpenSSL:

  ```bash
  set "PYTHON_CONFIGURE_OPTS=--with-openssl=C:\Program Files\OpenSSL-Win64"
  pyenv install 3.10.0
  ```

- Verify SSL Installation:

  Test that Python has SSL support by running:

  ```bash
  python
  >>> import ssl
  >>> print(ssl.OPENSSL_VERSION)
  ```
### General Tips

- Make sure pyenv (or pyenv-win for Windows) is initialized in your shell configuration file (e.g., `.bashrc` or `.zshrc` for macOS/Linux, or your PowerShell profile for Windows).

- Keep your tools up to date to avoid compatibility issues:

  ```bash
  # macOS/Linux
  brew update && brew upgrade
  pyenv update

  # Windows (PowerShell)
  pyenv update
  ```

- If global Python dependencies are causing conflicts or clutter, consider cleaning them up:

  - View all globally installed Python dependencies:

    ```bash
    pip list
    ```

  - Uninstall all Python packages:

    ```bash
    pip freeze | xargs pip uninstall -y
    ```

  - WARNING: This will remove all global packages, which may affect other projects or system tools.
  - These commands apply only to the current Python environment (global or virtual) and do not affect `pyenv`.

- Always create and activate a virtual environment for each project to isolate dependencies. If an existing virtual environment was created with a Python version lacking SSL support or compatible dependencies, delete and recreate it:

  ```bash
  rm -rf venv
  python -m venv venv
  source venv/bin/activate # use `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
  ```
## Questions to Ask the QA System

Document: HR_Remote_Work_Policy.txt
- "Who is eligible for remote work under the updated policy?"
- "What are the standard work hours for remote employees, and are flexible arrangements allowed?"
Document: Internal_Announcement_Q3Goals.txt
- "When is Project Phoenix (Customer Portal Upgrade) scheduled to launch?"
- "What is the deadline for completing cybersecurity training for all staff?"
Document: Product_Spec.txt
- "What new features are introduced in Widget Alpha v2.1?"
- "What is the average processing latency and throughput capacity of Widget Alpha v2.1?"
Document: Project_Summary.txt
- "What is the primary goal of Project Phoenix?"
- "What are the identified risks that could impact the completion of Project Phoenix?"
Some questions produce inaccurate or incomplete responses due to their wording. To improve accuracy, it may help to simplify questions and/or reword them into direct requests:
- "Who is working on Project Phoenix?"
  - Answer: John Smith * Tech Lead: Alice Green (Incomplete)
- "Name everyone on Project Phoenix?"
  - Answer: Jane Doe (VP, Customer Success) (Incomplete)
- "Name everyone on Project phoenix" ('Phoenix' is lowercase)
  - Answer: "Jane Doe (VP, Customer Success) _ Project Manager: John Smith _ Tech Lead: Alice Green * Primary Users: Customer Support Team, End Customers" (Better)
- "What is the remote work policy?"
  - Answer: All remote work must comply with the company's data security and confidentiality policies (Incomplete)
- "Summarise the remote work policy"
  - Answer: 1. Security: All remote work must comply with the company's data security and confidentiality policies. Eligibility: Full-time employees with manager approval and a role suitable for remote work are eligible. Communication: Remote employees are expected to be reachable via company-approved communication channels (Slack, Email, Video Conferencing) during work hours. (Better)
- "What is the deadline for completing mandatory cybersecurity training for all staff?"
  - Answer: August 1, 2024 (Incorrect)
- "What is the deadline for completing cybersecurity training for all staff?"
  - Answer: September 30th (Correct but incomplete)
- "What are the key initiatives planned to enhance customer retention in Q3 2025?"
  - Answer: Let's work together to make Q3 a successful quarter (Irrelevant)
- "How will customer retention be improved?"
  - Answer: Reduce average support ticket resolution time by 15% (Better but incomplete)
## Future Improvements

These improvements target the RAG system rather than the application as a whole, focusing on development of the actual product rather than the PoC.
- Scalability:
  - Implement a vector database like Pinecone, Weaviate, or FAISS to handle larger datasets and improve retrieval efficiency.
- Support for More Document Types:
  - Add preprocessing pipelines to handle formats like PDFs, Word documents, and spreadsheets.
- Improved Answer Generation:
  - Fine-tune the language model for domain-specific queries and improve context handling to reduce truncation issues.
- Optimized Response Time:
  - Implement caching, asynchronous processing, and batching of LLM requests to reduce latency (see the sketch at the end of this section).
- Embedding Model Upgrades:
  - Explore larger or more accurate embedding models to improve retrieval precision without compromising performance (a lightweight model was chosen largely because this is a PoC).
- Enhance Prompt Transparency:
  - Explicitly provide the source segment that supports the generated answer, along with a link to the full source document(s).
- Leverage LangChain for Pipeline Simplification:
  - Integrate the LangChain framework to streamline document loading, embedding generation, vector database integration, and LLM interaction.
  - Use LangChain's modular components to simplify the addition of new features, such as prompt templates, memory for chatbots, and tool chaining.
  - This would reduce development overhead and improve maintainability as the system scales.
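As one illustration of the response-time work, a sketch of caching query embeddings so repeated questions skip the encoder entirely (an assumed optimization, not part of the PoC; the function name is hypothetical):

```python
# Sketch: cache query embeddings so repeated questions skip re-encoding.
from functools import lru_cache

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

@lru_cache(maxsize=1024)
def embed_query(query: str):
    """Encode a query once; identical later queries hit the in-memory cache."""
    return embedder.encode(query, convert_to_tensor=True)
```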