This is a simple Retrieval-Augmented Generation (RAG) application that lets you upload a PDF, retrieve the most relevant content via semantic similarity, and generate answers with a lightweight LLM. It's built with Sentence Transformers for embeddings, Qdrant as the vector store, and Streamlit for the UI.
Retrieval-Augmented Generation (RAG) is an architecture that combines information retrieval and natural language generation. Instead of generating answers purely from a model's training data, RAG retrieves relevant documents from a knowledge base and feeds them into the language model to ground the answer in actual facts.
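In code, the whole flow is just "retrieve, then generate". Here is a minimal sketch of that loop; the two helpers are hypothetical stubs standing in for the real vector search and LLM calls shown later in this README:

```python
# Hypothetical stub: in the real app this is a vector-store similarity search.
def retrieve(question: str, top_k: int = 3) -> list[str]:
    return ["<relevant chunk 1>", "<relevant chunk 2>", "<relevant chunk 3>"][:top_k]

# Hypothetical stub: in the real app this is an LLM call.
def generate(prompt: str) -> str:
    return f"(model answer grounded in: {prompt[:40]}...)"

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))          # ground the model in retrieved text
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                            # LLM completes the grounded prompt
```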
An embedding is a numerical representation of data (like text) in a high-dimensional vector space. Similar meanings result in similar vectors. This is crucial for finding semantically relevant documents using distance-based search.
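For example, with the same `sentence-transformers` model this app uses, semantically related sentences get a much higher cosine-similarity score than unrelated ones (the example sentences below are purely illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # the model used in this app (384-dim vectors)

emb = model.encode([
    "How do I reset my password?",
    "Steps to recover account access",
    "The weather is sunny today",
])

# Related sentences land close together in the vector space.
print(util.cos_sim(emb[0], emb[1]))  # higher score (similar meaning)
print(util.cos_sim(emb[0], emb[2]))  # lower score (unrelated)
```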
A vector database stores these high-dimensional embeddings and allows for efficient similarity searches using methods like cosine similarity or Euclidean distance. It's the backbone of retrieval in RAG systems.
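A minimal sketch with `qdrant-client` in the same in-memory mode this app uses; the collection name and the tiny 4-dimensional toy vectors are illustrative only:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # in-memory instance, as in this app

client.create_collection(
    collection_name="demo",  # illustrative name
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(id=1, vector=[0.9, 0.1, 0.1, 0.2], payload={"text": "cats"}),
        PointStruct(id=2, vector=[0.1, 0.9, 0.8, 0.1], payload={"text": "stocks"}),
    ],
)

# Nearest-neighbour search by cosine similarity.
hits = client.search(collection_name="demo", query_vector=[0.85, 0.1, 0.2, 0.1], limit=1)
print(hits[0].payload["text"])  # -> "cats"
```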
| Component | Tool/Library |
|---|---|
| Embedding Model | `all-MiniLM-L6-v2` from `sentence-transformers` |
| Vector Store | Qdrant (in-memory instance) |
| PDF Parsing | `pdfplumber` |
| LLM | Hugging Face pipeline (distil model) |
| UI | Streamlit |
| Language | Python |
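Roughly, these pieces fit together like this. This is a hedged sketch, not the app's exact code: the 500-character chunk size, the prompt format, and `distilgpt2` (a stand-in for the unnamed "distil model") are all assumptions:

```python
import pdfplumber
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from transformers import pipeline

# 1. Parse the PDF into plain text and split it into fixed-size chunks
#    (500 characters is an arbitrary choice for this sketch).
with pdfplumber.open("document.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

# 2. Embed each chunk and index it in an in-memory Qdrant collection.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="pdf_chunks",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="pdf_chunks",
    points=[
        PointStruct(id=i, vector=encoder.encode(c).tolist(), payload={"text": c})
        for i, c in enumerate(chunks)
    ],
)

# 3. Retrieve the most relevant chunks and feed them to a small LLM.
question = "What is this document about?"
hits = client.search(
    collection_name="pdf_chunks",
    query_vector=encoder.encode(question).tolist(),
    limit=3,
)
context = "\n".join(hit.payload["text"] for hit in hits)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

llm = pipeline("text-generation", model="distilgpt2")  # stand-in for the app's distil model
print(llm(prompt, max_new_tokens=64)[0]["generated_text"])
```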
Here is an example of how the result looks after querying the PDF: *(screenshot)*
```bash
git clone https://github.com/jinks8010/Simple-RAG
cd Simple-RAG
python -m venv rag_env
source rag_env/bin/activate   # On Windows: rag_env\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
```
Try the hosted demo on Hugging Face Spaces: https://huggingface.co/spaces/ajinkya45/SIMPLE-RAG-PDF
- This app only supports PDF uploads.
- You can swap the Qdrant collection settings or the LLM to suit your needs.
- All vector storage is in-memory Qdrant, so the index resets whenever the app restarts.
- Support for multi-page PDFs
- Add a persistent Qdrant backend (see the sketch after this list)
- Add chat history and follow-up query support
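For the persistent-backend item, one possible approach (a sketch, not the app's code) is `qdrant-client`'s on-disk local mode, or a standalone Qdrant server; the `./qdrant_data` path and localhost URL below are illustrative:

```python
from qdrant_client import QdrantClient

# Local on-disk mode: the index survives app restarts.
client = QdrantClient(path="./qdrant_data")

# Or connect to a standalone Qdrant server / Docker container instead:
# client = QdrantClient(url="http://localhost:6333")
```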