This is a simple Retrieval-Augmented Generation (RAG) application that lets you upload a PDF, retrieve the most relevant content via semantic similarity, and generate answers with a lightweight LLM. It's built with Sentence Transformers for embeddings, Qdrant as the vector store, and Streamlit for the UI.
Retrieval-Augmented Generation (RAG) is an architecture that combines information retrieval and natural language generation. Instead of generating answers purely from a model's training data, RAG retrieves relevant documents from a knowledge base and feeds them into the language model to ground the answer in actual facts.
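In code, the whole flow is just "retrieve, then generate". Here is a minimal sketch of that loop; the two helpers are hypothetical stubs standing in for the real vector search and LLM calls shown later in this README:

```python
# Hypothetical stub: in the real app this is a vector-store similarity search.
def retrieve(question: str, top_k: int = 3) -> list[str]:
    return ["<relevant chunk 1>", "<relevant chunk 2>", "<relevant chunk 3>"][:top_k]

# Hypothetical stub: in the real app this is an LLM call.
def generate(prompt: str) -> str:
    return f"(model answer grounded in: {prompt[:40]}...)"

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))          # ground the model in retrieved text
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                            # LLM completes the grounded prompt
```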
An embedding is a numerical representation of data (like text) in a high-dimensional vector space. Similar meanings result in similar vectors. This is crucial for finding semantically relevant documents using distance-based search.
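For example, with the same `sentence-transformers` model this app uses, semantically related sentences get a much higher cosine-similarity score than unrelated ones (the example sentences below are purely illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # the model used in this app (384-dim vectors)

emb = model.encode([
    "How do I reset my password?",
    "Steps to recover account access",
    "The weather is sunny today",
])

# Related sentences land close together in the vector space.
print(util.cos_sim(emb[0], emb[1]))  # higher score (similar meaning)
print(util.cos_sim(emb[0], emb[2]))  # lower score (unrelated)
```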
A vector database stores these high-dimensional embeddings and allows for efficient similarity searches using methods like cosine similarity or Euclidean distance. It's the backbone of retrieval in RAG systems.
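A minimal sketch with `qdrant-client` in the same in-memory mode this app uses; the collection name and the tiny 4-dimensional toy vectors are illustrative only:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # in-memory instance, as in this app

client.create_collection(
    collection_name="demo",  # illustrative name
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(id=1, vector=[0.9, 0.1, 0.1, 0.2], payload={"text": "cats"}),
        PointStruct(id=2, vector=[0.1, 0.9, 0.8, 0.1], payload={"text": "stocks"}),
    ],
)

# Nearest-neighbour search by cosine similarity.
hits = client.search(collection_name="demo", query_vector=[0.85, 0.1, 0.2, 0.1], limit=1)
print(hits[0].payload["text"])  # -> "cats"
```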
| Component | Tool/Library |
|---|---|
| Embedding Model | `all-MiniLM-L6-v2` from `sentence-transformers` |
| Vector Store | Qdrant (in-memory instance) |
| PDF Parsing | `pdfplumber` |
| LLM | Hugging Face pipeline (distil model) |
| UI | Streamlit |
| Language | Python |
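Roughly, these pieces fit together like this. This is a hedged sketch, not the app's exact code: the 500-character chunk size, the prompt format, and `distilgpt2` (a stand-in for the unnamed "distil model") are all assumptions:

```python
import pdfplumber
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from transformers import pipeline

# 1. Parse the PDF into plain text and split it into fixed-size chunks
#    (500 characters is an arbitrary choice for this sketch).
with pdfplumber.open("document.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

# 2. Embed each chunk and index it in an in-memory Qdrant collection.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="pdf_chunks",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="pdf_chunks",
    points=[
        PointStruct(id=i, vector=encoder.encode(c).tolist(), payload={"text": c})
        for i, c in enumerate(chunks)
    ],
)

# 3. Retrieve the most relevant chunks and feed them to a small LLM.
question = "What is this document about?"
hits = client.search(
    collection_name="pdf_chunks",
    query_vector=encoder.encode(question).tolist(),
    limit=3,
)
context = "\n".join(hit.payload["text"] for hit in hits)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

llm = pipeline("text-generation", model="distilgpt2")  # stand-in for the app's distil model
print(llm(prompt, max_new_tokens=64)[0]["generated_text"])
```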
Here is an example of how the result looks after querying the PDF: *(screenshot)*
```bash
git clone https://github.com/jinks8010/Simple-RAG
cd Simple-RAG
python -m venv rag_env
source rag_env/bin/activate   # On Windows: rag_env\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
```
Try the hosted demo on Hugging Face Spaces: https://huggingface.co/spaces/ajinkya45/SIMPLE-RAG-PDF
- This app only supports PDF uploads.
- You can swap the Qdrant collection settings or the LLM to suit your needs.
- All vector storage is in-memory Qdrant, so the index resets whenever the app restarts.
- Support for multi-page PDFs
- Add a persistent Qdrant backend (see the sketch after this list)
- Add chat history and follow-up query support
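For the persistent-backend item, one possible approach (a sketch, not the app's code) is `qdrant-client`'s on-disk local mode, or a standalone Qdrant server; the `./qdrant_data` path and localhost URL below are illustrative:

```python
from qdrant_client import QdrantClient

# Local on-disk mode: the index survives app restarts.
client = QdrantClient(path="./qdrant_data")

# Or connect to a standalone Qdrant server / Docker container instead:
# client = QdrantClient(url="http://localhost:6333")
```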