# Document Question Answering using RAG LLM | FAISS | FastAPI

A powerful Document Question-Answering system built using a Retrieval-Augmented Generation (RAG) architecture, FAISS vector search, and a FastAPI backend. It supports both Bengali and English for querying documents and provides context-aware answers. The system is also integrated with Streamlit for an interactive, user-friendly interface.

## Features
- 📄 PDF Document Processing
- 🔄 Multiple Text Extraction Methods (pdfplumber, pymupdf, langchain)
- 💾 Vector Database Storage using FAISS
- 🤖 RAG-based Question Answering
- 🌐 FastAPI Backend with Swagger Documentation
- 🖥️ Interactive Streamlit Frontend
- 📊 RAG System Evaluation Tools
- 🔤 Multi-language Support (Bengali & English)
## Project Structure

```
├── app/
│   ├── config/
│   ├── core/
│   ├── data/
│   │   ├── pdfs/
│   │   ├── texts/
│   │   └── vectorstores/
│   ├── processing/
│   ├── routes/
│   ├── schemas/
│   └── services/
├── docs/
├── logs/
├── requirements.txt
├── run.sh
└── streamlit_app.py
```
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/FaisalAhmedBijoy/Document-QA-RAG-System-FastAPI.git
  cd Document-QA-RAG-System-FastAPI
  ```

- Create and activate a conda environment:

  ```bash
  conda create -n rag_llm python=3.12
  conda activate rag_llm
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Collect a Groq API key from Groq and paste it into the `.env` file:

  ```
  GROQ_API_KEY="YOUR_API_KEY"
  ```

## Tech Stack

- FastAPI: Web framework for building APIs
- Streamlit: Frontend interface
- LangChain: Framework for developing LLM applications
- FAISS: Vector similarity search
- Pydantic: Data validation
- PDF Processing: pdf2image, pdfplumber, PyMuPDF
## Usage

- Start the FastAPI backend:

  ```bash
  python -m app.main
  ```

- Launch the Streamlit frontend:

  ```bash
  streamlit run streamlit_app.py
  ```

- Access the applications:
  - FastAPI Swagger UI: http://localhost:8000/docs
  - Streamlit Interface: http://localhost:8501

- Run using Docker:

  ```bash
  docker compose -f docker-compose.yaml up --build   # first run (builds images)
  docker compose -f docker-compose.yaml up           # subsequent runs
  ```
## Processing Pipeline

- PDF Processing
  - Convert PDF documents to text using multiple methods
  - Store the extracted text in the `app/data/texts` directory

  ```bash
  python -m app.processing.pdf_to_text
  ```
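For illustration, a minimal sketch of the pdfplumber route (the `pdf_to_text` helper and output paths here are hypothetical; the project's actual logic lives in `app/processing/pdf_to_text.py`):

```python
# Minimal pdfplumber extraction sketch; pdf_to_text() is a hypothetical helper.
from pathlib import Path

import pdfplumber

def pdf_to_text(pdf_path: str, out_dir: str = "app/data/texts") -> Path:
    """Extract text from every page of a PDF and save it as a .txt file."""
    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None on image-only pages, hence the "or ''".
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    out_path = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```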
- Text Chunking
  - Split documents into manageable chunks
  - Optimize chunk size for better retrieval

  ```bash
  python -m app.processing.generate_text_chunks
  ```
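A minimal chunking sketch using LangChain's `RecursiveCharacterTextSplitter` (the chunk size, overlap, and input file name are illustrative assumptions, not the project's tuned values):

```python
# Chunking sketch; chunk_size/chunk_overlap are illustrative, not tuned values.
from pathlib import Path

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # max characters per chunk
    chunk_overlap=50,  # overlap keeps context across chunk boundaries
)
text = Path("app/data/texts/sample.txt").read_text(encoding="utf-8")
chunks = splitter.split_text(text)
print(f"{len(chunks)} chunks generated")
```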
- Embedding Generation and Vector Store
  - Generate embeddings for text chunks. The embedding model used is `l3cube-pune/bengali-sentence-similarity-sbert` from Hugging Face.
  - Store vectors in a FAISS index

  ```bash
  python -m app.processing.generate_embeddings
  python -m app.processing.generate_vector_db
  ```

  - Vectors are stored in `app/data/vectorstores/faiss_index`
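A sketch of how the embeddings and index could be produced with LangChain's wrappers (import paths assume recent `langchain-huggingface` / `langchain-community` packages; `chunks` comes from the chunking sketch above):

```python
# Embedding + FAISS index sketch; reuses `chunks` from the chunking step.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="l3cube-pune/bengali-sentence-similarity-sbert"
)
vectorstore = FAISS.from_texts(chunks, embeddings)
vectorstore.save_local("app/data/vectorstores/faiss_index")
```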
- RAG Chain
  - Retrieve relevant context using vector similarity
  - Generate answers using an LLM with the retrieved context. The `llama-3.3-70b-versatile` model from Groq is used.

  ```bash
  python -m app.processing.generate_rag_chain
  ```

  - Prompt used for this task:

  ```python
  prompt_template = """
  You are an assistant that answers questions strictly based on the provided document text.
  Rules:
  - Only use the information from the given Context.
  - Do not use outside knowledge.
  - If the answer is not found in the Context, reply exactly: "Information not found in the document."
  - Provide only the answer, without repeating the question or the context.
  Context: {context}
  Question: {question}
  Answer:
  """
  ```
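A minimal sketch of wiring such a chain with LangChain's LCEL and `ChatGroq` (the retriever's `k`, the temperature, and the import paths are assumptions; the project's actual chain lives in `app/processing/generate_rag_chain`):

```python
# RAG chain sketch; assumes the FAISS index and `embeddings` from earlier steps.
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq

load_dotenv()  # picks up GROQ_API_KEY from the .env file

vectorstore = FAISS.load_local(
    "app/data/vectorstores/faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # needed for pickled FAISS metadata
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # top-3 chunks

def format_docs(docs):
    # Concatenate retrieved chunks into the {context} slot of the prompt.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | PromptTemplate.from_template(prompt_template)
    | ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
    | StrOutputParser()
)

print(rag_chain.invoke("What is the email address of the candidate?"))
```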
- Single Query Inference
  - Run a query using the LLM model

  ```bash
  python -m app.processing.single_query_inference
  ```

  - Perform question answering about the document:

  ```json
  {
    "query": "What is the email address of the candidate?",
    "answer": "faisal.cse16.kuet@gmail.com"
  }
  ```
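A hypothetical single-query entry point, reusing the `rag_chain` from the sketch above, could be as simple as:

```python
# Hypothetical single-query entry point; `rag_chain` comes from the RAG sketch.
if __name__ == "__main__":
    query = "What is the email address of the candidate?"
    print({"query": query, "answer": rag_chain.invoke(query)})
```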
## API Endpoints

- `/rag/query`: Process questions and generate answers
  - Method: POST
  - Params:

    ```json
    {
      "query": "What is the name of the candidate?"
    }
    ```

  - Response:

    ```json
    {
      "query": "What is the name of the candidate?",
      "answer": "Faisal Ahmed"
    }
    ```

- `/rag/query-with-reference`: Process questions and generate answers, returning the retrieved context and a similarity score against an expected answer
  - Method: POST
  - Params:

    ```json
    {
      "query": "What is the job duration in Business Automation Limited?",
      "expected_answer": "Nov 2024 - Present"
    }
    ```

  - Response:

    ```json
    {
      "query": "What is the job duration in Business Automation Limited?",
      "expected_answer": "Nov 2024 - Present",
      "actual": "November, 2024 - Present",
      "cosine_similarity": 0.8834819187419994,
      "context": [
        "[Doc 1]: • Developed cluster-based remark suggestions in a product • Developed a time series forecasting model for a product registration count. • Data analysis on project cost estimation in the company data and perform EDA. • Developed a website backend service using FastAPI and PostgreSQL that includes SMS and email sending modules, payment gateway integration, and custom PDF generation. Next Solution Lab, Dhaka, Bangladesh February, 2024 - October, 2024 AI Engineer • Developed deep learning-based prod",
        "[Doc 2]: Sentence Punctuation Restoration [GitHub] A transformer-based Bangla model was used to build the sentence punctuation model. Llama 3.2 was also used to infer with non-punctuation sentence correction. FastAPI was used to prepare the API for deployment with Docker. Tech Stack: BanglaBERT, LLM, Llama 3.2, FastAPI Chat Bot using LLM with Gradio [GitHub] The chatbot is built with Flask for the backend and uses a pre-trained model from Hugging Face for generating responses. Tech Stack: LLM, Gen AI, Fl"
      ]
    }
    ```

- `/rag/upload-document-pdf`: Upload a PDF to generate a new FAISS vector store
  - Method: POST
  - Params: PDF file
  - Response: FAISS vector saved with a document ID

- `/rag/query-by-document`: Process questions and generate answers based on a document ID
  - Method: POST
  - Params:

    ```json
    {
      "query": "What is the name of the candidate?",
      "document_id": "1757268986878"
    }
    ```

  - Response:

    ```json
    {
      "query": "What is the name of the candidate?",
      "answer": "Faisal Ahmed"
    }
    ```

- `/rag/list-vector-storest`: Show the current vector list
  - Method: GET
  - Response:

    ```json
    {
      "vectors": {
        "vector_stores": [
          "faiss_index_1757267758132",
          "faiss_index_1757268986878"
        ],
        "document_ids": [
          "1757267758132",
          "1757268986878"
        ]
      }
    }
    ```

- `/rag/pdf/{document_id}`: Show the PDF for a document ID
  - Method: GET
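As a quick client-side sketch, the query endpoint can be exercised with `requests` (host and port assume the defaults from the Usage section; the payload shape mirrors the Params above):

```python
# Hypothetical client call against the /rag/query endpoint.
import requests

resp = requests.post(
    "http://localhost:8000/rag/query",
    json={"query": "What is the name of the candidate?"},
)
resp.raise_for_status()
print(resp.json()["answer"])  # e.g. "Faisal Ahmed"
```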
## Evaluation

The system includes evaluation tools to measure:

- Answer relevancy
- Response accuracy
- Retrieval quality
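The `cosine_similarity` field in the query-with-reference response suggests a sentence-embedding comparison; a sketch of such a check might look like the following (reusing the same sbert model is an assumption):

```python
# Answer-vs-reference scoring sketch; the model choice is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("l3cube-pune/bengali-sentence-similarity-sbert")

expected = "Nov 2024 - Present"
actual = "November, 2024 - Present"

emb = model.encode([expected, actual], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()  # 1.0 means identical meaning
print(f"cosine_similarity: {score:.4f}")
```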
## Documentation

- API documentation is available at the http://127.0.0.1:8000/docs endpoint
- Sample outputs and UI screenshots are in the `docs/` directory
- Detailed logging is in the `logs/` directory
## Acknowledgments

- FastAPI
- LangChain
- Streamlit
- FAISS



