This repository hosts an intelligent and scalable Question-Answering (QA) chatbot system that combines the strengths of transformer-based models and retrieval-augmented generation (RAG) to provide context-aware, accurate responses to natural language queries.
Designed for real-world applications in customer service, knowledge bases, education, and enterprise automation, this project seamlessly integrates deep learning with vector search to achieve high performance and reliability.
## 🧩 The Problem: Pretrained AI Isn't Always Enough

Pretrained language models like BERT and GPT are powerful, but they have a limitation:
They only "know" what they were trained on, and that training data has a cutoff.
In real-world use cases like customer support, academic help, or domain-specific knowledge (e.g., healthcare, legal, enterprise), answers must be accurate, up-to-date, and based on external documents. Out-of-the-box models can hallucinate, give vague answers, or ignore context completely.
To solve this, I built a Retrieval-Augmented Generation (RAG)-based Question Answering System from scratch. Here's how I approached the solution:
I started with BERT (a pre-trained transformer) and fine-tuned it on a custom Q&A dataset (a minimal sketch follows the list below). This helped the model:

- Understand my domain better (not just general English)
- Improve the precision of answer extraction from context
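Here's a hedged sketch of that fine-tuning step. It assumes a SQuAD-style list of `{context, question, answer}` records in `data/qa_dataset.json`; the repo's `fine_tune.py` may differ in schema, hyperparameters, and output path:

```python
# Fine-tuning sketch for extractive QA. Dataset schema, hyperparameters,
# and output path are assumptions, not the repo's exact fine_tune.py.
import json
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

def encode(example):
    # Tokenize question + context together; offsets let us map the answer's
    # character span onto token start/end positions for the QA head.
    enc = tokenizer(example["question"], example["context"],
                    truncation="only_second", max_length=384,
                    padding="max_length", return_offsets_mapping=True)
    start_char = example["context"].find(example["answer"])
    end_char = start_char + len(example["answer"])
    start_tok = end_tok = 0  # defaults to [CLS] if the answer isn't found
    for i, (s, e) in enumerate(enc["offset_mapping"]):
        if enc.sequence_ids()[i] != 1:  # skip question and special tokens
            continue
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i
    enc.pop("offset_mapping")
    enc["start_positions"] = start_tok
    enc["end_positions"] = end_tok
    return enc

with open("data/qa_dataset.json") as f:
    dataset = [encode(ex) for ex in json.load(f)]

loader = DataLoader(
    dataset, batch_size=8, shuffle=True,
    collate_fn=lambda batch: {k: torch.tensor([ex[k] for ex in batch])
                              for k in batch[0]},
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for epoch in range(2):
    for batch in loader:
        loss = model(**batch).loss  # start/end cross-entropy from the QA head
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("model/qa_finetuned")
tokenizer.save_pretrained("model/qa_finetuned")
```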
I used sentence-transformers to turn every document/context into a vector, a mathematical representation of its meaning. Then I stored those vectors in a FAISS index, so when a user asks a question, the system (sketched after this list):

- Finds the top matching contexts
- Sends them to the QA model as supporting evidence
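Here's a sketch of the indexing and retrieval mechanics. The repo routes this through LangChain's FAISS wrapper; this version uses the raw libraries directly, and the embedding model name is an assumed default:

```python
# Semantic indexing + retrieval sketch using sentence-transformers and FAISS
# directly (the repo uses LangChain's FAISS wrapper; the mechanics are the same).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "BERT stands for Bidirectional Encoder Representations from Transformers.",
    "Tesla was founded in 2003 by engineers.",
]

# Embedding model is an assumed default; swap in whatever the repo configures.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(documents, normalize_embeddings=True)

# Inner product over L2-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the top-k contexts most semantically similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [documents[i] for i in ids[0] if i != -1]

print(retrieve("What does BERT stand for?", k=1))
```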
To make it usable, I wrapped everything in a FastAPI backend (a minimal sketch follows this list). Now users can:

- Ask questions via API or frontend
- Get fast, accurate answers in real time
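A minimal sketch of that API layer. The route name, request schema, and `answer_question` stub are illustrative assumptions, not the repo's exact `api/app.py`:

```python
# Minimal FastAPI sketch. Route name, request schema, and the
# answer_question stub are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="GenAI QA System")

class Query(BaseModel):
    question: str

def answer_question(question: str) -> dict:
    # Stub: the real pipeline retrieves contexts from FAISS and runs
    # the fine-tuned BERT reader over them (see the inference sketch below).
    return {"answer": "...", "confidence": 0.0}

@app.post("/ask")
def ask(query: Query) -> dict:
    return answer_question(query.question)

# Run with: uvicorn app:app --reload
```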
To make it lightweight and production-ready (see the quantization sketch below):

- I applied dynamic quantization to reduce the model size and speed up inference
- I ensured it supports scalable batch processing
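A sketch of the quantization step, assuming the fine-tuned checkpoint was saved to `model/qa_finetuned` (a hypothetical path):

```python
# Dynamic quantization sketch: swaps nn.Linear weights to int8 for CPU inference.
# The checkpoint path is a hypothetical placeholder.
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("model/qa_finetuned")
quantized_model = torch.quantization.quantize_dynamic(
    model,              # model to quantize
    {torch.nn.Linear},  # layer types to convert
    dtype=torch.qint8,  # 8-bit integer weights
)
quantized_model.eval()  # drop-in replacement for faster CPU inference
```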
This wasn't just plug-and-play; I engineered each layer:
| Component | What I Did |
|---|---|
| Fine-tuning | Curated dataset, encoded QA pairs, trained and saved BERT-based QA model |
| Semantic Search | Used LangChain + FAISS to create and load a fast vector database |
| Backend Inference Logic | Wrote custom logic to connect search → context → BERT answer |
| Optimization | Quantized model for lower latency and smaller memory footprint |
| API Integration | Built FastAPI app with endpoints for QA + optional Streamlit interface |
| Documentation & README | Structured the project, wrote clear instructions, and explained architecture |
"You ask a question β the system finds relevant documents β the AI model reads them β it gives you a precise answer β just like a smart assistant who knows where to look."
- ✅ Fine-tuned BERT for domain-specific QA
- ✅ Retrieval-Augmented Generation for grounded answers
- ✅ FAISS-based fast semantic context retrieval
- ✅ API-powered and optionally UI-enabled deployment
- ✅ Quantization-enabled model for faster inference
- ✅ Modular, well-documented codebase
```
+------------------------+
|  User Asks Question    |
+-----------+------------+
            |
            v
+--------------------------------------+
| Semantic Search (FAISS + Embeddings) |
+------------------+-------------------+
                   |
          Retrieved Context
                   |
                   v
+--------------------------------------+
|     BERT QA Model (Fine-tuned)       |
+------------------+-------------------+
                   |
          Generated Answer
                   |
                   v
       +------------------------+
       |   Delivered to User    |
       +------------------------+
```
This pipeline ensures both speed and accuracy by blending retrieval-based context with transformer-powered reasoning.
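To make the diagram concrete, here's a hedged sketch of the reader stage. The checkpoint path is hypothetical and `contexts` would come from the retrieval sketch above; the repo's `run_inference.py` is the authoritative version:

```python
# End-to-end inference sketch: retrieved contexts -> fine-tuned reader.
# The checkpoint path is an illustrative assumption.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("model/qa_finetuned")
model = AutoModelForQuestionAnswering.from_pretrained("model/qa_finetuned")
model.eval()

def answer(question: str, contexts: list[str]) -> tuple[str, float]:
    """Read each retrieved context and keep the highest-scoring span."""
    best_span, best_score = "", float("-inf")
    for context in contexts:
        inputs = tokenizer(question, context, return_tensors="pt",
                           truncation=True, max_length=384)
        with torch.no_grad():
            out = model(**inputs)
        start = out.start_logits.argmax()
        end = out.end_logits.argmax()
        score = (out.start_logits[0, start] + out.end_logits[0, end]).item()
        if end >= start and score > best_score:
            span_ids = inputs["input_ids"][0][start:end + 1]
            best_span = tokenizer.decode(span_ids, skip_special_tokens=True)
            best_score = score
    return best_span, best_score
```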
| Tool/Library | Role |
|---|---|
| `transformers` | Pre-trained BERT model + QA fine-tuning |
| `sentence-transformers` | High-quality sentence embeddings |
| `faiss` | Approximate nearest neighbor search |
| `langchain` | Vector store management and retrieval |
| `pytorch` | Training and inference |
| FastAPI | Real-time backend API |
| Streamlit | Optional web interface |
| `torch.quantization` | Inference acceleration |
```
GenAI_QA_System/
├── data/
│   └── qa_dataset.json          # Custom Q&A training data
│
├── model/
│   ├── fine_tune.py             # Fine-tune BERT script
│   └── run_inference.py         # Main RAG-based inference
│
├── api/
│   └── app.py                   # FastAPI entrypoint
├── optimized_inference.py       # Faster inference with quantized model
├── create_faiss_index.py        # Build FAISS vector store
├── requirements.txt             # All dependencies
└── README.md                    # You're reading it!
```
```bash
# Clone and set up the environment
git clone https://github.com/ansarimzp/GenAI_QA_System.git
cd GenAI_QA_System
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

```bash
# Fine-tune the model on your custom dataset
python src/fine_tune.py

# Build the FAISS index
python src/create_faiss_index.py

# Launch the API server
uvicorn app.main:app --reload
```

Access the API at http://localhost:8000.
Send a POST request with a question:

```json
{
  "question": "What does BERT stand for?"
}
```

Response:

```json
{
  "answer": "BERT stands for Bidirectional Encoder Representations from Transformers.",
  "confidence": 0.94
}
```

To launch the optional web interface:

```bash
streamlit run app/ui.py
```

Navigate to http://localhost:8501 in your browser and start asking questions with a simple interface.
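For programmatic access, a small client sketch (the `/ask` path is an assumption; match it to the route defined in `api/app.py`):

```python
# Query the running API from Python. The endpoint path "/ask" is an
# illustrative assumption; use whatever route api/app.py defines.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What does BERT stand for?"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"answer": "...", "confidence": 0.94}
```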
- ✅ Model Quantization using `torch.quantization` to reduce memory and boost speed
- ⚡ FAISS-based Retrieval for sub-second document search
- ♻️ Context Caching and batched tokenization to reduce compute time (one possible approach is sketched below)
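How the caching is implemented isn't shown here; one simple possibility, sketched under that assumption, is memoizing retrieval results with `functools.lru_cache`:

```python
# Context-caching sketch: memoize retrieval results for repeated questions.
# The repo's actual caching strategy is an assumption; lru_cache is one
# simple way to avoid re-running FAISS search for identical queries.
from functools import lru_cache

def retrieve(question: str, k: int = 3) -> list[str]:
    # Placeholder: FAISS search as in the indexing sketch above.
    return []

@lru_cache(maxsize=1024)
def cached_retrieve(question: str, k: int = 3) -> tuple[str, ...]:
    # lru_cache requires hashable arguments and benefits from immutable
    # results, so the list of contexts is frozen into a tuple.
    return tuple(retrieve(question, k))
```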
You can fine-tune the system to your domain:

- Prepare a JSON dataset:

```json
[
  {
    "context": "Tesla was founded in 2003 by engineers...",
    "question": "When was Tesla founded?",
    "answer": "2003"
  }
]
```

- Fine-tune:

```bash
python src/fine_tune.py --data_path ./data/your_dataset.json
```

- Re-index:

```bash
python src/create_faiss_index.py --data_path ./data/your_dataset.json
```

Planned improvements:

- ✔️ Support for multilingual QA
- 🔁 Conversational memory with history
- 📊 Real-time feedback loop for answer quality
- 🐳 Docker-based deployment for portability
- 🤝 Integration with vector-capable databases (e.g., Weaviate, Pinecone)
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgements:

- 🤗 Hugging Face for providing transformer models
- 🧠 LangChain for chaining logic and vector store wrappers
- 🔍 Facebook Research for FAISS
- 🌍 The open-source community for continued innovation
Ready to build your own domain-specific AI Q&A engine?
Fork this project, fine-tune with your knowledge base, and deploy it in minutes.