Flask • LangChain • Pinecone • HuggingFace Embeddings • Groq (Llama 3) • SerpAPI fallback • Deployed on AWS EC2
Live app: Open MediBot (Live)
A retrieval-augmented medical Q&A chatbot. It embeds expert-authored medical documents, stores them in a Pinecone vector index, retrieves the most relevant chunks for a user query, and generates grounded answers with a Groq-hosted LLM via LangChain. A minimal Flask app provides a real-time chat UI and a simple API. If no trustworthy match is found, SerpAPI fallback performs a web search and provides an answer.
- Abstract
- Project Description
- Features
- Project Structure
- Installation
- Configuration
- Usage
- RAG Pipeline
- Performance
- Evaluation & Results
- Safety & Compliance
- Use Cases
- Deployment
- Demo & Poster
- Future Work
- Contributing
- License
- Authors & Course
- Contact
- References
- Credits
MediBot implements Retrieval-Augmented Generation (RAG) for medical queries. Documents are embedded using a HuggingFace MiniLM model, stored in Pinecone for fast cosine similarity search, and fed to a Groq LLM (Llama 3) via LangChain to generate concise, source-grounded answers. A Flask web interface enables real-time interaction. When the knowledge base can’t confidently answer, a SerpAPI fallback performs a trusted web search and clearly labels web-sourced responses. The system is deployed on AWS EC2 and is live at Open MediBot (Live) , enabling public access in a production setting.
Medical information is vast and context-dependent. Rather than relying on an LLM’s parametric memory, this chatbot retrieves the most relevant passages from a curated, expert-authored corpus and uses them as context for answer generation. This grounds responses in real content and improves trustworthiness and traceability.
- Flask UI: Lightweight web interface for chatting with the model.
- HuggingFace Embeddings:
sentence-transformers/all-MiniLM-L6-v2(384-dim) to represent documents/queries. - Pinecone Vector Store: Stores embeddings in index
medicalbot; cosine similarity search. - Groq LLM via LangChain: ChatGroq with Llama 3 to synthesise answers over retrieved context.
- LangChain RAG Chain: Runnable graph for retrieval + generation with prompt templates and history.
- SerpAPI Fallback: Web search with explicit “External source (web)” labelling in the UI.
- AWS Deployment: EC2-hosted Flask app via Docker, environment-driven config.
- Grounded answers: Minimise hallucinations via retrieved evidence & citations.
- Speed: Fast retrieval + inference for real-time Q&A.
- Safety: Clear guardrails, transparent source, non-diagnostic use.
- Scalability: Grows with more documents and concurrent users; cloud-ready on AWS.
- 🔎 Similarity search over embedded expert medical PDFs
- 🧠 RAG answer generation with Groq LLMs
- 🌐 SerpAPI fallback with explicit source labelling
- 🖥️ Flask chat UI and JSON API
- 🧾 Source snippets + page numbers
- 🔐
.env-based configuration for API keys - ⚙️ Modular code to swap models, indexes, and prompts
ai-medical-chatbot/
├── app.py # Flask entrypoint (UI + /get API)
├── src/
│ ├── __init__.py
│ ├── helper.py # Data Split to chunks + Embeddings (HuggingFace) + GoogleSearch (Fallback)
│ └── prompt.py # Prompt for LLM Model
├── templates/
│ └── chat.html # Chat UI (labels whether answer is KB or Web)
├── static/
│ ├── styles.css # Styles
│ └── images/ # Logo, screenshots
├── docker/
│ ├── Dockerfile # Container image for AWS deploy
├── requirements.txt
├── store_index.py # Pinecone Create/Initialise
├── .env.example # Copy to .env and fill keys
└── README.md
- Python 3.10+
- Pinecone account + API key
- Groq API key
- SerpAPI key for fallback
- A curated set of expert-authored medical PDFs to embed
- AWS EC2 instance (Ubuntu recommended) with security group allowing HTTP/HTTPS
-
Clone and enter the repo
git clone https://github.com/ACM40960/AI-Medical-LLM.git cd AI-Medical-LLM -
Create and activate a virtual environment
python -m venv venv # macOS/Linux source venv/bin/activate # Windows (PowerShell) venv\Scripts\Activate.ps1
-
Install dependencies
pip install -r requirements.txt
-
Docker
- Install Docker on EC2.
- Build & run:
docker build -t medibot:latest . docker run -d --name medibot -p 8501:8501 --env-file .env medibot:latest
-
CI/CD
- Use GitHub Actions to build and deploy on push to
main(Docker or rsync/SSH).
- Use GitHub Actions to build and deploy on push to
Create a .env file and set:
# Pinecone
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=medicalbot
# HuggingFace Embeddings
EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Groq
GROQ_API_KEY=...
GROQ_MODEL=llama3-70b-8192
# Optional fallback (SerpAPI)
SERPAPI_API_KEY=...
# App
FLASK_HOST=0.0.0.0
FLASK_PORT=8501
TOP_K=3
FLASK_ENV=production # set to production on AWSOn AWS, keep the
.envfile outside of version control and rotate keys regularly.
python app.py
# App runs at http://localhost:8501- Docker: run the container as shown in the Production Setup section.
- Access the live app at Open MediBot (Live).
POST /get (example)
curl -X POST http://16.16.207.95:8501/get \
-H "Content-Type: application/json" \
-d '{"query":"What are common symptoms of anemia?"}'{
"answer": "Concise, grounded answer...",
"sources": [
{"title": "Doc A", "page": 14, "url": "..."}
],
"provenance": "knowledge_base" // or "web" when SerpAPI fallback is used
}Offline Indexing
- Load PDFs (via LangChain
PyPDFLoader). - Chunk & preprocess: 500-token chunks with 50-token overlap.
- Embed chunks with MiniLM and upsert vectors into Pinecone.
Online Inference
- Embed query with MiniLM; retrieve top-k=3 similar chunks (cosine similarity, 384-dim vectors).
- Construct prompt with retrieved evidence and citation slots.
- Generate answer via Groq LLM (ChatGroq) using LangChain’s runnable graph.
- Return answer with citations; if evidence is insufficient, trigger SerpAPI fallback and label as web-sourced in UI.
- Chunk size: 500 tokens
- Overlap: 50 tokens
- Embedding: MiniLM-L6-v2 (384-dim)
- Similarity: cosine
- Retriever: top_k=3
- If retrieval confidence is low or no relevant chunks are found, call SerpAPI for a trusted web search.
- UI clearly labels: “External source (web)” and cites the URL.
- The bot does not fabricate an answer when evidence is missing.
Representative timings from our tests (hardware/network-dependent):
- Total response latency: ~100 ms
- Retrieval time: ~20 ms
- LLM inference time: ~3 ms
- Test Suite: >50 medical questions drawn from textbook chapters.
- Internal query accuracy: 100%
- Fallback use rate: 10%
- Error rate: 0%
- Hallucinations: None observed in tests.
Trace Count: Shows number of runs over time. Nearly all runs completed successfully (green), with very few errors (red), validating stability.
Trace Latency (P50 vs P99):
- P50 (median, blue): Typical response latency was near real-time (sub-second).
- P99 (purple): Even at peak load, the slowest 1% of requests rarely exceeded 30s, showing strong robustness.
LLM Count: Indicates number of LLM calls. The system scales with spikes in usage while maintaining a near-zero error rate.
LLM Latency (P50 vs P99):
- Median (P50): Most LLM calls returned in under 1s–2s.
- Tail (P99): Occasional outliers (up to ~10s) were observed but remained rare.
✅ Interpretation: MediBot consistently delivers accurate, low-latency responses, gracefully falls back when needed, and maintains high reliability (0% error rate). These metrics confirm its readiness for real-world deployment in a medical information support role.
- EU AI Act (2024): Treated as a high-risk context; designed for transparency and safety.
- Non-diagnostic use: Educational/informational only; not medical advice.
- Data protection: No personal data processed; aligns with GDPR and HIPAA principles.
- Ethical design: Informed by Hippocrates human-first framework.
- Provenance: Every response indicates whether it came from the knowledge base or from the web (fallback).
- Student Support | Patient Clarity | Hospital FAQs | NGO Assistance | Offline Access
- Cloud: AWS (EC2; works with Docker).
- Status: Live at Open MediBot (Live)
- Security: Use HTTP, restrict inbound ports, rotate API keys.
- CI/CD: GitHub Actions workflow to build Docker image and deploy on push to
main.
- Live app: Open MediBot (Live)
- Poster PDF: MediBot_Poster.pdf
- Global languages and voice support
- Resource expansion (broader, multilingual corpora)
- Doctor co-pilot mode; EHR integration (with strict security)
- Feedback loops for retrieval & prompt tuning
- Ingestion dashboard and source highlighting in UI
Contributions are welcome! Please open an issue or submit a PR for bug fixes, features, or documentation improvements.
- Fork the repo
- Create a feature branch
- Commit changes with clear messages
- Open a pull request
This project is licensed under the MIT License. See the LICENSE file.
- Netheeswaran A (24204827)
- Vimarish K M (24229318)
ACM40960 — Projects in Maths Modelling
For questions or suggestions, please open an issue or contact: netheeswarana@gmail.com / vimarish18100@gmail.com
- Lewis et al. (2020) - Retrieval-Augmented Generation
- Singhal et al. (2023) - Med-PaLM
- EU AI Act (2024)
- Hippocrates (2023) - Ethical design in AI
- Yang et al. (2024) - LLM eHealth Chatbots
- Laranjo et al. (2020) - Chatbot design principles
- LangChain Docs • Pinecone Docs • Groq API • SerpAPI Docs
- HuggingFace (MiniLM embeddings)
- Pinecone (vector database)
- LangChain (retrieval orchestration)
- Groq (LLMs for generation)
- Flask (web framework)




