MediBot — Retrieval-Augmented Generation (RAG) for Trusted Medical Answers

Flask • LangChain • Pinecone • HuggingFace Embeddings • Groq (Llama 3) • SerpAPI fallback • Deployed on AWS EC2

Live app: Open MediBot (Live)

A retrieval-augmented medical Q&A chatbot. It embeds expert-authored medical documents, stores them in a Pinecone vector index, retrieves the most relevant chunks for a user query, and generates grounded answers with a Groq-hosted LLM via LangChain. A minimal Flask app provides a real-time chat UI and a simple API. If no trustworthy match is found, SerpAPI fallback performs a web search and provides an answer.

Abstract
Project Description
- Key Components
- System Goals
Features
Project Structure
Installation
- Prerequisites
- Local Setup
- Production Setup (AWS)
Configuration
Usage
- Run Locally
- Run in Production (AWS)
- API
RAG Pipeline
- Chunking & Retrieval Settings
- Fallback Strategy
Performance
Evaluation & Results
Safety & Compliance
Use Cases
Deployment
Demo & Poster
Future Work
Contributing
License
Authors & Course
Contact
References
Credits

Abstract

MediBot implements Retrieval-Augmented Generation (RAG) for medical queries. Documents are embedded using a HuggingFace MiniLM model, stored in Pinecone for fast cosine similarity search, and fed to a Groq LLM (Llama 3) via LangChain to generate concise, source-grounded answers. A Flask web interface enables real-time interaction. When the knowledge base can’t confidently answer, a SerpAPI fallback performs a trusted web search and clearly labels web-sourced responses. The system is deployed on AWS EC2 and is live at Open MediBot (Live) , enabling public access in a production setting.

Project Description

Medical information is vast and context-dependent. Rather than relying on an LLM’s parametric memory, this chatbot retrieves the most relevant passages from a curated, expert-authored corpus and uses them as context for answer generation. This grounds responses in real content and improves trustworthiness and traceability.

Key Components

Flask UI: Lightweight web interface for chatting with the model.
HuggingFace Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384-dim) to represent documents/queries.
Pinecone Vector Store: Stores embeddings in index medicalbot; cosine similarity search.
Groq LLM via LangChain: ChatGroq with Llama 3 to synthesise answers over retrieved context.
LangChain RAG Chain: Runnable graph for retrieval + generation with prompt templates and history.
SerpAPI Fallback: Web search with explicit “External source (web)” labelling in the UI.
AWS Deployment: EC2-hosted Flask app via Docker, environment-driven config.

System Goals

Grounded answers: Minimise hallucinations via retrieved evidence & citations.
Speed: Fast retrieval + inference for real-time Q&A.
Safety: Clear guardrails, transparent source, non-diagnostic use.
Scalability: Grows with more documents and concurrent users; cloud-ready on AWS.

Features

🔎 Similarity search over embedded expert medical PDFs
🧠 RAG answer generation with Groq LLMs
🌐 SerpAPI fallback with explicit source labelling
🖥️ Flask chat UI and JSON API
🧾 Source snippets + page numbers
🔐 .env-based configuration for API keys
⚙️ Modular code to swap models, indexes, and prompts

Project Structure

ai-medical-chatbot/
├── app.py                      # Flask entrypoint (UI + /get API)
├── src/
│   ├── __init__.py             
│   ├── helper.py               # Data Split to chunks + Embeddings (HuggingFace) + GoogleSearch (Fallback)
│   └── prompt.py               # Prompt for LLM Model
├── templates/
│   └── chat.html               # Chat UI (labels whether answer is KB or Web)
├── static/
│   ├── styles.css              # Styles
│   └── images/                 # Logo, screenshots
├── docker/
│   ├── Dockerfile              # Container image for AWS deploy
├── requirements.txt
├── store_index.py              # Pinecone Create/Initialise
├── .env.example                # Copy to .env and fill keys
└── README.md

Installation

Prerequisites

Python 3.10+
Pinecone account + API key
Groq API key
SerpAPI key for fallback
A curated set of expert-authored medical PDFs to embed
AWS EC2 instance (Ubuntu recommended) with security group allowing HTTP/HTTPS

Local Setup

Clone and enter the repo

git clone https://github.com/ACM40960/AI-Medical-LLM.git
cd AI-Medical-LLM

Create and activate a virtual environment

python -m venv venv
# macOS/Linux
source venv/bin/activate
# Windows (PowerShell)
venv\Scripts\Activate.ps1

Install dependencies
```
pip install -r requirements.txt
```

Production Setup (AWS)

Docker

Install Docker on EC2.

Build & run:

docker build -t medibot:latest .
docker run -d --name medibot -p 8501:8501 --env-file .env medibot:latest

CI/CD
- Use GitHub Actions to build and deploy on push to main (Docker or rsync/SSH).

Configuration

Create a .env file and set:

# Pinecone
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=medicalbot

# HuggingFace Embeddings
EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Groq
GROQ_API_KEY=...
GROQ_MODEL=llama3-70b-8192  

# Optional fallback (SerpAPI)
SERPAPI_API_KEY=...

# App
FLASK_HOST=0.0.0.0
FLASK_PORT=8501
TOP_K=3
FLASK_ENV=production        # set to production on AWS

On AWS, keep the .env file outside of version control and rotate keys regularly.

Usage

Run Locally

python app.py
# App runs at http://localhost:8501

Run in Production (AWS)

Docker: run the container as shown in the Production Setup section.
Access the live app at Open MediBot (Live).

API

POST /get (example)

curl -X POST http://16.16.207.95:8501/get \
  -H "Content-Type: application/json" \
  -d '{"query":"What are common symptoms of anemia?"}'

Response

{
  "answer": "Concise, grounded answer...",
  "sources": [
    {"title": "Doc A", "page": 14, "url": "..."}
  ],
  "provenance": "knowledge_base"  // or "web" when SerpAPI fallback is used
}

RAG Pipeline

Offline Indexing

Load PDFs (via LangChain PyPDFLoader).
Chunk & preprocess: 500-token chunks with 50-token overlap.
Embed chunks with MiniLM and upsert vectors into Pinecone.

Online Inference

Embed query with MiniLM; retrieve top-k=3 similar chunks (cosine similarity, 384-dim vectors).
Construct prompt with retrieved evidence and citation slots.
Generate answer via Groq LLM (ChatGroq) using LangChain’s runnable graph.
Return answer with citations; if evidence is insufficient, trigger SerpAPI fallback and label as web-sourced in UI.

Chunking & Retrieval Settings

Chunk size: 500 tokens
Overlap: 50 tokens
Embedding: MiniLM-L6-v2 (384-dim)
Similarity: cosine
Retriever: top_k=3

Fallback Strategy

If retrieval confidence is low or no relevant chunks are found, call SerpAPI for a trusted web search.
UI clearly labels: “External source (web)” and cites the URL.
The bot does not fabricate an answer when evidence is missing.

Performance

Representative timings from our tests (hardware/network-dependent):

Total response latency: ~100 ms
Retrieval time: ~20 ms
LLM inference time: ~3 ms

Evaluation & Results

Test Suite: >50 medical questions drawn from textbook chapters.
Internal query accuracy: 100%
Fallback use rate: 10%
Error rate: 0%
Hallucinations: None observed in tests.

Observed Metrics

Trace Count: Shows number of runs over time. Nearly all runs completed successfully (green), with very few errors (red), validating stability.

Trace Latency (P50 vs P99):

P50 (median, blue): Typical response latency was near real-time (sub-second).
P99 (purple): Even at peak load, the slowest 1% of requests rarely exceeded 30s, showing strong robustness.

LLM Count: Indicates number of LLM calls. The system scales with spikes in usage while maintaining a near-zero error rate.

LLM Latency (P50 vs P99):

Median (P50): Most LLM calls returned in under 1s–2s.
Tail (P99): Occasional outliers (up to ~10s) were observed but remained rare.

✅ Interpretation: MediBot consistently delivers accurate, low-latency responses, gracefully falls back when needed, and maintains high reliability (0% error rate). These metrics confirm its readiness for real-world deployment in a medical information support role.

Safety & Compliance

EU AI Act (2024): Treated as a high-risk context; designed for transparency and safety.
Non-diagnostic use: Educational/informational only; not medical advice.
Data protection: No personal data processed; aligns with GDPR and HIPAA principles.
Ethical design: Informed by Hippocrates human-first framework.
Provenance: Every response indicates whether it came from the knowledge base or from the web (fallback).

Use Cases

Student Support | Patient Clarity | Hospital FAQs | NGO Assistance | Offline Access

Deployment

Cloud: AWS (EC2; works with Docker).
Status: Live at Open MediBot (Live)
Security: Use HTTP, restrict inbound ports, rotate API keys.
CI/CD: GitHub Actions workflow to build Docker image and deploy on push to main.

Demo & Poster

Live app: Open MediBot (Live)
Poster PDF: MediBot_Poster.pdf

Future Work

Global languages and voice support
Resource expansion (broader, multilingual corpora)
Doctor co-pilot mode; EHR integration (with strict security)
Feedback loops for retrieval & prompt tuning
Ingestion dashboard and source highlighting in UI

Contributing

Contributions are welcome! Please open an issue or submit a PR for bug fixes, features, or documentation improvements.

Quick Guide

Fork the repo
Create a feature branch
Commit changes with clear messages
Open a pull request

License

This project is licensed under the MIT License. See the LICENSE file.

Authors & Course

Netheeswaran A (24204827)
Vimarish K M (24229318)
ACM40960 — Projects in Maths Modelling

Contact

For questions or suggestions, please open an issue or contact: netheeswarana@gmail.com / vimarish18100@gmail.com

References

Lewis et al. (2020) - Retrieval-Augmented Generation
Singhal et al. (2023) - Med-PaLM
EU AI Act (2024)
Hippocrates (2023) - Ethical design in AI
Yang et al. (2024) - LLM eHealth Chatbots
Laranjo et al. (2020) - Chatbot design principles
LangChain Docs • Pinecone Docs • Groq API • SerpAPI Docs

Credits

HuggingFace (MiniLM embeddings)
Pinecone (vector database)
LangChain (retrieval orchestration)
Groq (LLMs for generation)
Flask (web framework)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
Data		Data
docs		docs
research		research
src		src
static		static
templates		templates
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
app.py		app.py
changes.py		changes.py
generatekey.py		generatekey.py
requirements.txt		requirements.txt
setup.py		setup.py
store_index.py		store_index.py
template.py		template.py

Uh oh!

License

Uh oh!

ACM40960/project-maths-model-project

Folders and files

Latest commit

History

Repository files navigation

MediBot — Retrieval-Augmented Generation (RAG) for Trusted Medical Answers

Table of Contents

Abstract

Project Description

Key Components

System Goals

Features

Project Structure

Installation

Prerequisites

Local Setup

Production Setup (AWS)

Configuration

Usage

Run Locally

Run in Production (AWS)

API

RAG Pipeline

Chunking & Retrieval Settings

Fallback Strategy

Performance

Evaluation & Results

Observed Metrics

Safety & Compliance

Use Cases

Deployment

Demo & Poster

Future Work

Contributing

Quick Guide

License

Authors & Course

Contact

References

Credits

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages