Nutrition Insights with Dr. Greger's Digital Twin 🥦

A RAG-based Q&A chatbot

This digital assistant, inspired by Dr. Michael Greger and his team at NutritionFacts.org, answers user questions about healthy eating and lifestyle choices. It draws on more than 1,200 well-researched blog posts published since 2011 to provide science-backed insights that help users live a healthier, more informed life.

Start chatting with Dr. Greger's Digital Twin here.

Demo Video

streamlit-app-2024-09-10-16-09-09.webm

Documentation

Dataset

The raw data used to build the RAG knowledge base is stored in data/blog_posts/json. It consists of all blog posts from https://nutritionfacts.org/blog/ (as of 28.08.2024). See the notebooks/web_scraping.ipynb notebook for more technical details on the web scraping process.
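
As a rough illustration of how the raw data could be read back for indexing, the sketch below loads every `.json` file from a directory into memory. The function name `load_blog_posts` and the `source_file` / `content` keys are hypothetical, and the sketch deliberately makes no assumption about the fields inside each file; the notebook mentioned above remains the reference for the actual schema.

```python
import json
from pathlib import Path


def load_blog_posts(json_dir: str) -> list[dict]:
    """Load every *.json file in json_dir (hypothetical helper, not from the repo)."""
    posts = []
    # Sort the paths so the order is deterministic across runs and platforms.
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            content = json.load(f)
        # Keep the file name so each record can be traced back to its source.
        posts.append({"source_file": path.name, "content": content})
    return posts


if __name__ == "__main__":
    # Path taken from the README; adjust if the data lives elsewhere.
    posts = load_blog_posts("data/blog_posts/json")
    print(f"Loaded {len(posts)} files")
```

Wrapping each parsed file in a small record keeps the loader independent of the JSON layout, so it works whether a file holds a single post or a list of posts.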

Technologies

The chatbot was built with the following technologies:

Web Scraping: Beautiful Soup Library
Text Embeddings: pre-trained model multi-qa-MiniLM-L6-cos-v1 from the Sentence Transformers Library
- built with PyTorch and Hugging Face's Transformers Library
- It was "tuned for semantic search: Given a query/question, it can find relevant passages. It was trained on a large and diverse set of (question, answer) pairs."
Vector Store (aka Knowledge Base of RAG): LanceDB Library
Information Retrieval (IR):
- Full-text search (aka keyword search): Tantivy Library (based on BM25) (LanceDB Doc).
- Vector search (aka nearest-neighbor search), metric: Cosine Similarity (LanceDB Doc).
- Reranker: Linear Combination Reranker with a 30% weight on vector search, leaving 70% for full-text search (LanceDB Doc).
LLM API: Groq Cloud (free tier)
- List of Groq's Models
Web App: Streamlit Library
Deployment: Streamlit Cloud (free tier)
Database for User Data: MongoDB
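
To make the retrieval step in the list above more concrete, here is a minimal plain-Python sketch of cosine similarity and of a linear combination of vector and keyword scores with a 30% weight on the vector side. It is an illustration of the idea only, not LanceDB's implementation: the function names are hypothetical, both score sets are assumed to be already scaled to the range 0 to 1, and a document missing from one result list is assumed to score 0.0 there.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def combine_scores(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.3,
) -> list[tuple[str, float]]:
    """Rank document IDs by a weighted sum of the two scores.

    Assumption: both inputs are already normalized to [0, 1], and a
    document absent from one result list contributes 0.0 for that list.
    """
    combined = {}
    for doc_id in set(vector_scores) | set(keyword_scores):
        v = vector_scores.get(doc_id, 0.0)
        k = keyword_scores.get(doc_id, 0.0)
        combined[doc_id] = vector_weight * v + (1.0 - vector_weight) * k
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = {"post_a": [1.0, 0.0], "post_b": [0.0, 1.0]}
    vector_scores = {d: cosine_similarity(query, v) for d, v in docs.items()}
    keyword_scores = {"post_a": 0.0, "post_b": 1.0}
    for doc_id, score in combine_scores(vector_scores, keyword_scores):
        print(doc_id, round(score, 3))
```

With a 30% vector weight, a strong keyword match can outrank a strong semantic match, which is the trade-off this weighting expresses.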

alexkolo/rag_nutrition_facts_blog
