ModelPulse: a local, GPU-accelerated, retrieval-augmented AI system for tracking the latest in LLM research.
ModelPulse is an open-source, GPU-accelerated, retrieval-augmented system that helps developers and researchers stay up to date with the fast-moving world of LLMs by tracking new research, model releases, and blog posts. Everything runs fully locally with Hugging Face models; no API calls are required (optionally, enable RAGAS evaluation with an OpenAI API key).
- Overview
- Features
- Tech Stack
- Architecture
- Quick Start
- Manual Setup
- Example Queries
- Evaluation Metrics
- Repository Structure
- Configuration
- Roadmap
- License
- Acknowledgments
LLM research moves fast: new architectures, RAG techniques, and benchmarks appear weekly. ModelPulse acts as your personal AI radar, automatically:
- 📰 Collects updates from trusted sources like OpenAI, Anthropic, Hugging Face, and arXiv
- 🔍 Builds semantic search indexes for Q&A
- 🧠 Generates summaries and digests with citations
- 📊 Tracks faithfulness, latency, and cost metrics
- ⚙️ Adapts over time using feedback and metrics
- Perform semantic searches across the latest LLM research and documentation
- Ask questions and get grounded answers with source citations
- Automatically generate summaries and digests from retrieved content
|   | Feature | Description |
|---|---|---|
| 🧩 | Hybrid Retrieval | Combines BM25 + FAISS vector search for optimal precision and recall |
| 🧠 | Grounded Summarization | Answers are cited and based on retrieved evidence |
| ⚡ | Fully Local | Works offline with GPU inference; no API required |
| 📊 | Evaluation Dashboard | Visual metrics: latency, quality, and cost (optional with API key) |
| 🧮 | Adaptive Tuning | Learns retrieval parameters automatically (optional with API key) |
| 📬 | Topic Watchlists | Alerts you when new papers appear on your topics |
| 💡 Layer | 🔧 Tools & Libraries |
|---|---|
| Ingestion | feedparser, beautifulsoup4, trafilatura |
| Embeddings | sentence-transformers, BAAI/bge-base-en-v1.5, intfloat/e5-base-v2 |
| Retrieval | faiss-gpu, rank_bm25, cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Generation | Local LLMs (Qwen2.5-7B, Mistral-7B, Llama-3.1-8B) |
| Evaluation | ragas, scikit-learn, matplotlib |
| UI / Backend | Streamlit, FastAPI, SQLite |
| Deployment | Docker, docker-compose, NVIDIA GPU |
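
For a concrete picture of how these pieces fit together, here is a minimal, self-contained sketch of hybrid BM25 + FAISS retrieval with cross-encoder reranking using the libraries above. The toy corpus, score fusion, and function names are illustrative assumptions, not ModelPulse's actual retriever module:

```python
# Hybrid retrieval sketch: BM25 (sparse) + FAISS (dense) fused by an
# alpha weight, then reranked with a cross-encoder. Illustrative only.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

docs = [
    "RAGAS adds new metrics for evaluating retrieval-augmented generation.",
    "FAISS enables fast approximate nearest-neighbor search on GPUs.",
    "BM25 remains a strong sparse baseline for keyword retrieval.",
]

# Sparse index over whitespace tokens
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index: normalized embeddings so inner product = cosine similarity
embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
emb = embedder.encode(docs, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_search(query: str, top_k: int = 3, alpha_bm25: float = 0.5):
    # Min-max normalize sparse scores to [0, 1]
    sparse = bm25.get_scores(query.lower().split())
    rng = sparse.max() - sparse.min()
    sparse = (sparse - sparse.min()) / rng if rng > 0 else np.zeros_like(sparse)
    # Dense cosine similarities against every document
    q = embedder.encode([query], normalize_embeddings=True).astype(np.float32)
    dense, ids = index.search(q, len(docs))
    dense_scores = np.zeros(len(docs))
    dense_scores[ids[0]] = dense[0]
    # Weighted fusion, then cross-encoder reranking of the fused top-k
    fused = alpha_bm25 * sparse + (1 - alpha_bm25) * dense_scores
    candidates = np.argsort(fused)[::-1][:top_k]
    ce_scores = reranker.predict([(query, docs[i]) for i in candidates])
    order = np.argsort(ce_scores)[::-1]
    return [(docs[candidates[i]], float(ce_scores[i])) for i in order]

print(hybrid_search("How is RAG evaluated?"))
```

Min-max normalization puts the sparse and dense scores on a comparable scale before fusion, which is what a knob like `alpha_bm25` in config.yaml presupposes.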
```
┌──────────────────┐
│    Connectors    │ ← RSS, Blogs, arXiv, APIs
└────────┬─────────┘
         │
┌────────▼─────────┐
│  Indexing Layer  │ ← Chunking + Embeddings (BGE/E5)
└────────┬─────────┘
         │
┌────────▼─────────┐
│ Retrieval Layer  │ ← BM25 + FAISS + Cross-Encoder
└────────┬─────────┘
         │
┌────────▼─────────┐
│ Generation Layer │ ← Local LLMs (Qwen/Mistral/Llama)
└────────┬─────────┘
         │
┌────────▼─────────┐
│ Evaluation Layer │ ← RAGAS + latency + cost tracking
└────────┬─────────┘
         │
┌────────▼─────────┐
│   Streamlit UI   │ ← Dashboard + QA + Digests
└──────────────────┘
```
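
The Connectors and Indexing layers boil down to fetching feeds, extracting article text, and chunking it for embedding. A simplified sketch with feedparser and trafilatura follows; the feed URL, chunk sizes, and helper names are assumptions, not ModelPulse's actual connector code:

```python
# Connectors -> Indexing sketch: pull an RSS feed, extract the main
# article text, and split it into overlapping word windows.
import feedparser
import trafilatura

def fetch_articles(feed_url: str) -> list[dict]:
    """Fetch feed entries and extract readable text from each link."""
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        html = trafilatura.fetch_url(entry.link)
        text = trafilatura.extract(html) if html else None
        if text:
            articles.append({"title": entry.title, "url": entry.link, "text": text})
    return articles

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows ready for embedding."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

for article in fetch_articles("https://huggingface.co/blog/feed.xml"):
    print(article["title"], len(chunk(article["text"])), "chunks")
```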
```bash
# 1. Install NVIDIA Container Toolkit
./install_nvidia_docker.sh

# 2. (Optional) Configure OpenAI API key for RAGAS evaluation & adaptive tuning
#    Skip this step for fully local operation without evaluation metrics
cp .env.example .env
nano .env
# Add your key: OPENAI_API_KEY=sk-proj-your-key

# 3. Start ModelPulse
./start.sh

# 4. Visit the dashboard
# → http://localhost:8501
```

First run: 15–30 min (downloads models and builds the index). Subsequent runs: ~30 sec (just launches the UI).
Note: Without `OPENAI_API_KEY`, ModelPulse runs 100% locally: ingestion, search, Q&A, and the UI all work offline. The API key is only needed for RAGAS evaluation metrics and adaptive config tuning.
```bash
git clone https://github.com/LeoFu9487/ModelPulse.git
cd ModelPulse
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Ingest and index data
python3 -m jobs.ingest_daily
python3 -m pipeline.chunk
python3 -m pipeline.embed

# Launch dashboard
python3 -m streamlit run ui/app_streamlit.py
```

Example query and answer:

```
Q: What's new in RAG evaluation this week?
A: A new metric called "context coherence" was introduced by Hugging Face [1],
   improving precision for long-form retrieval tasks [2].

Sources:
[1] https://huggingface.co/blog/ragas-update
[2] https://arxiv.org/abs/2401.01234
```
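
The bracketed citations come from numbering the retrieved chunks inside the prompt, so the model can reference them and the UI can map them back to URLs. A simplified sketch of how such a grounded prompt can be assembled; the prompt wording is an assumption, not ModelPulse's actual template:

```python
# Number retrieved chunks as [1], [2], ... so the generator can cite them
# and the answer can be traced back to source URLs. Illustrative only.
def build_grounded_prompt(question: str, chunks: list[dict]) -> tuple[str, list[str]]:
    context_lines, sources = [], []
    for i, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{i}] {chunk['text']}")
        sources.append(f"[{i}] {chunk['url']}")
    prompt = (
        "Answer the question using only the numbered context below. "
        "Cite supporting passages as [n].\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources

prompt, sources = build_grounded_prompt(
    "What's new in RAG evaluation this week?",
    [
        {"text": "RAGAS update introduces a context coherence metric...",
         "url": "https://huggingface.co/blog/ragas-update"},
        {"text": "Benchmark on precision for long-form retrieval tasks...",
         "url": "https://arxiv.org/abs/2401.01234"},
    ],
)
print(prompt + "\n\nSources:\n" + "\n".join(sources))
```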
| Metric | Description | Requires API Key |
|---|---|---|
| Faithfulness | Alignment between generated answer and sources | Yes (RAGAS) |
| Answer Relevancy | Semantic relevance of generated answers | Yes (RAGAS) |
| Precision / Recall | Context retrieval accuracy | Yes (RAGAS) |
| Latency | Response time per query | No (local) |
| Cost | GPU compute cost per evaluation | No (local) |
| Confidence | Weighted similarity of top-k retrieved chunks | No (local) |
Note: RAGAS-based metrics (faithfulness, relevancy, precision, recall) require `OPENAI_API_KEY`.
All other features, including latency tracking and the Streamlit dashboard, work fully locally.
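
For instance, the local Confidence metric needs no API key at all. A toy sketch of one way to weight top-k similarities; the exact weighting ModelPulse uses may differ:

```python
# Toy "confidence" score: similarity-weighted mean of the top-k retrieved
# chunks, with higher-ranked chunks weighted more. Illustrative only.
import numpy as np

def confidence(similarities: np.ndarray) -> float:
    """Rank-discounted weighted mean of top-k cosine similarities."""
    sims = np.sort(similarities)[::-1]           # best chunk first
    weights = 1.0 / np.arange(1, len(sims) + 1)  # 1, 1/2, 1/3, ...
    return float(np.average(sims, weights=weights))

print(confidence(np.array([0.82, 0.74, 0.71, 0.55])))  # ≈ 0.75
```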
```
modelpulse/
├── connectors/   # Data sources
├── pipeline/     # Chunking & embedding
├── retriever/    # Hybrid + reranking logic
├── rag/          # Q&A and evaluation
├── ui/           # Streamlit app
├── jobs/         # Ingestion & digest tasks
├── storage/      # SQLite data
└── Dockerfile
```
config.yaml example:

```yaml
embeddings:
  model: BAAI/bge-base-en-v1.5
retrieval:
  top_k: 8
  alpha_bm25: 0.5
generator:
  model: Qwen/Qwen2.5-7B-Instruct
  quantization_4bit: true
  temperature: 0.0
```

Restart after changes:

```bash
docker compose down && ./start.sh
```
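
For reference, a 4-bit quantized local generator like the one configured above can be loaded with transformers and bitsandbytes along these lines. This is a minimal sketch assuming a CUDA GPU, not necessarily how ModelPulse loads its models:

```python
# Load a 4-bit quantized instruct model (requires the bitsandbytes package)
# and generate greedily, matching temperature: 0.0 in config.yaml.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # mirrors quantization_4bit: true
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content":
             "Summarize retrieval-augmented generation in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=False gives greedy decoding (the temperature: 0.0 equivalent)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```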
- Active Learning Loop: feedback-based retrieval tuning
- Retrieval Compression Benchmark: dense vs. sparse retrieval
- Fine-Tuned Domain Embeddings
- Multimodal Support (CLIP)
- Personalized Watchlists
MIT License © 2025 Yu-Peng FU
Thanks to the open-source community:


