A Deep Research assistant that runs on your laptop using tiny models, all open source!
It offers a structured breakdown of a multi-source, relevance-driven, recursive search pipeline. It walks through how the system refines a user query, builds a knowledge base from local and web data, and dynamically explores subqueries—tracking progress through a Table of Contents (TOC).
With Monte Carlo-based exploration, the system balances depth vs. breadth, ranking each branch's relevance to ensure precision and avoid unrelated tangents. The result? A detailed, well-organized report generated using retrieval-augmented generation (RAG), integrating the most valuable insights.
I wanted to experiment with new research methods, and the idea was simple: when we research a topic, we randomly explore new ideas as we search, and NanoSage does exactly that! It explores and records its journey, where each (relevant) step is a node, and then sums it all up for you in a neat report whose table of contents is basically its search graph. 🧙
Major System Enhancement: Tavily Integration & Hybrid Embedding Architecture
- Tavily Search API Integration: Replaced unreliable free search engines with Tavily's robust API, enabling access to high-quality academic sources including PubMed, research journals, and scholarly databases
- Hybrid Embedding System: Implemented intelligent model selection where SigLIP/CLIP handle vision tasks (images, PDFs) while all-MiniLM processes text content, eliminating dimension mismatches and optimizing performance
- Enhanced Web Crawler: Added comprehensive web content extraction with metadata generation, domain grouping, and fallback search engines for maximum reliability and coverage
You can find an example report at the following link:
example report output for query: "Create a structure bouldering gym workout to push my climbing from v4 to v6"
- Ensure Python 3.8+ is installed.
- Install required packages:

  ```bash
  pip install -r requirements.txt
  ```

- (Optional) For GPU acceleration, install PyTorch with CUDA (replace `cu118` with your CUDA version):

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

- Make sure to update pyOpenSSL and cryptography:

  ```bash
  pip install --upgrade pyOpenSSL cryptography
  ```

- Install Ollama (Windows users: see ollama.com for the installer):

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  pip install --upgrade ollama
  ```

- Pull Gemma 2B (for RAG-based summaries):

  ```bash
  ollama pull gemma2:2b
  ```

Set up your Tavily API key:

```bash
# Copy the example environment file
cp env.example .env
# Edit .env and add your Tavily API key
# Get your free API key at: https://tavily.com/
TAVILY_API_KEY=your_tavily_api_key_here
```

Usage examples:

```bash
# Basic web search with SigLIP (vision + text hybrid) - RECOMMENDED
python main.py --query "machine learning algorithms" --retrieval_model siglip --web_search

# Web search with local documents
python main.py --query "quantum computing" --corpus_dir ./my_documents --retrieval_model siglip --web_search

# Fast text-only search
python main.py --query "artificial intelligence" --retrieval_model all-minilm --web_search

# Local documents only (no web search)
python main.py --query "research papers" --corpus_dir ./my_documents --retrieval_model colpali
```

Parameters:
- `--query`: Main search query (natural language).
- `--web_search`: Enables web-based retrieval via the Tavily API.
- `--retrieval_model`: Choose from `siglip` (recommended), `clip`, `colpali`, or `all-minilm`.
- `--device cpu`: Uses the CPU (swap with `cuda` for GPU).
- `--max_depth`: Recursion depth for subqueries (default: 1).
- `siglip`: Vision + text hybrid (recommended for images/PDFs + web content)
- `clip`: Vision + text hybrid (alternative to SigLIP)
- `colpali`: Advanced text model (good for documents)
- `all-minilm`: Fast text model (good for speed)
- Text: `.txt`, `.md`, `.py`, `.json`, `.yaml`, `.csv`
- PDFs: converted to images for vision models, OCR for text models
- Images: `.png`, `.jpg`, `.jpeg` (vision models, or OCR fallback)
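As a rough illustration, ingestion can be thought of as routing each file by extension; the helper below is a hypothetical sketch, not NanoSage's actual code:

```python
from pathlib import Path

# Hypothetical routing sketch: maps the supported extensions above to an
# ingestion strategy. Category names and this helper are illustrative only.
TEXT_EXTS = {".txt", ".md", ".py", ".json", ".yaml", ".csv"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

def classify_file(path: str) -> str:
    """Return the ingestion strategy for a local corpus file."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return "text"   # embed directly with a text model
    if ext == ".pdf":
        return "pdf"    # pages -> images (vision models) or OCR (text models)
    if ext in IMAGE_EXTS:
        return "image"  # vision embedding, OCR fallback for text models
    return "skip"       # unsupported types are ignored

print(classify_file("my_local_data/report.pdf"))  # -> "pdf"
```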
A detailed Markdown report will appear in `results/<query_id>/`.
Example:
```
results/
└── 389380e2/
    ├── Quantum_computing_in_healthcare_output.md
    ├── web_Quantum_computing/
    ├── web_results/
    └── local_results/
```
Open the `*_output.md` file (e.g., `Quantum_computing_in_healthcare_output.md`) in a Markdown viewer (VSCode, Obsidian, etc.).
If you have local PDFs, text files, or images:
python main.py --query "AI in finance" \
--corpus_dir "my_local_data/" \
--top_k 5 \
--device cpuNow the system searches both local docs and web data (if --web_search is enabled).
python main.py --query "Climate change impact on economy" \
--rag_model gemma \
--personality "scientific"This uses Gemma 2B to generate LLM-based summaries and the final report.
- Missing dependencies? Rerun `pip install -r requirements.txt`.
- Ollama not found? Ensure it's installed and that `ollama list` shows `gemma2:2b`.
- Memory issues? Use `--device cpu`.
- Too many subqueries? Lower `--max_depth` to 1.
- Web search not working? Check the `TAVILY_API_KEY` in your `.env` file.
- Try different retrieval models (`--retrieval_model siglip` for best results).
- Tweak recursion depth (`--max_depth`).
- Tune `config.yaml` for web search limits, `min_relevance`, or Monte Carlo search.
NanoSage uses a hybrid embedding approach for optimal performance:
- Vision models (SigLIP/CLIP):
  - Use SigLIP/CLIP for images and PDFs (converted to images).
  - Use all-MiniLM for text content (web pages, documents).
  - Ensures consistent 384-dimensional embeddings for text content.
- Text models (ColPali/all-MiniLM):
  - Use the same model for all content types.
  - Consistent embedding dimensions throughout.
This approach eliminates dimension mismatches and uses the right tool for each content type.
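As a minimal sketch of this routing (the `embed_item()` helper and the `encode_image` interface are assumptions for illustration, not NanoSage's actual functions):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM produces 384-dimensional text embeddings
text_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_item(item_type: str, content, vision_model=None):
    """Route each item to the model suited to its modality (sketch only)."""
    if item_type in ("image", "pdf_page") and vision_model is not None:
        # A SigLIP/CLIP-style model handles visual content; encode_image is
        # a placeholder for whatever interface the vision model exposes.
        return vision_model.encode_image(content)
    # Web pages, documents, and OCR output all go through all-MiniLM, so
    # every text embedding has the same 384-dimensional shape.
    return text_model.encode(content)
```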
- Primary: Tavily Search API (reliable, academic sources)
- Fallback: DuckDuckGo, SearxNG, Wikipedia
- Sources: PubMed, academic journals, research databases
- Features: Real-time search, content extraction, metadata generation
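A hedged sketch of how a primary-plus-fallback chain can work; the backends are passed in as plain callables, and none of the names below are NanoSage's actual internals:

```python
def search_with_fallback(query: str, backends: list) -> list:
    """Try each search backend in order until one returns results."""
    for backend in backends:
        try:
            results = backend(query)
            if results:
                return results
        except Exception:
            continue  # backend down or rate-limited; try the next one
    return []  # every backend failed

# Usage (hypothetical callables):
# search_with_fallback(q, [tavily_search, duckduckgo_search, wikipedia_search])
```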
- User query: e.g. `"Quantum computing in healthcare"`.
- CLI flags (in `main.py`): `--corpus_dir`, `--device`, `--retrieval_model`, `--top_k`, `--web_search`, `--personality`, `--rag_model`, `--max_depth`.
- YAML config (e.g. `config.yaml`): `results_base_dir`, `max_query_length`, `web_search_limit`, `min_relevance`, etc.
- Configuration:
  - `load_config(config_path)` reads the YAML settings.
  - `min_relevance`: cutoff for subquery branching.
- Session Initialization:
  - `SearchSession.__init__()` sets:
    - A unique `query_id` & `base_result_dir`.
    - An enhanced query via `chain_of_thought_query_enhancement()`.
    - The retrieval model, loaded with `load_retrieval_model()`.
    - A query embedding for relevance checks (`embed_text()`).
    - Local files (if any), loaded & added to the `KnowledgeBase`.
- Subquery Generation:
  - The enhanced query is split with `split_query()`.
- Monte Carlo Subquery Sampling (Optional):
  - The system can use a Monte Carlo approach to intelligently sample the most relevant subqueries, balancing exploration depth with computational efficiency (see the sketch after this list).
  - Each subquery is scored for relevance against the main query using embedding similarity.
  - Only the most promising subqueries are selected for further exploration.
- Relevance Filtering:
  - For each subquery, compare embeddings with the main query (via `late_interaction_score()`).
  - If the score is `< min_relevance`, skip the subquery to avoid rabbit holes.
- TOCNode Creation:
  - Each subquery → a `TOCNode` storing the text, summary, relevance, etc.
- Web Data:
  - If relevant: `search_and_download()` fetches results via the Tavily API.
  - Parse downloads with `parse_any_to_text()` and embed them.
  - Summarize snippets (`summarize_text()`).
  - If `current_depth < max_depth`, optionally expand new sub-subqueries (chain-of-thought on the current subquery).
- Hierarchy:
  - All subqueries & expansions form a tree of TOC nodes for the final report.
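A rough sketch of the Monte Carlo sampling idea referenced above; the helper and its parameters are illustrative, not NanoSage's internals:

```python
import random

def sample_subqueries(subqueries, scores, k=3, min_relevance=0.5):
    """Filter by relevance, then sample k subqueries weighted by score."""
    pool = [(q, s) for q, s in zip(subqueries, scores) if s >= min_relevance]
    if len(pool) <= k:
        return [q for q, _ in pool]
    queries, weights = zip(*pool)
    # Weighted sampling favors high-relevance branches (exploitation) while
    # still giving lower-scoring ones a chance (exploration).
    picked = set()
    while len(picked) < k:
        picked.add(random.choices(queries, weights=weights, k=1)[0])
    return list(picked)
```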
- Local documents + downloaded web entries → appended into the `KnowledgeBase`.
- `KnowledgeBase.search(...)` retrieves the top-K relevant docs.
- Summaries:
  - Web results & local retrieval are summarized with `summarize_text()`.
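Conceptually, `KnowledgeBase.search(...)` is a cosine-similarity ranking over stored embeddings; a minimal sketch of the idea, not the class's actual implementation:

```python
import numpy as np

def top_k_search(query_emb, doc_embs, docs, k=5):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q                       # cosine similarity per document
    best = np.argsort(sims)[::-1][:k]  # indices of the k highest scores
    return [(docs[i], float(sims[i])) for i in best]
```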
- `_build_final_answer(...)`:
  - Constructs a large prompt including:
    - The user query,
    - The Table of Contents (with node summaries),
    - Summaries of web & local results,
    - Reference URLs.
  - Asks for a "multi-section advanced markdown report."
- `rag_final_answer(...)`:
  - Calls `call_gemma()` (or another LLM) to produce the final text.
- `aggregate_results(...)`:
  - Saves the final answer plus search data into a `.md` file in `results/<query_id>/`.
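As a hedged illustration of this final step, the prompt assembly and model call might look roughly like the sketch below; the field names and helpers are assumptions, not NanoSage's exact code:

```python
import ollama  # pip install ollama

def build_report_prompt(query, toc_md, web_summary, local_summary, references):
    """Assemble a RAG prompt from the pieces described above (sketch)."""
    return (
        f"User query: {query}\n\n"
        f"Table of Contents:\n{toc_md}\n\n"
        f"Web result summaries:\n{web_summary}\n\n"
        f"Local document summaries:\n{local_summary}\n\n"
        f"References:\n{references}\n\n"
        "Write a multi-section advanced markdown report."
    )

def generate_report(prompt: str) -> str:
    # NanoSage's call_gemma() presumably wraps an Ollama call along these lines.
    response = ollama.chat(model="gemma2:2b",
                           messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```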
- Subqueries with `relevance_score < min_relevance` are skipped.
- Depth-limited recursion keeps the number of expansions from blowing up.
- Monte Carlo expansions (optional) can sample random subqueries to avoid missing unexpected gems.
- Markdown report summarizing relevant subqueries, local docs, and a final advanced RAG-based discussion.
```
User Query
   │
   ▼
main.py:
 └── load_config(config.yaml)
 └── Create SearchSession(...)
      │
      ├── chain_of_thought_query_enhancement()
      ├── load_retrieval_model()
      ├── embed_text() for reference
      ├── load_corpus_from_dir() → KnowledgeBase.add_documents()
      └── run_session():
           └── perform_recursive_web_searches():
                ├── For each subquery:
                │    ├─ Compute relevance_score
                │    ├─ if relevance_score < min_relevance: skip
                │    ├─ else:
                │    │    ├─ search_and_download() (Tavily API)
                │    │    ├─ parse_any_to_text(), embed
                │    │    ├─ summarize_text() → store in TOCNode
                │    │    └─ if depth < max_depth:
                │    │         └─ recursively expand
                └── Aggregates web corpus, builds TOC
      │
      ├── KnowledgeBase.search(enhanced_query, top_k)
      ├── Summarize results
      ├── _build_final_answer() → prompt
      ├── rag_final_answer() → call_gemma()
      └── aggregate_results() → saves Markdown
```
If you found NanoSage useful for your research or project, or it saved you even a minute of googling, please consider citing it:
BibTeX Citation:
```bibtex
@misc{NanoSage,
  author = {Foad Abo Dahood},
  title = {NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline},
  year = {2025},
  howpublished = {\url{https://github.com/masterFoad/NanoSage}},
  note = {Accessed: \today}
}
```

APA Citation:
Abo Dahood, F. (2025). NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline. Retrieved from https://github.com/masterFoad/NanoSage