
🧠 Advanced Deep Research

Advanced Deep Research is an autonomous multi-agent research framework designed to simulate a human-level deep researcher. It breaks down complex queries into actionable sub-questions, performs real-time searches across multiple sources (web, papers, and local vector DB), and synthesizes the most relevant information into clear, didactic summaries.


🚀 Features

  • 🔍 Sub-question generation using a local LLM (Qwen 2.5)
  • 🌐 Web search via Brave Search, Google, or SerpAPI
  • 📄 Advanced content extraction from HTML and PDFs (with pymupdf4llm)
  • ✍️ Chunked summarization using facebook/bart-large-cnn (fine-tuned)
  • 🎯 Relevance filtering via jina-reranker-v2-base-multilingual (threshold: 0.5)
  • 🗂 Knowledge storage in a local vector DB (Qdrant)
  • 🤖 Reflective agent to determine when to stop searching
  • 📘 Final summarizer agent for clear, didactic answers
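The chunk-then-filter step above can be sketched in a few lines. This is an illustrative sketch, not the project's actual code: the real pipeline summarizes with facebook/bart-large-cnn and scores with the Jina re-ranker, which are stubbed out here as pre-scored chunks.

```python
# Sketch of chunked summarization + threshold filtering (illustrative helpers;
# the real project uses facebook/bart-large-cnn and jina-reranker-v2).

def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Split text into word-bounded chunks that fit the summarizer's context."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def filter_by_relevance(scored_chunks: list[tuple[str, float]], threshold: float = 0.5) -> list[str]:
    """Keep only chunks whose re-ranker score clears the threshold (0.5 by default)."""
    return [chunk for chunk, score in scored_chunks if score >= threshold]

# Example: three scored chunks, two of which pass the 0.5 cutoff.
scored = [("relevant A", 0.91), ("noise", 0.12), ("relevant B", 0.55)]
print(filter_by_relevance(scored))  # ['relevant A', 'relevant B']
```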

⚙️ Tech Stack

| Component | Technology / Model |
| --- | --- |
| LLM (main) | Qwen 2.5 via vLLM (OpenAI-compatible API) |
| Embeddings | jinaai/jina-embeddings-v3 |
| Summarization | facebook/bart-large-cnn |
| Re-ranker | jinaai/jina-reranker-v2-base-multilingual |
| Vector storage | Qdrant |
| PDF parsing | pymupdf4llm |
| Web search | Brave API, Google (local), SerpAPI, Tavily |
| Backend | FastAPI + Transformers (Hugging Face) |
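Because vLLM exposes an OpenAI-compatible API, a chat request to the main LLM is just a JSON POST to `/v1/chat/completions`. A minimal sketch of building that request body; the model name and system prompt here are assumptions, not the project's actual values:

```python
# Sketch: request body for vLLM's OpenAI-compatible chat endpoint.
# Model name and prompt are assumptions; adjust to your deployment.
import json

def build_chat_request(question: str, model: str = "Qwen/Qwen2.5-7B-Instruct") -> dict:
    """Build the JSON body for POST {base_url}/v1/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Break the question into research sub-questions."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.3,
    }

payload = build_chat_request("How do transformers handle long context?")
print(json.dumps(payload, indent=2))
```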

📂 Project Structure

resumidor/
├── cache/                   # Caching utilities
├── config/                  # Configuration and environment handling
├── databases/               # DB integrations (e.g., Qdrant)
├── deep_searcher/           # Core loop for deep search
├── dockers/                 # Docker configurations
├── factory/                 # Model and service instantiation
├── llm/                     # LLM interaction logic (Qwen, etc.) and tools
├── management/              # Process managers / controllers
├── models/                  # Model loading and handling
├── parsers/                 # Web & PDF content parsers
├── prompt_engineering/      # Prompt templates
├── researchers/             # Research engines
├── schemas/                 # Pydantic schemas
├── server/                  # FastAPI server logic
├── tests/                   # Unit and integration tests

🧰 Installation

1. Clone the repository

git clone https://github.com/prodesk98/advanced-deep-research.git
cd advanced-deep-research

2. Set up environment variables

Copy .env.example to .env and set your keys:

cp .env.example .env

Fill in your credentials:

GOOGLE_SEARCH_ENGINE=local,brave,serpapi
CRAWLER_ENGINE=local,firecrawl
BRAVE_API_KEY=your_key
SERPAPI_KEY=your_key
FIRECRAWL_API_KEY=your_key
HF_TOKEN=your_huggingface_token
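`GOOGLE_SEARCH_ENGINE` and `CRAWLER_ENGINE` take comma-separated lists, so the app can fall back across providers. A minimal sketch of how such a value can be parsed; the variable names match `.env.example`, but the helper itself is illustrative, not the project's config loader:

```python
# Sketch: reading the comma-separated engine settings shown above.
# The parsing helper is illustrative, not the project's actual config code.
import os

def parse_engines(value: str) -> list[str]:
    """Split 'local,brave,serpapi' into a clean list, ignoring blanks."""
    return [e.strip() for e in value.split(",") if e.strip()]

search_engines = parse_engines(os.environ.get("GOOGLE_SEARCH_ENGINE", "local,brave,serpapi"))
print(search_engines)  # e.g. ['local', 'brave', 'serpapi']
```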

3. Install dependencies

Using Poetry:

pip install poetry
poetry install

4. Download models

poetry run python download_cli.py

🐳 Docker Deployment

docker compose up -d

App runs at: http://localhost:8501


🧠 Research Pipeline (Simplified)

```mermaid
graph TD
    UI[User Interface] --> Q[User Question] --> SQ[Sub-questions]
    SQ --> WS[Search: Brave / Google / ArXiv]
    WS --> XT[Extract Content]
    XT --> SM[Summarize]
    SM --> RK[Re-rank Relevant Info]
    RK --> RF[Reflect: Is Answer Complete?]
    RF -- No --> SQ
    RF -- Yes --> DS[Didactic Final Summary]
    DS --> DB[Store in Vector DB]
```
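The reflect-and-repeat loop in the diagram can be sketched as plain Python. All function names below are illustrative stubs, not the project's actual API; they only show the control flow between the search, summarize, re-rank, and reflect stages:

```python
# Sketch of the reflect-and-repeat loop from the diagram, with stub agents.
# Function names are illustrative, not the project's actual API.

def research(question: str, max_rounds: int = 3) -> str:
    notes: list[str] = []
    sub_questions = [question]                       # SQ: sub-question generation
    for _ in range(max_rounds):
        for sq in sub_questions:
            results = search(sq)                     # WS: web / paper search
            notes.extend(rerank(summarize(results))) # SM + RK
        if is_answer_complete(question, notes):      # RF: reflective agent
            break
        sub_questions = refine(question, notes)      # loop back to SQ
    return final_summary(question, notes)            # DS: didactic final summary

# Stub implementations so the sketch runs end to end.
def search(q): return [f"result for {q}"]
def summarize(rs): return rs
def rerank(rs): return rs
def is_answer_complete(q, notes): return len(notes) >= 2
def refine(q, notes): return [q + " (refined)"]
def final_summary(q, notes): return f"{q}: " + "; ".join(notes)

print(research("What is retrieval-augmented generation?"))
```

In the real pipeline, the reflective agent decides when to stop, and the accumulated notes are stored in Qdrant after the final summary.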

📌 Roadmap

  • Sub-questioning + multi-source search
  • ArXiv PDF extraction
  • Chunked summarization with BART
  • Reranker filtering (threshold-based)
  • Reflective agent for iterative research
  • Final summarizer for clarity
  • CLI / Web Interface
  • Export to Markdown / PDF
  • Chrome/Firefox extension for contextual search

📜 License

MIT License


🤝 Contributing

Open issues, submit pull requests, or suggest improvements!
All contributions are welcome.
