Sourcing Agent

A deep research agent for drug discovery and biomedical entity discovery, built for high-recall discovery of therapeutic assets through an iterative, multi-agent workflow.

🏗️ Architecture Overview

The system employs a dual-layered orchestration strategy to balance durability with agentic flexibility:

Temporal Layer: Acts as the "backbone." It manages the high-level workflow state, ensuring research sessions can run for hours (or days) with built-in retries, persistence, and fault tolerance.
LlamaIndex Workflows: Acts as the "brain." It orchestrates the internal agentic reasoning loops, event-driven transitions, and parallel worker dispatches within each iteration. This allows for complex, non-linear discovery paths that are difficult to model in traditional state machines.

🔄 The Research Loop

The agent follows an iterative, self-correcting cycle:

Initial Planning: Synthesizes the research topic into hard/soft constraints using multi-step reasoning.
Parallel Worker Execution: Dispatches specialized workers to search (Perplexity/Tavily) and crawl specific targets.
Advanced Extraction (LlamaExtract + Crawl4AI): Directly extracts schema-aligned data from raw web content. By crawling first and extracting later, we minimize "hallucination by truncation" often seen in standard search snippets.
Adaptive Update: The orchestrator calculates a Novelty Metric (new entities discovered relative to pages fetched). When discovery yields diminish, the planner pivots the strategy, stops unproductive workers, and spawns new ones with fresh queries.
Multi-Tier Verification: Every discovery is audited against a 4-tier evidence weighting system (Regulatory filings > Corporate Pipeline > News > Speculative).
Reconciliation & Gap-Filling: Merges aliases into canonical records and triggers "Deep Read" targeted searches to fill missing critical fields (e.g., Owner or Clinical Phase).
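
The Adaptive Update step can be illustrated with a minimal sketch. The README only says the metric compares new entities with pages fetched, so the exact formula (a simple ratio), the `WorkerStats` structure, the worker names, and the `0.05` threshold are all assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    """Per-worker counters for one iteration (hypothetical structure)."""
    name: str
    pages_fetched: int
    new_entities: int


def novelty(stats: WorkerStats) -> float:
    """New entities per page fetched; 0.0 when nothing was fetched."""
    if stats.pages_fetched == 0:
        return 0.0
    return stats.new_entities / stats.pages_fetched


def split_workers(all_stats, threshold=0.05):
    """Partition worker names into (keep, retire) by novelty threshold.

    The threshold value is an assumed example, not a documented setting.
    """
    keep, retire = [], []
    for s in all_stats:
        (keep if novelty(s) >= threshold else retire).append(s.name)
    return keep, retire


stats = [
    WorkerStats("pipeline-crawler", pages_fetched=40, new_entities=6),
    WorkerStats("news-search", pages_fetched=50, new_entities=1),
    WorkerStats("idle-worker", pages_fetched=0, new_entities=0),
]
keep, retire = split_workers(stats)
print("keep:", keep)
print("retire:", retire)
```

In this sketch a planner would respawn replacements for the retired workers with fresh queries; how the real planner chooses those queries is not described here.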

📸 Walkthrough

1. Planning Stage

The system starts by analyzing the query to generate a comprehensive plan and synonym list.

2. Initial Search

Workers begin execution, populating the initial knowledge graph and identifying key nodes.

3. Adaptive Discovery Loop

Workers execute in parallel while the frontend displays their progress and the growing knowledge graph in real time.
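
As a rough picture of parallel dispatch within one iteration, the sketch below fans out workers with plain `asyncio`. It deliberately does not use the LlamaIndex Workflows or Temporal APIs; `run_worker`, `run_iteration`, and the example queries are hypothetical stand-ins.

```python
import asyncio


async def run_worker(name: str, query: str) -> dict:
    """Stand-in for a search/crawl worker; a real one would call external APIs."""
    await asyncio.sleep(0)  # yield control, simulating I/O
    return {"worker": name, "query": query, "entities": [f"{query}-hit"]}


async def run_iteration(queries: dict) -> list:
    """Run all workers concurrently and drop any that raised."""
    tasks = [run_worker(name, q) for name, q in queries.items()]
    # return_exceptions=True keeps one failing worker from cancelling the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, BaseException)]


results = asyncio.run(
    run_iteration({"w1": "kras inhibitor", "w2": "glp-1 agonist"})
)
for r in results:
    print(r["worker"], r["entities"])
```

`asyncio.gather` returns results in the order the tasks were passed in, which keeps per-worker bookkeeping simple.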

4. Final Verification & Results

After saturation, the system deduplicates and verifies assets, presenting a final auditable table.
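
One way to picture deduplication and evidence weighting is the sketch below. The tier order follows the Research Loop (Regulatory filings > Corporate Pipeline > News > Speculative), but the numeric weights, the record layout, the name normalization rule, and the asset names are assumptions. Real alias merging (for example, code name versus brand name) would need a synonym source that this sketch does not model.

```python
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Higher value means stronger evidence (weights are illustrative)."""
    SPECULATIVE = 1
    NEWS = 2
    CORPORATE_PIPELINE = 3
    REGULATORY = 4


def canonical_key(name: str) -> str:
    """Normalize a name by lowercasing and dropping non-alphanumerics."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def reconcile(records):
    """Merge records sharing a canonical key; the strongest tier wins per field."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            canonical_key(rec["name"]),
            {"aliases": set(), "fields": {}, "tiers": {}},
        )
        entry["aliases"].add(rec["name"])
        for field, value in rec["fields"].items():
            if value is None:
                continue
            if rec["tier"] > entry["tiers"].get(field, 0):
                entry["fields"][field] = value
                entry["tiers"][field] = rec["tier"]
    return merged


def missing_fields(entry, required=("owner", "phase")):
    """Fields that a gap-filling search would still need to find."""
    return [f for f in required if f not in entry["fields"]]


records = [
    {"name": "ABC-123", "tier": EvidenceTier.NEWS,
     "fields": {"owner": "Acme Bio", "phase": "Phase 1"}},
    {"name": "abc 123", "tier": EvidenceTier.REGULATORY,
     "fields": {"phase": "Phase 2"}},
    {"name": "XYZ-9", "tier": EvidenceTier.SPECULATIVE,
     "fields": {"owner": None}},
]
merged = reconcile(records)
for key, entry in merged.items():
    print(key, entry["fields"], "missing:", missing_fields(entry))
```

Keeping the winning tier per field, rather than per record, lets the final table show which source level backs each value.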

🛠️ Tech Stack & Rationales

Component	Technology	Rationale
Orchestration	Temporal	Keeps research sessions durable and resumable across worker restarts.
Agent Logic	LlamaIndex Workflows	Dynamic, event-driven orchestration of complex LLM reasoning chains.
Deep Search	Perplexity API	Context-heavy, reasoning-informed searching for initial landscape analysis.
Rapid Search	Tavily	Fast, snippet-to-markdown discovery for high-velocity parallel crawling.
Extraction	Crawl4AI + Gemini Flash	Efficient headless browsing combined with schema-based extraction from raw HTML/PDFs.
LLMs	Gemini 3 & 2.5	Reasoning (3-Flash-Preview) for planning/verification; Speed (2.5-Flash-Lite) for extraction.
Persistence	SQLAlchemy (Async)	Decouples domain models from DB logic, allowing seamless switching between SQLite and PostgreSQL.
Shared State	Redis	Atomic, high-speed URL and entity tracking to prevent redundant work across parallel workers.
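
The Shared State row relies on atomic set operations. Redis `SADD` returns the number of members actually added, so a return value of 1 tells a worker it is the first to claim a URL. The sketch below shows that pattern against any client exposing `sadd(key, *values)`; `InMemorySetStore` is a hypothetical stand-in so the example runs without a Redis server, and the key name and URL normalization are assumptions.

```python
class InMemorySetStore:
    """Test double mimicking the return value of a Redis-style sadd()."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *values):
        target = self._sets.setdefault(key, set())
        added = 0
        for value in values:
            if value not in target:
                target.add(value)
                added += 1
        return added  # count of newly added members, as SADD reports


def claim_url(store, url, key="visited_urls"):
    """Return True only for the first caller to claim this URL."""
    normalized = url.strip().rstrip("/")  # minimal, assumed normalization
    return store.sadd(key, normalized) == 1


store = InMemorySetStore()
first = claim_url(store, "https://example.com/pipeline")
second = claim_url(store, "https://example.com/pipeline/")
print(first, second)
```

With a real Redis client the check and the insert happen in a single server-side command, which is what prevents two parallel workers from both fetching the same page.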

🚀 Setup

1. Environment Variables

Copy .env.example to .env and fill in your API keys:

PERPLEXITY_API_KEY, TAVILY_API_KEY, LLAMA_CLOUD_API_KEY
GOOGLE_API_KEY (Gemini), TEMPORAL_API_KEY
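
A small pre-flight check can catch missing keys before starting the stack. This is a sketch, not part of the repository: `missing_keys` is a hypothetical helper, and treating all five variables as required is an assumption.

```python
import os

# Variable names taken from the list above.
REQUIRED_KEYS = [
    "PERPLEXITY_API_KEY",
    "TAVILY_API_KEY",
    "LLAMA_CLOUD_API_KEY",
    "GOOGLE_API_KEY",
    "TEMPORAL_API_KEY",
]


def missing_keys(env=None):
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```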

2. Docker Setup

docker-compose up --build

Frontend: http://localhost:8501
Worker: Background Temporal worker listening for tasks.

Run a Research Task via CLI

docker-compose run worker python backend/run.py "Your research topic"

💻 Local Development

1. Setup Environment

python -m venv env
source env/bin/activate  # On macOS/Linux
pip install -r requirements.txt

2. Run Components

Worker: python backend/worker.py
Frontend: streamlit run frontend/app.py
CLI Runner: python backend/run.py "Your topic"
