feat: scaffold deep research agent (frontend, backend, db) #1
Conversation
- Added Streamlit frontend
- Added Temporal workflow skeleton
- Added PostgreSQL layer with SQLAlchemy & Repository pattern
- Added .gitignore rules
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Introduces a complete Deep Research Agent system with database persistence, Temporal workflow orchestration, worker execution, and Streamlit frontend. Includes configuration management, ORM-based data models, repository pattern for data access, and placeholder implementations of research activities.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Frontend as Frontend<br/>(Streamlit)
participant TemporalClient as Temporal<br/>(Client)
participant Orchestrator as Workflow<br/>(DeepResearchOrchestrator)
participant Worker as Worker<br/>(Executor)
participant Activities as Activities<br/>(search/fetch/extract)
participant Database as Database<br/>(SQLAlchemy)
User->>Frontend: Input topic & click Start
Frontend->>TemporalClient: Connect to Temporal Server
Frontend->>TemporalClient: Start DeepResearchOrchestrator
TemporalClient->>Orchestrator: Execute run(topic)
Orchestrator->>Orchestrator: Initialize ResearchState
Orchestrator->>Orchestrator: Plan hypothesis & next steps
Orchestrator->>Worker: Schedule search activity
Worker->>Activities: Execute search(query)
Activities-->>Worker: Return URLs
Worker->>Orchestrator: Activity result
Orchestrator->>Orchestrator: Update state & mark completed
Orchestrator-->>TemporalClient: Return ResearchState
TemporalClient-->>Frontend: Workflow ID & status
Frontend->>Frontend: Display Web UI link
Frontend-->>User: Show results or error
Worker->>Database: Persist state/entities/evidence
Database-->>Worker: Commit
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In @.agent/rules/api_endpoints.md:
- Around line 115-122: The documentation mentions non-existent tooling/paths
(make refresh_frontend_sdk, frontend/packages/api, llama-cloud, openapi.json);
update the section to either remove these references or mark them as "planned"
and provide accurate alternatives: remove the make target reference or add the
Makefile target implementation, correct the SDK path to frontend/ if that is the
intended location, remove or link to the actual Python SDK repo instead of
llama-cloud, and clarify how openapi.json is produced (or remove that step) so
the doc no longer describes infrastructure that isn't present.
- Around line 220-235: Update the example in .agent/rules/api_endpoints.md to
reflect the real codebase: replace references to get_parse_service(),
parse_service.v2.create_parse_job(), and the src/app/services_v2/ pattern with
the actual backend/Temporal architecture (or explicitly annotate the snippet as
"aspirational/future" if you want to keep the service-style example); ensure the
doc either shows how to invoke the Temporal workflow/activities (using the
backend worker/workflow names) or adds a clear note that get_parse_service() and
services_v2 are not implemented yet so readers won't expect those symbols to
exist.
In @.agent/rules/ops_stack.md:
- Line 13: Update the "**Caching**" guideline line that currently references
"pyproject.toml" and "uv.lock" to reflect the project's actual dependency file:
replace those names with "requirements.txt" (and, if applicable, the project's
lock file name) so the sentence reads e.g. "Copy requirements.txt (and
<lock-file>) and install dependencies before copying source code." Optionally
add a short note offering migration to pyproject.toml/uv or poetry as an
alternative path, but ensure the primary instruction matches current usage.
In @.agent/rules/python_stack.md:
- Around line 9-18: The nested markdown lists in .agent/rules/python_stack.md
(sections like "Tooling & Package Management", the "Commands" list and the
"Dependencies" sub-list including "Groups", "Extras", "Lock File") use 4-space
indents and must be reduced to 2-space indents to satisfy markdownlint rule
MD007; update each nested bullet to use 2 spaces per nesting level (e.g., change
the nested bullets under "Commands" and the nested items under "Dependencies" to
2-space indentation) and apply the same 2-space pattern to all other nested
lists in the file.
In @.agent/rules/services_v2.md:
- Around line 120-123: Update the phrase "Prevents API breaking changes" to use
a hyphenated compound—change it to "Prevents API-breaking changes" in the
.agent/rules/services_v2.md content (look for the sentence containing "Prevents
API breaking changes" near the "Public API schemas. **CRITICAL:** Literal +
Values only (NEVER Enums)." line).
- Around line 31-43: The fenced code block showing the directory tree lacks a
language tag which markdownlint flags; update the opening triple backticks to
include a language (e.g., change "```" to "```text") for the block that begins
with "services_v2/{service_name}/" so the snippet is marked as plain text and
the MD040 warning is resolved.
In `@backend/activities.py`:
- Around line 17-20: The extract_entities activity currently logs an empty
f-string and returns a hard-coded entity; update the function (extract_entities)
to actually use the content parameter and remove the unused f-string by
including meaningful info in the log (e.g., activity.logger.info(f"Extracting
entities from content: {content[:200]}") or similar) and implement extraction
logic that derives the returned list[dict] from content instead of always
returning [{"name": "Example Entity", "type": "Test"}]; ensure the content
parameter is referenced so the linter no longer flags it as unused.
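A minimal sketch of the `extract_entities` change described above, assuming a deliberately naive heuristic in which runs of capitalized words become entities. The regex, the `"Candidate"` type label, the synchronous signature, and the standard `logging` logger standing in for `activity.logger` are all illustrative assumptions, not the PR's implementation; the real activity would keep its Temporal decorator and logger.

```python
import logging
import re

# Stand-in for activity.logger so the sketch runs without Temporal installed.
logger = logging.getLogger("activities")

# Hypothetical heuristic: one or more consecutive capitalized words.
_CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(content: str) -> list[dict]:
    """Derive entities from `content` instead of returning a fixed placeholder."""
    # Log a bounded preview so large documents do not flood the logs.
    logger.info("Extracting entities from content: %s", content[:200])

    seen: set[str] = set()
    entities: list[dict] = []
    for match in _CANDIDATE.finditer(content):
        name = match.group(0)
        if name not in seen:  # keep the first occurrence, preserve order
            seen.add(name)
            entities.append({"name": name, "type": "Candidate"})
    return entities
```

The point is only that the return value now depends on `content` and the log line carries real information; any production extractor would replace the regex.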
In `@backend/storage.py`:
- Around line 80-97: In get_entity, eagerly load the evidence relationship by
adding selectinload(EntityModel.evidence) to the query (e.g., stmt =
select(EntityModel).options(selectinload(EntityModel.evidence)).where(...)),
then when db_obj is present map db_obj.evidence (EvidenceModel instances) into
EvidenceSnippet instances and populate the Entity(..., evidence=[...]) return
value instead of the empty list; update references to EntityModel, get_entity,
EvidenceModel and EvidenceSnippet when implementing the mapping.
- Around line 7-9: The save_entity function currently ignores evidence when
updating existing records and only inserts evidence for new entities; update the
existing-entity branch in save_entity to persist incoming evidence (e.g.,
append/merge the provided evidence objects into the existing entity.evidence
relationship and ensure the session adds/flushes/commits those changes) so
evidence updates are not dropped, and in get_entity replace the hardcoded
evidence=[] with loading the entity.evidence relationship (e.g., return or
populate evidence from the ORM relationship on the found Entity instance) so
stored evidence is returned; keep the new-entity evidence insertion logic but
make the update branch symmetric by using the Entity.evidence relationship and
AsyncSession operations used elsewhere in this module.
In `@backend/test_db.py`:
- Around line 8-31: The in-memory SQLite setup is unstable because
create_async_engine currently uses the default pool; update backend/test_db.py
to force a single shared connection by importing StaticPool from sqlalchemy.pool
and calling create_async_engine(TEST_DB_URL, echo=False,
connect_args={"check_same_thread": False}, poolclass=StaticPool) (keep
TEST_DB_URL as "sqlite+aiosqlite:///:memory:"), so the engine variable uses a
StaticPool and sessions from async_sessionmaker see the tables created in
engine.begin(); ensure the import for StaticPool is added and no other pool
settings conflict.
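The instability described above comes from SQLite itself: each new connection to `:memory:` opens its own private, empty database, so a pool that hands out a second connection no longer sees tables created on the first. The sketch below shows that behaviour with the standard-library `sqlite3` module only (no SQLAlchemy involved), which is the reason a single shared connection is suggested. The table name `entities` is a placeholder.

```python
import sqlite3


def table_names(conn: sqlite3.Connection) -> list[str]:
    """Return the names of the tables visible on this connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Two separate connections to ":memory:" are two separate databases.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

first.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT)")
first.commit()

tables_on_first = table_names(first)    # sees the new table
tables_on_second = table_names(second)  # empty: a different database

first.close()
second.close()
```

Forcing every session through one connection removes the second, empty database from the picture, so the tables created during setup stay visible to the tests.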
In `@frontend/app.py`:
- Around line 63-65: The link uses os.getenv("TEMPORAL_NAMESPACE") which can be
None; update the code that builds the Temporal Web UI URL (the namespace
variable and url construction where workflow_id is used) to fallback to a sane
default (e.g., "default") when TEMPORAL_NAMESPACE is unset (use os.getenv with a
default or a conditional) so the rendered link never contains "None".
- Around line 6-11: The workflow_id is currently taken from raw user input which
can contain spaces/special chars and cause Temporal errors or collisions; change
the workflow ID creation (the variable used where you pass workflow_id into
Client.start) to a sanitized slug of the user input plus a short UUID suffix
(use a slugify function or regex to keep only alphanumerics/hyphens and append
uuid4 hex) to ensure uniqueness and valid characters. Also make namespace
handling consistent between get_client() and the URL builder: centralize the
namespace default (e.g., read os.getenv("TEMPORAL_NAMESPACE", "<your-default>")
once or pull from get_client() config) and use that same value when constructing
the Temporal host/URL so the URL is not malformed when the env var is unset;
update references in get_client() and the URL construction code to use that
single namespace variable.
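The two `frontend/app.py` items above can be sketched together as follows. This is a hedged illustration, not the PR's code: the helper names `make_workflow_id` and `workflow_url`, the `research` prefix, the 40-character slug limit, the `"default"` fallback namespace, and the `/namespaces/<ns>/workflows/<id>` path layout are all assumptions chosen for the example.

```python
import os
import re
import uuid

# Assumption: "default" is the fallback namespace; read once and reused everywhere.
TEMPORAL_NAMESPACE = os.getenv("TEMPORAL_NAMESPACE", "default")


def make_workflow_id(topic: str, prefix: str = "research") -> str:
    """Build a workflow ID from a slug of the topic plus a short random suffix."""
    # Replace every run of characters outside [a-z0-9] with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    slug = slug[:40].rstrip("-") or "untitled"  # bound the length, never empty
    suffix = uuid.uuid4().hex[:8]               # keeps repeated topics unique
    return f"{prefix}-{slug}-{suffix}"


def workflow_url(base_url: str, workflow_id: str) -> str:
    """Link to a workflow in the Web UI using the shared namespace value."""
    base = base_url.rstrip("/")
    return f"{base}/namespaces/{TEMPORAL_NAMESPACE}/workflows/{workflow_id}"
```

Because the namespace is resolved in one place, the client connection and the rendered link cannot disagree, and the link never contains the string `None`.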
In `@requirements.txt`:
- Around line 1-7: Update requirements.txt to pin exact package versions for
temporalio, streamlit, pydantic, python-dotenv, sqlalchemy[asyncio], asyncpg,
and aiosqlite (e.g., generate via pip freeze or use pip-compile with
requirements.in -> requirements.txt) to ensure reproducible installs;
concurrently track the protobuf advisory (GHSA-7gcm-g887-7qv7) and plan to
upgrade temporalio/streamlit when a patched protobuf is released; meanwhile
mitigate exposure by avoiding json_format.ParseDict() on untrusted JSON
containing nested Any messages or add application-level depth/structure
validation for any protobuf parsing.
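One way to keep the pinning request from regressing is a small check that flags any requirement not fixed with `==`. The sketch below is an assumption-laden illustration: the helper name `unpinned_requirements` is hypothetical, the pattern only covers simple `name[extras]==version` lines, and the version numbers in the sample are placeholders rather than the project's actual pins.

```python
import re

# A line counts as pinned when it has the form name[extras]==version.
_PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9.*+!_-]+$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    offenders = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not _PINNED.match(line):
            offenders.append(line)
    return offenders


# Placeholder versions, for illustration only.
sample = """\
temporalio==1.0.0
sqlalchemy[asyncio]==2.0.0
streamlit
pydantic>=2
"""
result = unpinned_requirements(sample)
```

Here `result` lists the two lines without an exact pin; a tool such as pip-compile remains the more complete route to reproducible installs.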
🧹 Nitpick comments (4)
.agent/rules/python_env.md (1)
1-27: Consider adding virtual environment creation instructions.
The rule clearly documents how to activate the environment but doesn't mention creation. Consider adding a brief note about initial setup:
💡 Optional enhancement
Add before line 14:
## Initial Setup
If the environment doesn't exist yet, create it first:
```bash
python -m venv env
```
Guidelines for Execution
This helps new contributors who may not have the `env/` directory yet.
backend/config.py (1)
6-9: Allow TASK_QUEUE to be overridden via environment
Hardcoding the queue makes multi-env deployments harder; align with the other config values by reading it from env with a default.
♻️ Proposed change
```diff
-TASK_QUEUE = "deep-research-queue"
+TASK_QUEUE = os.getenv("TASK_QUEUE", "deep-research-queue")
```
backend/worker.py (1)
15-30: TLS configuration should be explicitly configurable and use proper SDK conventions
TLS is currently only enabled when `TEMPORAL_API_KEY` is set, preventing TLS-only self-hosted clusters without API keys from connecting. More importantly, the suggested `TLSConfig()` without arguments is incorrect for API key auth; the Temporal Python SDK expects `tls=True` (boolean) for API key connections and `TLSConfig(client_cert=..., client_private_key=...)` for mTLS.
Refactor to allow explicit TLS configuration:
🔧 Suggested implementation
     api_key = os.getenv("TEMPORAL_API_KEY")
-    tls_config = None
-    if api_key:
-        # If API key is present, we likely need TLS (Cloud)
-        tls_config = TLSConfig()
+    use_tls = os.getenv("TEMPORAL_USE_TLS", "").lower() in {"1", "true", "yes"}
+    tls = True if (api_key or use_tls) else None
Then update `Client.connect()` to pass `tls=tls` instead of `tls=tls_config`. For mTLS with certificates, use `TLSConfig(client_cert=..., client_private_key=...)` explicitly.
backend/database.py (1)
27-30: Add rollback on errors to keep sessions clean.
If an exception happens mid-request, a rollback prevents broken transactions from leaking into the pool.
Proposed fix
 async def get_db():
     """Dependency for getting async DB session."""
     async with AsyncSessionLocal() as session:
-        yield session
+        try:
+            yield session
+        except Exception:
+            await session.rollback()
+            raise
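To see what the rollback suggestion for `get_db` does at runtime, the sketch below drives the same try/except-around-yield pattern with a stub session. `FakeSession` and `failing_request` are hypothetical stand-ins introduced for this example (the real code would use `AsyncSessionLocal()`), and the sketch assumes the caller throws the error back into the generator, as frameworks with yield-based dependencies typically do.

```python
import asyncio
from typing import AsyncIterator


class FakeSession:
    """Stand-in for an async DB session that only records what was called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def rollback(self) -> None:
        self.calls.append("rollback")

    async def __aenter__(self) -> "FakeSession":
        return self

    async def __aexit__(self, *exc_info) -> None:
        self.calls.append("close")


async def get_db(session: FakeSession) -> AsyncIterator[FakeSession]:
    """Yield a session; roll back if the consumer raises while using it."""
    async with session as active:
        try:
            yield active
        except Exception:
            await active.rollback()
            raise


async def failing_request(session: FakeSession) -> None:
    """Drive the generator as a caller would, then simulate a failure."""
    gen = get_db(session)
    await gen.__anext__()  # run up to the yield
    try:
        await gen.athrow(ValueError("boom"))  # error raised at the yield point
    except ValueError:
        pass


session = FakeSession()
asyncio.run(failing_request(session))
```

After the run, `session.calls` records the rollback before the close, which is the ordering the suggestion is meant to guarantee.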
.agent/rules/ops_stack.md
Outdated
- **Base Images**: Use slim/alpine variants (e.g., `python:3.11-slim`). Fix versions, avoid `latest`.
- **Multi-Stage**: Use multi-stage builds (`builder` -> `runtime`) to minimize image size.
- **User**: Run containers as a non-root user (create `appuser`).
- **Caching**: Copy `pyproject.toml` and `uv.lock` and install dependencies *before* copying source code.
Align guideline with actual project setup.
The guideline references pyproject.toml and uv.lock, but the project uses requirements.txt for dependency management. This inconsistency could confuse developers following these guidelines.
📝 Suggested fix
Option 1 - Update to match current setup:
-- **Caching**: Copy `pyproject.toml` and `uv.lock` and install dependencies *before* copying source code.
+- **Caching**: Copy `requirements.txt` and install dependencies *before* copying source code.
Option 2 - Migrate to modern tooling (recommended for new projects):
Consider adopting pyproject.toml with a tool like uv or poetry for better dependency management, then update requirements.txt accordingly.
.agent/rules/python_stack.md
Outdated
## 1. Tooling & Package Management
- **Manager**: Use `uv` for all package and project management.
- **Commands**:
    - Install/Sync: `uv sync`
    - Run Tests: `uv run -- pytest`
    - Linting: `uv run make lint`
- **Dependencies**: Managed in `pyproject.toml` (via `uv`).
    - **Groups**: Use dependency groups (`dev`, `docs`) for development tools.
    - **Extras**: Use optional dependencies (`extras`) for integrations/features.
    - **Lock File**: Ensure `uv.lock` is up to date.
Fix nested list indentation to satisfy markdownlint (MD007)
Nested bullets use 4 spaces; markdownlint expects 2. Apply this pattern to the other nested lists in the file as well.
✍️ Suggested fix (representative block)
-    - Install/Sync: `uv sync`
-    - Run Tests: `uv run -- pytest`
-    - Linting: `uv run make lint`
+  - Install/Sync: `uv sync`
+  - Run Tests: `uv run -- pytest`
+  - Linting: `uv run make lint`
...
-    - **Groups**: Use dependency groups (`dev`, `docs`) for development tools.
-    - **Extras**: Use optional dependencies (`extras`) for integrations/features.
-    - **Lock File**: Ensure `uv.lock` is up to date.
+  - **Groups**: Use dependency groups (`dev`, `docs`) for development tools.
+  - **Extras**: Use optional dependencies (`extras`) for integrations/features.
+  - **Lock File**: Ensure `uv.lock` is up to date.
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
.agent/rules/services_v2.md
Outdated
```
services_v2/{service_name}/
├── crud.py # Single table
├── crud/ # OR: Multiple tables
│ ├── entity_a_crud.py
│ └── entity_b_crud.py
├── service.py # Business logic (REQUIRED)
├── schema.py # Internal schemas (REQUIRED)
├── types.py # Shared Literal + Values (optional, see parse/types.py)
├── api_schema.py # Public API schemas (optional)
├── api_utils.py # Schema conversions (optional)
└── utils.py # Helpers (optional)
```
Add a language to the fenced code block (MD040)
The directory structure block lacks a language specifier, which markdownlint flags.
✍️ Suggested fix
-```
+```text
services_v2/{service_name}/
├── crud.py # Single table
├── crud/ # OR: Multiple tables
│ ├── entity_a_crud.py
│ └── entity_b_crud.py
├── service.py # Business logic (REQUIRED)
├── schema.py # Internal schemas (REQUIRED)
├── types.py # Shared Literal + Values (optional, see parse/types.py)
├── api_schema.py # Public API schemas (optional)
├── api_utils.py # Schema conversions (optional)
└── utils.py # Helpers (optional)
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
- Reorganized backend into db/ and research/ sub-packages
- Implemented eager loading for evidence relationship
- Sanitized workflow IDs and centralized namespace handling
- Pinned dependency versions in requirements.txt
- Removed .agent/ from git tracking
@coderabbitai
Summary by CodeRabbit
Release Notes
New Features
Chores