Skip to content

Comments

Fix postgres migrations#433

Merged
KaQuMiQ merged 1 commit intomainfrom
feature/migrations
Oct 1, 2025
Merged

Fix postgres migrations#433
KaQuMiQ merged 1 commit intomainfrom
feature/migrations

Conversation

@KaQuMiQ
Copy link
Collaborator

@KaQuMiQ KaQuMiQ commented Oct 1, 2025

No description provided.

@coderabbitai
Copy link

coderabbitai bot commented Oct 1, 2025

Warning

Rate limit exceeded

@KaQuMiQ has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 14 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 26b16b5 and 24ab898.

📒 Files selected for processing (10)
  • docs/guides/EvaluatorCatalog.md (8 hunks)
  • docs/guides/Postgres.md (1 hunks)
  • mkdocs.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/draive/helpers/instruction_preparation.py (4 hunks)
  • src/draive/postgres/configuration.py (5 hunks)
  • src/draive/postgres/instructions.py (5 hunks)
  • src/draive/postgres/memory.py (9 hunks)
  • tests/test_specification.py (2 hunks)
  • tools/generate_instruction.py (5 hunks)

Walkthrough

  • pyproject.toml: Version bumped 0.87.0 → 0.87.1.
  • src/draive/postgres: configuration.py, instructions.py, memory.py — SQL schemas updated to include composite PRIMARY KEYs and created timestamps; SELECTs and INSERTs now use explicit casts (::TEXT, ::JSONB) and parameter placeholders casted; memory persistence split into memories / memories_variables / memories_elements; connection handling switched to explicit acquire_connection + connection.transaction(); null/JSON fallbacks and caching decorator added in places.
  • src/draive/helpers/instruction_preparation.py: guideline injection refactored to use _format_guidelines and escaping helpers; template placeholders adjusted to preserve curly-brace escaping.
  • tools/generate_instruction.py: added ProviderSpec/PROVIDERS registry, dynamic provider parsing, CLI changes, and centralized provider-scoped orchestration.
  • docs: Added guides/Postgres.md and linked in mkdocs.yml; EvaluatorCatalog.md updated to use Sequence[str] for target_keywords.
  • tests: Minor string-quote changes in type hints in tests/test_specification.py.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Update evals interface #353 — Modifies src/draive/helpers/instruction_preparation.py; overlaps with the template/escaping and instruction formatting changes in this diff.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title “Fix postgres migrations” focuses narrowly on a migrations fix, but the pull request actually spans a version bump, multiple Postgres schema and query updates, memory and instruction repository changes, documentation additions, test adjustments, and tooling refactors, so it does not clearly summarize the primary scope of the changeset. Please revise the title to accurately convey the breadth of this changeset—such as “Refactor Postgres repositories, update memory and instruction modules, and bump version”—or consider splitting unrelated updates into separate pull requests.
Docstring Coverage ⚠️ Warning Docstring coverage is 13.04% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description Check ❓ Inconclusive No pull request description was provided, leaving reviewers without context for the extensive changes across configuration, repositories, documentation, tests, and tooling, so the description is too vague to assess the intent or scope of this work. Please add a concise description summarizing the objectives and main changes of this pull request to help reviewers understand its purpose and impact.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/draive/postgres/memory.py (1)

95-114: Fix inconsistent connection usage.

Line 99 uses PostgresConnection.execute() while line 118 (same function) uses connection.execute(). Within the explicit connection context, all execute calls should use the connection instance to ensure they run within the transaction.

Apply this diff to fix the inconsistency:

                if variables is not None:
                    ctx.log_info(f"...remembering {len(variables)} variables...")
-                    await PostgresConnection.execute(
+                    await connection.execute(
                        """
                        INSERT INTO
                            memories_variables (
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 410b716 and 6a743c2.

📒 Files selected for processing (4)
  • pyproject.toml (1 hunks)
  • src/draive/postgres/configuration.py (5 hunks)
  • src/draive/postgres/instructions.py (5 hunks)
  • src/draive/postgres/memory.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports from the draive package (e.g., from draive import ...), not relative imports
Follow Ruff import ordering: standard library, third-party, then local imports
Use Python 3.12+ typing features (e.g., | unions, PEP 695 generics)
Prefer abstract/base collection types (Sequence, Mapping, Set, Iterable) over concrete list, dict, set in type hints
Define and raise custom exception types for specific error cases
Use NumPy docstring convention for all functions, classes, and methods
Skip module-level docstrings unless explicitly requested
Docstrings should include Parameters, Returns, Raises, and Notes (if needed) sections

Format all Python code with Ruff formatter (run via make format); do not use any other formatter

Files:

  • src/draive/postgres/instructions.py
  • src/draive/postgres/configuration.py
  • src/draive/postgres/memory.py
src/draive/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/**/*.py: Import Haiway symbols directly as from haiway import State, ctx
Use ctx.scope(...) to bind active State instances; avoid global state
Route all logging through ctx.log_* and never use print
Enforce strict typing on public APIs; avoid untyped definitions and Any except at verified third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamic getattr except at narrow boundaries
Use abstract immutable protocols (Mapping, Sequence, Iterable) instead of concrete dict/list/set in public types
Use final where applicable; prefer composition over complex inheritance
Use precise unions (|), narrow with match/isinstance, and avoid cast unless provably safe and localized
Model immutable configuration/data as haiway.State; avoid in‑place mutation and prefer State.updated(...)/functional builders
Access active state via haiway.ctx within async scopes (ctx.scope(...))
Public state methods that dispatch on the active instance must use @statemethod
Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics via ctx.record where applicable
All I/O is async; keep boundaries async, use ctx.spawn for detached tasks, and avoid custom threading
Translate provider/SDK errors into typed exceptions; never raise bare Exception; preserve context
Compose multimodal content with MultimodalContent.of(...); use ResourceContent/Reference for blobs; wrap custom data in ArtifactContent (use hidden when needed)
Public symbols require NumPy-style docstrings with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings for internal helpers; prefer self-explanatory names; skip module-level docstrings
Never log secrets or full request bodies containing keys/tokens
Read credentials from environment variables; resolve via helpers like getenv_str

Files:

  • src/draive/postgres/instructions.py
  • src/draive/postgres/configuration.py
  • src/draive/postgres/memory.py
🧬 Code graph analysis (2)
src/draive/postgres/instructions.py (1)
src/draive/parameters/model.py (1)
  • from_json (705-720)
src/draive/postgres/memory.py (1)
src/draive/stages/stage.py (1)
  • execute (1978-2042)
🔇 Additional comments (11)
pyproject.toml (1)

8-8: LGTM! Appropriate patch version bump.

The version increment from 0.87.0 to 0.87.1 correctly follows semantic versioning for bug fixes related to postgres migrations.

src/draive/postgres/configuration.py (4)

40-43: LGTM! Cache configuration is optimal.

The limit=1 is correct here since listing() has no varying parameters, meaning there's only one possible cache key.


48-61: LGTM! Explicit type casting improves robustness.

The identifier::TEXT cast ensures consistent text handling and aligns with postgres best practices for explicit type conversion.


74-103: LGTM! Consistent text-based JSON handling.

Casting content::TEXT before json.loads() is the correct pattern for safely handling JSONB data from postgres.


111-126: LGTM! Explicit parameter typing in INSERT.

Using $1::TEXT and $2::JSONB with json.dumps() ensures type safety and correct JSONB storage.

src/draive/postgres/instructions.py (2)

57-86: LGTM! Defensive parsing with fallback values.

The explicit TEXT casts combined with fallback values ("{}" for meta, "[]" for arguments) provide robust handling of NULL or missing data.


138-163: LGTM! Explicit typing and proper NULL handling.

The explicit parameter casts and declaration.description or None ensure proper NULL insertion for missing descriptions rather than empty strings.

src/draive/postgres/memory.py (4)

14-33: Schema normalization looks appropriate.

The split into memories, memories_variables, and memories_elements with foreign key references provides a clean normalized structure. The absence of primary keys on the child tables allows multiple snapshots over time, which appears intentional.


145-162: LGTM! Defensive FK reference initialization.

The ON CONFLICT DO NOTHING pattern ensures the base memory record exists before inserting child records, preventing FK violations. This is the correct approach for this normalized schema.


192-246: LGTM! Consistent TEXT casting for JSON retrieval.

The content::TEXT cast enables safe json.loads() decoding. The reversal logic correctly restores chronological order after the DESC sort.


266-294: LGTM! Correct latest-snapshot retrieval.

The DISTINCT ON (identifier) with ORDER BY created DESC correctly retrieves the most recent variables snapshot. The variables::TEXT cast enables safe JSON decoding.

@KaQuMiQ KaQuMiQ force-pushed the feature/migrations branch 2 times, most recently from 99e6373 to b11fcf0 Compare October 1, 2025 11:39
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/draive/postgres/memory.py (1)

95-134: Fix missing base row insert and reuse the scoped connection.

The first remember() call inserts into memories_elements without guaranteeing a parent row exists in memories, so Line 118 will hit a foreign-key violation. In the same block (and again in maintenance()), PostgresConnection.execute(...) opens a separate connection, so any failure rolls back the child rows but leaves the parent write committed—this is the same bug previously highlighted. Please ensure the root row is created inside the current transaction and issue all inserts through the acquired connection.

@@
-        async with Postgres.acquire_connection() as connection:
-            async with connection.transaction():
-                if variables is not None:
+        async with Postgres.acquire_connection() as connection:
+            async with connection.transaction():
+                ctx.log_info("...ensuring memory entry exists...")
+                await connection.execute(
+                    """
+                    INSERT INTO
+                        memories (
+                            identifier
+                        )
+
+                    VALUES (
+                        $1::TEXT
+                    )
+
+                    ON CONFLICT (identifier)
+                    DO NOTHING;
+                    """,
+                    identifier,
+                )
+
+                if variables is not None:
                     ctx.log_info(f"...remembering {len(variables)} variables...")
-                    await PostgresConnection.execute(
+                    await connection.execute(
@@
-                if variables is not None:
-                    ctx.log_info(f"...remembering {len(variables)} variables...")
-                    await PostgresConnection.execute(
+                if variables is not None:
+                    ctx.log_info(f"...remembering {len(variables)} variables...")
+                    await connection.execute(

Also applies to: 165-181

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99e6373 and b11fcf0.

📒 Files selected for processing (9)
  • docs/guides/EvaluatorCatalog.md (8 hunks)
  • docs/guides/Postgres.md (1 hunks)
  • mkdocs.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/draive/postgres/configuration.py (5 hunks)
  • src/draive/postgres/instructions.py (5 hunks)
  • src/draive/postgres/memory.py (8 hunks)
  • tests/test_specification.py (2 hunks)
  • tools/generate_instruction.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
mkdocs.yml

📄 CodeRabbit inference engine (AGENTS.md)

Register documentation navigation in mkdocs.yml under the nav: section when adding pages

Files:

  • mkdocs.yml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports from the draive package (e.g., from draive import ...), not relative imports
Follow Ruff import ordering: standard library, third-party, then local imports
Use Python 3.12+ typing features (e.g., | unions, PEP 695 generics)
Prefer abstract/base collection types (Sequence, Mapping, Set, Iterable) over concrete list, dict, set in type hints
Define and raise custom exception types for specific error cases
Use NumPy docstring convention for all functions, classes, and methods
Skip module-level docstrings unless explicitly requested
Docstrings should include Parameters, Returns, Raises, and Notes (if needed) sections

Format all Python code with Ruff formatter (run via make format); do not use any other formatter

Files:

  • tests/test_specification.py
  • src/draive/postgres/instructions.py
  • tools/generate_instruction.py
  • src/draive/postgres/configuration.py
  • src/draive/postgres/memory.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Place tests in the tests/ directory and use pytest (with async support)
Mark asynchronous tests with pytest.mark.asyncio
Mock dependencies within test scope using stubbed functionality state (e.g., via ctx.scope)

tests/**/*.py: Unit tests must not perform network I/O; mock providers/HTTP
Use pytest-asyncio (@pytest.mark.asyncio) for coroutine tests
Prefer scoping with ctx.scope(...) in tests and bind required State instances explicitly
Avoid real I/O in tests; stub provider calls and HTTP

Files:

  • tests/test_specification.py
src/draive/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/**/*.py: Import Haiway symbols directly as from haiway import State, ctx
Use ctx.scope(...) to bind active State instances; avoid global state
Route all logging through ctx.log_* and never use print
Enforce strict typing on public APIs; avoid untyped definitions and Any except at verified third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamic getattr except at narrow boundaries
Use abstract immutable protocols (Mapping, Sequence, Iterable) instead of concrete dict/list/set in public types
Use final where applicable; prefer composition over complex inheritance
Use precise unions (|), narrow with match/isinstance, and avoid cast unless provably safe and localized
Model immutable configuration/data as haiway.State; avoid in‑place mutation and prefer State.updated(...)/functional builders
Access active state via haiway.ctx within async scopes (ctx.scope(...))
Public state methods that dispatch on the active instance must use @statemethod
Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics via ctx.record where applicable
All I/O is async; keep boundaries async, use ctx.spawn for detached tasks, and avoid custom threading
Translate provider/SDK errors into typed exceptions; never raise bare Exception; preserve context
Compose multimodal content with MultimodalContent.of(...); use ResourceContent/Reference for blobs; wrap custom data in ArtifactContent (use hidden when needed)
Public symbols require NumPy-style docstrings with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings for internal helpers; prefer self-explanatory names; skip module-level docstrings
Never log secrets or full request bodies containing keys/tokens
Read credentials from environment variables; resolve via helpers like getenv_str

Files:

  • src/draive/postgres/instructions.py
  • src/draive/postgres/configuration.py
  • src/draive/postgres/memory.py
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Author documentation pages belong under docs/

Files:

  • docs/guides/EvaluatorCatalog.md
  • docs/guides/Postgres.md
🧬 Code graph analysis (3)
src/draive/postgres/instructions.py (1)
src/draive/parameters/model.py (1)
  • from_json (705-720)
tools/generate_instruction.py (7)
src/draive/anthropic/client.py (1)
  • Anthropic (15-90)
src/draive/anthropic/config.py (1)
  • AnthropicConfig (9-21)
src/draive/gemini/client.py (1)
  • Gemini (18-66)
src/draive/gemini/config.py (1)
  • GeminiConfig (25-36)
src/draive/helpers/instruction_preparation.py (2)
  • InstructionPreparationAmbiguity (15-22)
  • prepare_instructions (25-64)
src/draive/openai/client.py (1)
  • OpenAI (22-99)
src/draive/openai/config.py (1)
  • OpenAIResponsesConfig (21-32)
src/draive/postgres/memory.py (1)
src/draive/stages/stage.py (1)
  • execute (1978-2042)
🔇 Additional comments (11)
pyproject.toml (1)

8-8: LGTM! Patch version bump for postgres migrations fix.

The version bump from 0.87.0 to 0.87.1 appropriately reflects the patch nature of the postgres migration fixes included in this PR.

tests/test_specification.py (1)

259-259: LGTM! Cosmetic quote style normalization.

The change from single-quoted to double-quoted forward references is purely stylistic and has no functional impact.

Also applies to: 292-293

src/draive/postgres/configuration.py (2)

15-16: LGTM! Composite primary key enables versioned configuration storage.

The updated schema with PRIMARY KEY (identifier, created) allows storing multiple versions of each configuration, which aligns well with the DISTINCT ON (identifier) queries that retrieve the latest version.


51-51: LGTM! Explicit type casts improve query safety.

The explicit ::TEXT and ::JSONB casts in SELECT and INSERT statements prevent implicit type conversion issues and make the query contract clearer. This is a PostgreSQL best practice.

Also applies to: 77-78, 120-121

docs/guides/EvaluatorCatalog.md (1)

398-398: LGTM! Type hint updated to follow best practices.

The change from list[str] to Sequence[str] aligns with the coding guidelines requiring abstract collection types in public APIs. This promotes better API design by accepting any sequence-like input.

As per coding guidelines.

Also applies to: 416-416

tools/generate_instruction.py (6)

77-89: LGTM! Well-designed provider abstraction.

The ProviderSpec frozen dataclass effectively encapsulates provider-specific configuration and disposable creation. The use of frozen=True ensures immutability, and the factory methods provide a clean interface for creating provider contexts.


92-114: LGTM! Provider registry enables extensible configuration.

The PROVIDERS registry cleanly maps provider keys to their specifications. Default models are consistent with the documentation comments (lines 46-48), and the factory lambdas correctly instantiate provider-specific config classes.


117-125: LGTM! Robust provider validation with helpful error messages.

The _parse_provider function normalizes input, validates against the registry, and provides clear error messages listing available providers when an invalid choice is given.


128-140: LGTM! Provider-agnostic instruction generation.

The refactored generate_with_provider function eliminates provider-specific branching by using the ProviderSpec abstraction. The guidelines parameter is correctly propagated to prepare_instructions, and the context scope correctly initializes both config and disposable.


174-177: LGTM! Guidelines support added to CLI.

The --guidelines argument enables users to inject custom guidelines during instruction preparation, which is correctly propagated to the prepare_instructions call.


221-224: LGTM! Proper handling of ambiguous instruction preparation.

The new exception handling for InstructionPreparationAmbiguity correctly writes follow-up questions to stderr and exits with code 2, distinguishing this case from general errors (exit code 1).

@KaQuMiQ KaQuMiQ force-pushed the feature/migrations branch from b11fcf0 to 26b16b5 Compare October 1, 2025 11:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/draive/postgres/configuration.py (1)

69-103: Verify loading query handles composite key correctly.

The loading query fetches the latest configuration for a given identifier using DISTINCT ON (identifier) and ORDER BY identifier, created DESC. This should work correctly, but consider whether LIMIT 1 is still necessary given DISTINCT ON already returns one row per identifier.

Consider simplifying the query by removing the redundant LIMIT 1:

         loaded: PostgresRow | None = await Postgres.fetch_one(
             """
             SELECT DISTINCT ON (identifier)
                 identifier::TEXT,
                 content::TEXT
 
             FROM
                 configurations
 
             WHERE
                 identifier = $1
 
             ORDER BY
                 identifier,
                 created
             DESC
-
-            LIMIT 1;
+            ;
             """,
             identifier,
         )
src/draive/postgres/memory.py (1)

99-114: Critical: Inconsistent connection usage in remember function.

Line 99 calls PostgresConnection.execute() instead of using the acquired connection.execute(). This bypasses the explicit connection and transaction context, potentially causing the operation to run outside the transaction or on a different connection.

Apply this diff to use the acquired connection:

                if variables is not None:
                    ctx.log_info(f"...remembering {len(variables)} variables...")
-                    await PostgresConnection.execute(
+                    await connection.execute(
                        """
                        INSERT INTO
                            memories_variables (
src/draive/postgres/instructions.py (2)

97-113: Remove redundant LIMIT 1 clause.

DISTINCT ON (name) with ORDER BY name, created DESC already returns exactly one row per name (the most recent). The LIMIT 1 on line 113 is redundant.

             ORDER BY
                 name,
                 created
             DESC
-
-            LIMIT 1;
+            ;

97-113: Add explicit ::TEXT cast for consistency.

The content column in the SELECT should be cast to ::TEXT to match the explicit casting pattern used in the listing function (lines 60-63).

             SELECT DISTINCT ON (name)
-                content
+                content::TEXT
 
             FROM
♻️ Duplicate comments (3)
src/draive/helpers/instruction_preparation.py (1)

189-189: Template example placeholder escaping needs clarification.

Line 189 changes {{topic}} to {{{{topic}}}}. Same concern as above—verify this is the intended user-facing documentation syntax.

src/draive/postgres/memory.py (1)

164-181: Critical: Inconsistent connection usage in maintenance function - same issue as flagged in past review.

Line 166 calls PostgresConnection.execute() instead of connection.execute(). This is the exact issue flagged in the past review comment. All database operations within the explicit connection/transaction context should use the acquired connection instance.

Apply this diff to fix the inconsistency:

                if variables is not None:
                    ctx.log_info(f"...remembering {len(variables)} variables...")
-                    await PostgresConnection.execute(
+                    await connection.execute(
                        """
                        INSERT INTO
                            memories_variables (

Based on past review comments.

docs/guides/Postgres.md (1)

21-24: Fix typo in inline comment.

The inline comment contains "confgiurations" which should be "configurations".

-    PostgresConfigurationRepository(), # use postgres confgiurations
+    PostgresConfigurationRepository(),  # use postgres configurations
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b11fcf0 and 26b16b5.

📒 Files selected for processing (10)
  • docs/guides/EvaluatorCatalog.md (8 hunks)
  • docs/guides/Postgres.md (1 hunks)
  • mkdocs.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/draive/helpers/instruction_preparation.py (4 hunks)
  • src/draive/postgres/configuration.py (5 hunks)
  • src/draive/postgres/instructions.py (5 hunks)
  • src/draive/postgres/memory.py (8 hunks)
  • tests/test_specification.py (2 hunks)
  • tools/generate_instruction.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports from the draive package (e.g., from draive import ...), not relative imports
Follow Ruff import ordering: standard library, third-party, then local imports
Use Python 3.12+ typing features (e.g., | unions, PEP 695 generics)
Prefer abstract/base collection types (Sequence, Mapping, Set, Iterable) over concrete list, dict, set in type hints
Define and raise custom exception types for specific error cases
Use NumPy docstring convention for all functions, classes, and methods
Skip module-level docstrings unless explicitly requested
Docstrings should include Parameters, Returns, Raises, and Notes (if needed) sections

Format all Python code with Ruff formatter (run via make format); do not use any other formatter

Files:

  • src/draive/postgres/configuration.py
  • tools/generate_instruction.py
  • src/draive/postgres/memory.py
  • src/draive/postgres/instructions.py
  • src/draive/helpers/instruction_preparation.py
  • tests/test_specification.py
src/draive/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/**/*.py: Import Haiway symbols directly as from haiway import State, ctx
Use ctx.scope(...) to bind active State instances; avoid global state
Route all logging through ctx.log_* and never use print
Enforce strict typing on public APIs; avoid untyped definitions and Any except at verified third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamic getattr except at narrow boundaries
Use abstract immutable protocols (Mapping, Sequence, Iterable) instead of concrete dict/list/set in public types
Use final where applicable; prefer composition over complex inheritance
Use precise unions (|), narrow with match/isinstance, and avoid cast unless provably safe and localized
Model immutable configuration/data as haiway.State; avoid in‑place mutation and prefer State.updated(...)/functional builders
Access active state via haiway.ctx within async scopes (ctx.scope(...))
Public state methods that dispatch on the active instance must use @statemethod
Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics via ctx.record where applicable
All I/O is async; keep boundaries async, use ctx.spawn for detached tasks, and avoid custom threading
Translate provider/SDK errors into typed exceptions; never raise bare Exception; preserve context
Compose multimodal content with MultimodalContent.of(...); use ResourceContent/Reference for blobs; wrap custom data in ArtifactContent (use hidden when needed)
Public symbols require NumPy-style docstrings with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings for internal helpers; prefer self-explanatory names; skip module-level docstrings
Never log secrets or full request bodies containing keys/tokens
Read credentials from environment variables; resolve via helpers like getenv_str

Files:

  • src/draive/postgres/configuration.py
  • src/draive/postgres/memory.py
  • src/draive/postgres/instructions.py
  • src/draive/helpers/instruction_preparation.py
mkdocs.yml

📄 CodeRabbit inference engine (AGENTS.md)

Register documentation navigation in mkdocs.yml under the nav: section when adding pages

Files:

  • mkdocs.yml
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Author documentation pages belong under docs/

Files:

  • docs/guides/EvaluatorCatalog.md
  • docs/guides/Postgres.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Place tests in the tests/ directory and use pytest (with async support)
Mark asynchronous tests with pytest.mark.asyncio
Mock dependencies within test scope using stubbed functionality state (e.g., via ctx.scope)

tests/**/*.py: Unit tests must not perform network I/O; mock providers/HTTP
Use pytest-asyncio (@pytest.mark.asyncio) for coroutine tests
Prefer scoping with ctx.scope(...) in tests and bind required State instances explicitly
Avoid real I/O in tests; stub provider calls and HTTP

Files:

  • tests/test_specification.py
🧬 Code graph analysis (3)
tools/generate_instruction.py (7)
src/draive/anthropic/client.py (1)
  • Anthropic (15-90)
src/draive/anthropic/config.py (1)
  • AnthropicConfig (9-21)
src/draive/gemini/client.py (1)
  • Gemini (18-66)
src/draive/gemini/config.py (1)
  • GeminiConfig (25-36)
src/draive/helpers/instruction_preparation.py (2)
  • InstructionPreparationAmbiguity (15-22)
  • prepare_instructions (25-65)
src/draive/openai/client.py (1)
  • OpenAI (22-99)
src/draive/openai/config.py (1)
  • OpenAIResponsesConfig (21-32)
src/draive/postgres/memory.py (1)
src/draive/stages/stage.py (1)
  • execute (1978-2042)
src/draive/postgres/instructions.py (1)
src/draive/parameters/model.py (1)
  • from_json (705-720)
🔇 Additional comments (26)
tools/generate_instruction.py (4)

77-114: LGTM! Clean provider abstraction pattern.

The ProviderSpec dataclass and PROVIDERS registry provide a maintainable, extensible way to support multiple AI providers. The separation of config/disposable factories follows the draive pattern shown in the relevant snippets.


128-140: Verify that prepare_instructions accepts guidelines parameter.

The generate_with_provider function now passes guidelines=guidelines to prepare_instructions. Ensure this parameter is supported by the function signature.

Based on the relevant snippets from src/draive/helpers/instruction_preparation.py (lines 24-64), the prepare_instructions function does accept a guidelines parameter. The implementation is correct.


221-225: Proper exception handling for InstructionPreparationAmbiguity.

The exception handling correctly catches InstructionPreparationAmbiguity, extracts the questions, and exits with code 2 to distinguish from other errors. This aligns with the exception class definition shown in the relevant snippets.


117-126: _parse_provider keys align with registered draive clients. The provider keys 'anthropic', 'openai', and 'gemini' correctly match the imported client classes and PROVIDERS definitions.

mkdocs.yml (1)

101-101: LGTM! Documentation navigation properly registered.

The Postgres guide is correctly added to the navigation section, following the existing pattern and placement within the Guides section. As per coding guidelines, documentation pages must be registered in mkdocs.yml.

pyproject.toml (1)

8-8: LGTM! Appropriate patch version bump.

The version increment from 0.87.0 to 0.87.1 is appropriate for the postgres migration fixes described in this PR. Patch versions typically cover bug fixes and backward-compatible changes.

tests/test_specification.py (1)

259-259: LGTM! Consistent quote style in forward references.

The change from single to double quotes in forward-referenced type hints ("NodeDict | None") is purely stylistic and has no functional impact. This likely aligns with Ruff formatter conventions for consistency.

Also applies to: 292-293

src/draive/postgres/configuration.py (3)

120-121: LGTM! Explicit type casts improve query safety.

The parameter bindings now use explicit PostgreSQL type casts ($1::TEXT, $2::JSONB), which ensures type safety and prevents potential type coercion issues. This pattern is consistent with the changes mentioned in the AI summary for other postgres modules.


44-63: Ignore incorrect DISTINCT ON concern.

The DISTINCT ON (identifier) + ORDER BY identifier, created DESC is intentional and consistent with other modules (e.g. instructions, memory) for selecting the latest configuration per identifier; a composite PK on (identifier, created) doesn’t eliminate the need to group by identifier to fetch only the most recent entry.

Likely an incorrect or invalid review comment.


132-150: Verify delete scope against composite PK
The removing function’s DELETE … WHERE identifier = $1 will remove every version (composite PK (identifier, created)). Confirm if this cleanup of all history is intended or should target only the latest record.

docs/guides/EvaluatorCatalog.md (1)

397-460: LGTM! Excellent alignment with best practices.

The change from list[str] to Sequence[str] properly follows the coding guidelines requiring abstract collection types in type hints. The import is correctly added and usage is consistent throughout the examples.

src/draive/helpers/instruction_preparation.py (3)

85-94: Helper functions are correct and well-structured.

The _format_guidelines and _escape_curly_braces helpers properly encapsulate the formatting logic. The escape function correctly doubles curly braces to prevent template interference.


157-159: Verify variable placeholder escaping in documentation is intentional.

The template now shows {{{{time}}}} and {{{{variable}}}} (four braces) instead of the typical two-brace placeholder syntax. This appears to be documenting the escaped form after _escape_curly_braces is applied, but may confuse users who expect standard {variable} syntax.

Confirm that the quadruple-brace syntax is the intended documentation. If these are meant to be literal examples of the final instruction format (post-escaping), consider adding a clarifying note. If these should be user-facing placeholders, they should remain as {{variable}}.


49-52: I’ve gathered the relevant code sections around the replace call and helper functions; please review the output for confirmation of whether the literal "{guidelines}" can appear in user’s content and warrant a more robust placeholder.

src/draive/postgres/memory.py (7)

14-33: Schema documentation is clear and well-structured.

The new three-table schema with proper foreign key constraints and primary keys is well-documented. The separation of concerns (memories → memories_variables, memories_elements) is a good design.


95-96: Connection handling is correct.

The switch from PostgresConnection.transaction() to Postgres.acquire_connection() with explicit connection.transaction() provides better control over the connection lifecycle.


118-133: Connection usage is correct for elements insertion.

The loop correctly uses connection.execute() for inserting memory elements within the transaction context.


143-162: Connection handling refactored correctly.

The maintenance function now properly acquires a connection and uses explicit transaction context. The new approach of ensuring the memory entry exists via INSERT ... ON CONFLICT DO NOTHING is cleaner than checking existence separately.


192-246: Load context function correctly uses explicit TEXT casting.

The addition of ::TEXT casting on the content column is good for type safety. The function correctly handles both limited and unlimited queries and properly reverses the results to maintain chronological order.


249-263: Context decoding logic is correct.

The _decode_context generator properly handles the JSON decoding and type discrimination for ModelInput and ModelOutput elements.


266-294: Variables loading updated correctly for new schema.

The function now queries memories_variables instead of the old memories table, with proper ::TEXT casting. The DISTINCT ON pattern with ORDER BY ensures the most recent variables are loaded. The fallback to empty Map() when no row exists is appropriate.

src/draive/postgres/instructions.py (3)

49-52: Verify cache limit=1 is intentional for listing function.

The listing cache uses limit=1, meaning only one result set is cached regardless of parameters. Since listing() accepts **extra but doesn't use it to vary the query, this may be intentional. However, if future changes introduce filtering parameters, you'll need to adjust the cache configuration.

Can you confirm that limit=1 is the desired behavior, and that no per-parameter caching is needed?


151-162: Explicit casts and null handling look good.

The explicit ::TEXT and ::JSONB casts align with the schema migration requirements. The declaration.description or None pattern correctly handles empty descriptions by storing NULL.


169-189: LGTM!

The removing function correctly deletes all instruction versions for the given name and properly clears both caches.

docs/guides/Postgres.md (2)

79-108: LGTM!

The ModelMemory documentation is comprehensive and accurately describes the three-table schema with proper foreign key constraints. The CASCADE delete semantics are appropriate for cleaning up related records.


71-75: No changes needed: documentation method names accurately reflect the InstructionsRepository API.

```

Key capabilities:
- `configurations()` returns every known identifier using cached results (limit 1, default 10 minute TTL).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify method name "configurations()".

The documentation refers to configurations() as returning "every known identifier". Confirm this matches the actual method name in PostgresConfigurationRepository. Based on typical repository patterns, it might be named something like list(), identifiers(), or available_configurations().


🏁 Script executed:

#!/bin/bash
# Search for the actual method name in PostgresConfigurationRepository
rg -n "def (configurations|list|identifiers|available)" src/draive/postgres/configuration.py

Length of output: 25


Use the correct method name listing() in documentation
Replace configurations() with listing() in docs/guides/Postgres.md to match the actual method in PostgresConfigurationRepository.

🤖 Prompt for AI Agents
In docs/guides/Postgres.md around line 46, the docs incorrectly reference
configurations() — update that method name to listing() to match
PostgresConfigurationRepository; ensure the sentence reads that listing()
returns every known identifier using cached results (limit 1, default 10 minute
TTL) and run a quick search in the file for any other occurrences of
configurations() to replace with listing().

Comment on lines +15 to +16
# created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
# PRIMARY KEY (identifier, created)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify migration path for composite primary key change.

The schema now uses a composite primary key (identifier, created) instead of a single-column primary key. This is a breaking change that requires data migration. Ensure:

  1. A migration script exists to alter existing tables
  2. Existing data has the created column populated (the DEFAULT CURRENT_TIMESTAMP only applies to new rows)
  3. Applications expecting a single-column PK are updated

Run the following script to check for migration scripts and schema documentation:


🏁 Script executed:

#!/bin/bash
# Description: Search for postgres migration scripts and schema setup documentation

# Look for migration scripts
fd -e sql -e py 'migrat' --exec echo "Found migration file: {}"

# Search for schema setup or migration documentation
rg -n "CREATE TABLE configurations|ALTER TABLE configurations" --type=sql --type=py

# Check if there are other references to the old schema
rg -n "PRIMARY KEY.*identifier[^,]" --type=sql --type=py

Length of output: 135


Provide a migration for the new composite primary key.

No migration script exists to add the created TIMESTAMPTZ column, backfill its values, or replace the old single-column PK with (identifier, created). Create and commit a database migration (SQL or migration‐tool script) to:

  • ALTER TABLE configurations ADD COLUMN created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
  • UPDATE existing rows to set created where NULL
  • DROP the old primary key on identifier and ADD PRIMARY KEY (identifier, created)
🤖 Prompt for AI Agents
In src/draive/postgres/configuration.py around lines 15 to 16, there is no
migration to add the new created TIMESTAMPTZ column and replace the
single-column primary key; create and commit a migration that: ALTER TABLE
configurations ADD COLUMN created TIMESTAMPTZ NOT NULL DEFAULT
CURRENT_TIMESTAMP; UPDATE existing rows to set created for preexisting records
(e.g. set to CURRENT_TIMESTAMP or an appropriate historical value) to avoid
NULLs; DROP the existing PRIMARY KEY on identifier and ADD a composite PRIMARY
KEY (identifier, created); ensure the migration is idempotent, runs inside a
transaction, and includes a rollback or safe ordering (add column and backfill
before dropping the old PK) and add tests or a migration checksum if your
migration tool requires it.

Comment on lines +24 to +25
# created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
# PRIMARY KEY (name, created)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove trailing comma for SQL style consistency.

The trailing comma after TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP is valid but inconsistent with line 23, which has no comma. SQL style guides typically omit trailing commas in column definitions.

-#     created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
+#     created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
 #     PRIMARY KEY (name, created)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
# PRIMARY KEY (name, created)
# created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
# PRIMARY KEY (name, created)
🤖 Prompt for AI Agents
In src/draive/postgres/instructions.py around lines 24 to 25, remove the
trailing comma after the column definition "created TIMESTAMPTZ NOT NULL DEFAULT
CURRENT_TIMESTAMP," so it matches the style used on the previous line; update
the SQL block to omit that trailing comma before the PRIMARY KEY line to
maintain consistency in column definition formatting.

description,
declaration.description or None,
content,
f"[{','.join(argument.to_json() for argument in declaration.arguments)}]",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use proper JSON serialization instead of manual string construction.

Manually constructing a JSON array string with f"[{','.join(...)}]" is fragile and potentially unsafe. If argument.to_json() doesn't properly escape special characters, this could lead to malformed JSON or injection issues.

Use json.dumps() to serialize the arguments array safely:

+import json
+
 ...
             declaration.name,
             declaration.description or None,
             content,
-            f"[{','.join(argument.to_json() for argument in declaration.arguments)}]",
+            json.dumps([json.loads(argument.to_json()) for argument in declaration.arguments]),
             declaration.meta.to_json(),

Or if declaration.arguments can be serialized directly:

+import json
+
 ...
-            f"[{','.join(argument.to_json() for argument in declaration.arguments)}]",
+            json.dumps([arg for arg in declaration.arguments], default=lambda o: o.__dict__ if hasattr(o, '__dict__') else str(o)),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
f"[{','.join(argument.to_json() for argument in declaration.arguments)}]",
# At the top of src/draive/postgres/instructions.py, ensure you have:
import json
# …later, where the arguments array is constructed:
declaration.name,
declaration.description or None,
content,
json.dumps([json.loads(argument.to_json()) for argument in declaration.arguments]),
declaration.meta.to_json(),
🤖 Prompt for AI Agents
In src/draive/postgres/instructions.py around line 161, the code builds a JSON
array string via f"[{','.join(argument.to_json() for argument in
declaration.arguments)}]" which is fragile and unsafe; replace this manual
construction with proper JSON serialization by creating a Python list of the
argument JSON values (or serializable objects) and calling json.dumps(...) on
that list (import json at top if missing) so special characters are escaped and
the output is valid JSON.

@KaQuMiQ KaQuMiQ force-pushed the feature/migrations branch from 26b16b5 to 4bc9747 Compare October 1, 2025 12:05
@KaQuMiQ KaQuMiQ force-pushed the feature/migrations branch from 4bc9747 to 24ab898 Compare October 1, 2025 12:07
@KaQuMiQ KaQuMiQ merged commit 023bdd0 into main Oct 1, 2025
3 of 5 checks passed
@KaQuMiQ KaQuMiQ deleted the feature/migrations branch October 1, 2025 12:11
@coderabbitai coderabbitai bot mentioned this pull request Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant