Skip to content

Remove pg0-embedded dependency, fix migration transaction bug#3

Merged
franchb merged 1 commit intomainfrom
pg-perm
Feb 23, 2026
Merged

Remove pg0-embedded dependency, fix migration transaction bug#3
franchb merged 1 commit intomainfrom
pg-perm

Conversation

@franchb
Copy link
Owner

@franchb franchb commented Feb 23, 2026

Summary

  • Remove pg0-embedded as a hard dependency — this fork targets external PostgreSQL with vchord_bm25, so the embedded pg0 (which doesn't bundle vchord_bm25 and has bind mount permission issues in Docker) is no longer needed. The pg0 import is now conditional with a clear RuntimeError guiding users to set HINDSIGHT_API_DATABASE_URL.
  • Fix InFailedSqlTransaction bug in migrations — after a failed CREATE EXTENSION, PostgreSQL aborts the transaction, but the except handler tried to query in the same aborted transaction. Now uses SAVEPOINT/ROLLBACK TO SAVEPOINT to keep the transaction alive for the fallback check.
  • Remove .pg0 data directory creation from Dockerfile — no longer needed since embedded PostgreSQL isn't used in container builds.

Test plan

  • uv sync succeeds without pg0-embedded as a direct dependency
  • parse_pg0_url() works without pg0 installed
  • EmbeddedPostgres._get_pg0() raises clear RuntimeError when pg0 is missing
  • ./scripts/hooks/lint.sh passes
  • Docker build: docker build --target standalone --build-arg INCLUDE_LOCAL_MODELS=false --build-arg PRELOAD_ML_MODELS=false -f docker/standalone/Dockerfile .

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced database extension initialization with improved error handling and transaction safety checks for text search and vector extensions.
  • Chores

    • PostgreSQL embedded database is now optional; install pg0-embedded separately or configure an external PostgreSQL instance.
    • Simplified Docker container configurations.

Remove embedded PostgreSQL (pg0) as a hard dependency since this fork
targets external PostgreSQL with vchord_bm25. The pg0 import is now
conditional with a clear RuntimeError guiding users to set
HINDSIGHT_API_DATABASE_URL.

Fix CREATE EXTENSION error handling in migrations: after a failed
CREATE EXTENSION, PostgreSQL aborts the transaction, causing subsequent
queries in the except handler to fail with InFailedSqlTransaction.
Use SAVEPOINTs to isolate the extension creation so the transaction
remains usable in the fallback path.

Also remove the .pg0 data directory creation from the Dockerfile since
embedded PostgreSQL is no longer used in container builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

This PR makes the pg0-embedded dependency optional by removing it from project dependencies, updating Docker configuration to not create pg0-specific directories, and adding graceful fallback handling for the pg0 module import with runtime error messaging.

Changes

Cohort / File(s) Summary
Docker Configuration
docker/standalone/Dockerfile
Removed RUN mkdir -p /home/hindsight/.pg0 directory creation blocks from API final image stage and standalone final image stage.
Alembic Migrations
hindsight_api/alembic/versions/5a366d414dce_initial_schema.py, hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py
Added explicit SAVEPOINT/RELEASE/ROLLBACK transaction management around text-search extension creation (vchord_bm25, pg_textsearch) with existence verification queries to determine if extensions truly fail.
Optional pg0 Import
hindsight_api/pg0.py
Made pg0 module import optional with fallback to None; added runtime guard in EmbeddedPostgres._get_pg0 to raise RuntimeError with installation guidance if pg0 is unavailable.
Dependency Management
pyproject.toml
Removed pg0-embedded>=0.11.0 from project dependencies.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A postgres path once bundled tight,
Now optional, a lighter flight!
We save and roll with graceful care,
Extensions checked with truthful flair—
Dependencies pruned, the code runs free! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: removing the pg0-embedded dependency and fixing a migration transaction bug with SAVEPOINT/ROLLBACK handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pg-perm

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py (1)

105-114: Same correct pattern for pg_textsearch.

Consistent with the vchord_bm25 branch and the initial_schema migration. If you extract the helper suggested in the initial_schema review, this code can also be simplified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py`
around lines 105 - 114, The SAVEPOINT/release/rollback pattern around creating
the pg_textsearch extension should match the vchord_bm25/initial_schema
migrations: wrap op.execute("CREATE EXTENSION IF NOT EXISTS pg_textsearch
CASCADE") with conn.execute(text("SAVEPOINT create_ext_pg_textsearch")) and
conn.execute(text("RELEASE SAVEPOINT create_ext_pg_textsearch")) in try, and on
Exception ROLLBACK TO SAVEPOINT then query pg_extension (via
conn.execute(text("SELECT 1 FROM pg_extension WHERE extname =
'pg_textsearch'")).fetchone()) and re-raise only if the extension still isn't
present; optionally refactor this logic into the shared helper used by
initial_schema to reduce duplication (referencing conn, op, and the savepoint
name create_ext_pg_textsearch).
🧹 Nitpick comments (4)
hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py (1)

82-121: _detect_text_search_extension() is duplicated verbatim across migration files.

Both this file and 5a366d414dce_initial_schema.py contain identical copies of _detect_text_search_extension() and _detect_vector_extension(). While Alembic migrations are typically self-contained, these helpers are pure detection logic with no schema-version-specific behavior — they'd be good candidates for a shared module (e.g., a migration_utils.py alongside the versions directory). This would also make the savepoint helper extraction from the other review comment more impactful.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py`
around lines 82 - 121, The duplicated pure helpers _detect_text_search_extension
and _detect_vector_extension should be extracted into a shared migration utility
module (e.g., migration_utils.py placed alongside the Alembic versions) and both
migration scripts should import and call those functions instead of containing
verbatim copies; create migration_utils.py with the detection logic (including
the savepoint/create-extension sequences), update the migration files to remove
the local definitions and import _detect_text_search_extension and
_detect_vector_extension, and ensure the migrations still call the imported
functions where they previously referenced the local ones.
hindsight-api/hindsight_api/alembic/versions/5a366d414dce_initial_schema.py (2)

126-135: Savepoint for vector extension fallback looks correct.

Same pattern as the text-search extensions — properly guards the fallback CREATE EXTENSION IF NOT EXISTS vector so the transaction survives a permission-denied error when the extension is already installed.

One nit: the bare except Exception silently swallows the original error when the extension does already exist. Consider logging the caught exception at DEBUG level so operators can diagnose unexpected failures during migrations, rather than only discovering the swallow via savepoint rollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hindsight-api/hindsight_api/alembic/versions/5a366d414dce_initial_schema.py`
around lines 126 - 135, In the except Exception block guarding the vector
extension creation, capture the exception (e.g., except Exception as e) and log
it at DEBUG before (or immediately after) performing the SAVEPOINT rollback so
operators can diagnose failures; use a module logger
(logging.getLogger(__name__) or an existing logger) and include exception
details (exc_info=True or similar) in the log call, referencing the existing
symbols conn, op.execute("CREATE EXTENSION IF NOT EXISTS vector"), and the
savepoint/rollback logic to locate where to add the debug logging.

85-94: Consider extracting the repeated savepoint-guarded extension creation into a helper.

The same SAVEPOINT → CREATE EXTENSION → RELEASE / ROLLBACK + existence-check pattern appears three times in this file and is duplicated again in n9i0j1k2l3m4_learnings_and_pinned_reflections.py. A small helper would reduce duplication and make the intent clearer.

♻️ Example helper extraction
def _create_extension_safe(ext_name: str, savepoint_name: str | None = None) -> None:
    """Create a PostgreSQL extension within a savepoint, tolerating permission errors if already installed."""
    sp = savepoint_name or f"create_ext_{ext_name}"
    conn = op.get_bind()
    try:
        conn.execute(text(f"SAVEPOINT {sp}"))
        op.execute(f"CREATE EXTENSION IF NOT EXISTS {ext_name} CASCADE")
        conn.execute(text(f"RELEASE SAVEPOINT {sp}"))
    except Exception:
        conn.execute(text(f"ROLLBACK TO SAVEPOINT {sp}"))
        result = conn.execute(
            text("SELECT 1 FROM pg_extension WHERE extname = :name"),
            {"name": ext_name},
        ).fetchone()
        if not result:
            raise

This can then be shared across migration files (e.g., placed in a common utils module alongside _get_schema_prefix).

Also applies to: 98-107, 126-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hindsight-api/hindsight_api/alembic/versions/5a366d414dce_initial_schema.py`
around lines 85 - 94, Extract the repeated SAVEPOINT→CREATE
EXTENSION→RELEASE/ROLLBACK pattern into a reusable helper (e.g.,
_create_extension_safe) that encapsulates op.get_bind(), the
SAVEPOINT/RELEASE/ROLLBACK logic, the CREATE EXTENSION IF NOT EXISTS call, and
the post-error pg_extension existence check; replace each duplicated block in
the migration modules (the initial_schema migration and the
n9i0j1k2l3m4_learnings_and_pinned_reflections migration) with calls to this
helper, and place the helper in a shared utils module (or a small local helper
at top of the migration files) so the migrations call
_create_extension_safe(ext_name) instead of repeating the block.
hindsight-api/hindsight_api/pg0.py (1)

36-40: Clear guard with actionable error message.

The guard correctly prevents any Pg0(...) call when the module is absent. The error message directs users to the HINDSIGHT_API_DATABASE_URL alternative, which aligns with the PR objective.

One minor observation: when Pg0 is None, the type annotation on Line 33 (self._pg0: Pg0 | None) degrades to NoneType | None, and the return annotation on Line 35 (-> Pg0) becomes -> None. This is semantically consistent (the function raises before returning), so it won't cause runtime issues — but static type checkers like mypy may flag it. Consider guarding the annotations with TYPE_CHECKING:

💡 Optional: type-checking-safe annotation
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pg0 import Pg0 as Pg0Type
+else:
+    Pg0Type = None
+
 try:
     from pg0 import Pg0
 except ImportError:
     Pg0 = None

Then use Pg0Type in annotations and Pg0 for runtime checks. This is only relevant if you run mypy/pyright on this codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hindsight-api/hindsight_api/pg0.py` around lines 36 - 40, Wrap runtime
import/usage of Pg0 with a TYPE_CHECKING-safe annotation alias so static type
checkers don't see Pg0 as None: import TYPE_CHECKING from typing, and inside "if
TYPE_CHECKING:" import Pg0 (e.g. "from pg0_embedded import Pg0 as Pg0Type") or
declare a protocol/alias, then use Pg0Type in annotations (replace self._pg0:
Pg0 | None and the method return annotation -> Pg0 with Pg0Type | None and ->
Pg0Type). Keep the runtime guard that checks "if Pg0 is None: raise
RuntimeError(...)" and continue to use the runtime Pg0 identifier for
construction/usage while annotations reference Pg0Type.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee058e4 and b59f897.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • docker/standalone/Dockerfile
  • hindsight-api/hindsight_api/alembic/versions/5a366d414dce_initial_schema.py
  • hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py
  • hindsight-api/hindsight_api/pg0.py
  • hindsight-api/pyproject.toml
💤 Files with no reviewable changes (2)
  • docker/standalone/Dockerfile
  • hindsight-api/pyproject.toml
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py`:
- Around line 105-114: The SAVEPOINT/release/rollback pattern around creating
the pg_textsearch extension should match the vchord_bm25/initial_schema
migrations: wrap op.execute("CREATE EXTENSION IF NOT EXISTS pg_textsearch
CASCADE") with conn.execute(text("SAVEPOINT create_ext_pg_textsearch")) and
conn.execute(text("RELEASE SAVEPOINT create_ext_pg_textsearch")) in try, and on
Exception ROLLBACK TO SAVEPOINT then query pg_extension (via
conn.execute(text("SELECT 1 FROM pg_extension WHERE extname =
'pg_textsearch'")).fetchone()) and re-raise only if the extension still isn't
present; optionally refactor this logic into the shared helper used by
initial_schema to reduce duplication (referencing conn, op, and the savepoint
name create_ext_pg_textsearch).

---

Nitpick comments:
In `@hindsight-api/hindsight_api/alembic/versions/5a366d414dce_initial_schema.py`:
- Around line 126-135: In the except Exception block guarding the vector
extension creation, capture the exception (e.g., except Exception as e) and log
it at DEBUG before (or immediately after) performing the SAVEPOINT rollback so
operators can diagnose failures; use a module logger
(logging.getLogger(__name__) or an existing logger) and include exception
details (exc_info=True or similar) in the log call, referencing the existing
symbols conn, op.execute("CREATE EXTENSION IF NOT EXISTS vector"), and the
savepoint/rollback logic to locate where to add the debug logging.
- Around line 85-94: Extract the repeated SAVEPOINT→CREATE
EXTENSION→RELEASE/ROLLBACK pattern into a reusable helper (e.g.,
_create_extension_safe) that encapsulates op.get_bind(), the
SAVEPOINT/RELEASE/ROLLBACK logic, the CREATE EXTENSION IF NOT EXISTS call, and
the post-error pg_extension existence check; replace each duplicated block in
the migration modules (the initial_schema migration and the
n9i0j1k2l3m4_learnings_and_pinned_reflections migration) with calls to this
helper, and place the helper in a shared utils module (or a small local helper
at top of the migration files) so the migrations call
_create_extension_safe(ext_name) instead of repeating the block.

In
`@hindsight-api/hindsight_api/alembic/versions/n9i0j1k2l3m4_learnings_and_pinned_reflections.py`:
- Around line 82-121: The duplicated pure helpers _detect_text_search_extension
and _detect_vector_extension should be extracted into a shared migration utility
module (e.g., migration_utils.py placed alongside the Alembic versions) and both
migration scripts should import and call those functions instead of containing
verbatim copies; create migration_utils.py with the detection logic (including
the savepoint/create-extension sequences), update the migration files to remove
the local definitions and import _detect_text_search_extension and
_detect_vector_extension, and ensure the migrations still call the imported
functions where they previously referenced the local ones.

In `@hindsight-api/hindsight_api/pg0.py`:
- Around line 36-40: Wrap runtime import/usage of Pg0 with a TYPE_CHECKING-safe
annotation alias so static type checkers don't see Pg0 as None: import
TYPE_CHECKING from typing, and inside "if TYPE_CHECKING:" import Pg0 (e.g. "from
pg0_embedded import Pg0 as Pg0Type") or declare a protocol/alias, then use
Pg0Type in annotations (replace self._pg0: Pg0 | None and the method return
annotation -> Pg0 with Pg0Type | None and -> Pg0Type). Keep the runtime guard
that checks "if Pg0 is None: raise RuntimeError(...)" and continue to use the
runtime Pg0 identifier for construction/usage while annotations reference
Pg0Type.

@franchb franchb merged commit baa18d6 into main Feb 23, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant