feat: multi-instance VectorChord docker-compose with shared PostgreSQL #6

Open
franchb wants to merge 1 commit into main from fix/deepinfra-qwen-embedding

@franchb (Owner) commented Feb 24, 2026

Summary

  • New init-vchord.sh: Replaces init-vchord.sql with a shell script that installs VectorChord extensions (vchord, pg_tokenizer, vchord_bm25) and creates the llmlingua2 tokenizer in both the default database and any additional databases defined via EXTRA_DATABASES env var (dbname:user:password,...)
  • New docker-compose.multi.yaml: Two Hindsight API instances sharing one VectorChord PostgreSQL container, each with its own isolated database/user/password, using DeepInfra embeddings and reranker via litellm-sdk
  • Updated docker-compose.yaml: Switched to fork image (ghcr.io/franchb/hindsight-api:latest-slim), removed vectorchord-init sidecar, added DB healthcheck with service_healthy dependency, removed port 9999 mapping
  • Updated .env.example: Added litellm-sdk provider examples for DeepInfra embeddings and reranker
  • Updated CLAUDE.md: Corrected Qwen3 model names and openai/ routing prefix for embeddings
  • Updated README.md: Added "Docker with VectorChord" section to Quick Start
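
For reference, the EXTRA_DATABASES format described above (dbname:user:password,...) can be split with plain POSIX shell. This is a minimal sketch, not the code from init-vchord.sh: parse_extra_databases is a hypothetical helper name, and treating everything after the second colon as the password is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not taken from init-vchord.sh): print one
# "dbname user password" line per entry of an EXTRA_DATABASES-style value.
# The body runs in a subshell "( ... )" so IFS and globbing changes do not leak.
parse_extra_databases() (
  set -f        # no globbing: a '*' in a password must stay literal
  IFS=','       # entries are comma-separated
  for entry in $1; do
    [ -n "$entry" ] || continue          # tolerate a trailing comma
    case $entry in
      *:*:*) ;;                          # need at least dbname:user:password
      *) echo "malformed entry: $entry" >&2; exit 1 ;;
    esac
    db_name=${entry%%:*}                 # text before the first colon
    rest=${entry#*:}
    db_user=${rest%%:*}                  # text between the first and second colon
    db_pass=${rest#*:}                   # remainder, may itself contain ':'
    printf '%s %s %s\n' "$db_name" "$db_user" "$db_pass"
  done
)

parse_extra_databases "hindsight_agent1:agent1_user:agent1_pass,hindsight_agent2:agent2_user:agent2_pass"
```

With this format a password cannot contain a comma, because the comma separates entries.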

Test plan

  • docker compose -f docker/docker-compose/vchord/docker-compose.yaml config validates without errors
  • docker compose -f docker/docker-compose/vchord/docker-compose.multi.yaml config validates without errors
  • Single-instance: docker compose -f docker/docker-compose/vchord/docker-compose.yaml up -d starts DB + Hindsight, extensions installed via init script
  • Multi-instance: docker compose -f docker/docker-compose/vchord/docker-compose.multi.yaml up -d starts DB + 2 agents on ports 8881/8882
  • Verify extensions in each database: docker exec hindsight-multi-db psql -U hindsight_admin -d hindsight_agent1 -c "SELECT extname FROM pg_extension;"
  • Both agents respond to health checks: curl localhost:8881/health and curl localhost:8882/health
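
The health-check step may need a few retries while the containers start. A small sketch of a generic retry wrapper follows; retry is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS runs out.
# Hypothetical helper for the test plan, not part of the PR.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0                  # command succeeded
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"            # wait before the next attempt
    fi
  done
  return 1                      # every attempt failed
}

# Usage against the two agents (not executed here):
#   retry 30 2 curl -fsS http://localhost:8881/health
#   retry 30 2 curl -fsS http://localhost:8882/health
```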

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added VectorChord PostgreSQL extension support as an alternative vector search option.
    • Introduced multi-instance deployment configuration supporting multiple API endpoints from a single database.
    • Expanded embedding and reranker provider options including litellm-sdk and tei integration.
  • Documentation

    • Updated README with comprehensive Docker deployment instructions for VectorChord setups.
    • Expanded environment configuration examples for new provider options.

Replace init-vchord.sql with init-vchord.sh shell script that handles
extension installation (vchord, pg_tokenizer, vchord_bm25) and
llmlingua2 tokenizer creation for both the default database and
additional databases via EXTRA_DATABASES env var.

- Add docker-compose.multi.yaml: two Hindsight API instances sharing
  one VectorChord PostgreSQL, each with isolated database/user/password,
  using DeepInfra embeddings/reranker via litellm-sdk
- Update docker-compose.yaml: switch to fork image
  (ghcr.io/franchb/hindsight-api), remove vectorchord-init sidecar,
  add healthcheck with service_healthy dependency
- Update .env.example with litellm-sdk provider examples for DeepInfra
- Update CLAUDE.md with correct Qwen3 model names and routing prefixes
- Add VectorChord deployment section to README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 24, 2026

Walkthrough

The pull request extends VectorChord support by refactoring Docker Compose configurations, introducing a multi-instance deployment setup, adding a PostgreSQL initialization script for VectorChord extensions, and updating environment documentation to support multiple embedding and reranker providers including litellm-sdk integration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Updates: `.env.example`, `CLAUDE.md` | Expanded environment variable documentation for embedding and reranker providers (local, tei, litellm-sdk); updated model configuration for DeepInfra with OpenAI-prefixed paths and API base URL. |
| Documentation: `README.md` | Added "Docker with VectorChord" section documenting single and multi-instance deployment with environment variables and docker-compose file references. |
| Docker Compose Core: `docker/docker-compose/vchord/docker-compose.yaml` | Removed vectorchord-init service; fixed PostgreSQL image tag; added persistent volume and init script mounting; introduced PostgreSQL healthcheck; updated app image registry and tag; reduced exposed ports; refactored environment and dependency configuration. |
| Docker Compose Multi-Instance: `docker/docker-compose/vchord/docker-compose.multi.yaml` | Added new multi-instance setup with two hindsight API services sharing a single PostgreSQL database, each with isolated databases and separate API endpoints on ports 8881 and 8882. |
| Initialization Script: `docker/docker-compose/vchord/init-vchord.sh` | New shell script to install VectorChord extensions (vchord, pg_tokenizer, vchord_bm25) and configure tokenizers; processes EXTRA_DATABASES for per-instance database creation with user provisioning and privilege grants. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 VectorChord hops into the fold today,
Multi-instances dance and play,
Extensions bloom with postgres grace,
Docker compose sets up the place,
Two agents tango, sharing space! 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately summarizes the primary feature being introduced: a multi-instance VectorChord docker-compose configuration with shared PostgreSQL database. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
docker/docker-compose/vchord/docker-compose.multi.yaml (1)

43-71: Consider using YAML anchors to reduce duplication between agent services.

The two agent service definitions are nearly identical, differing only in container name, port, and database URL. YAML anchors (& / * / <<:) could reduce the duplication, though readability is a valid trade-off for a demo compose file.

Also applies to: 73-101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/docker-compose/vchord/docker-compose.multi.yaml` around lines 43 - 71,
Create a YAML anchor for the shared service configuration (e.g., an anchor named
hindsight_agent_base) that contains the common keys used by hindsight-agent1
(image, environment entries except the instance-specific ones, networks,
depends_on) and then update the hindsight-agent1 service to merge in that anchor
using <<: *hindsight_agent_base while overriding the unique fields
(container_name, ports, and HINDSIGHT_API_DATABASE_URL). Repeat the same pattern
for the second agent service(s) (the block that differs only by
port/container/database) so they each inherit the anchor and only specify their
instance-specific overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/docker-compose/vchord/docker-compose.multi.yaml`:
- Around line 27-31: The docker-compose snippet hardcodes passwords in
POSTGRES_PASSWORD and in the EXTRA_DATABASES string (and corresponding
DATABASE_URL entries), so change these to use environment variable substitution
(e.g., replace hindsight_admin_pass, agent1_pass, agent2_pass with
${HINDSIGHT_ADMIN_DB_PASSWORD:-hindsight_admin_pass},
${AGENT1_DB_PASS:-agent1_pass}, ${AGENT2_DB_PASS:-agent2_pass} respectively) and
update the EXTRA_DATABASES value and the DATABASE_URL entries (the DATABASE_URL
occurrences referenced in the review) to reference the same env vars so
credentials stay in sync and can be overridden at runtime; ensure you only
change the credential tokens (POSTGRES_PASSWORD, EXTRA_DATABASES, DATABASE_URL
values) and keep the current defaults as fallbacks using the ${VAR:-default}
form.
- Around line 25-26: The compose service exposes PostgreSQL with the ports
mapping "5436:5432" which conflicts with the same host port used in the
single-instance compose, so change the ports entry under the service (the ports:
mapping currently set to "5436:5432") to a non-conflicting host port such as
"5437:5432" or add a comment/docs note that the multi and single compose files
are mutually exclusive to avoid concurrent binding; update only the ports value
in docker-compose.multi.yaml (ports: -> "5437:5432") or add documentation
accordingly.

In `@docker/docker-compose/vchord/docker-compose.yaml`:
- Around line 7-14: The docs state HINDSIGHT_DB_PASSWORD is required but the
compose file supplies a fallback default (hindsight_password); make them
consistent by removing the fallback default from the docker-compose service
environment (remove the hindsight_password default variable) so the container
fails when HINDSIGHT_DB_PASSWORD is unset, or alternatively remove
HINDSIGHT_DB_PASSWORD from the "Required" header and list it as optional with
its default; locate the environment block that sets HINDSIGHT_DB_PASSWORD /
hindsight_password to change the default behavior accordingly.

In `@docker/docker-compose/vchord/init-vchord.sh`:
- Around line 42-51: The SQL breaks when db_user, db_pass, or db_name contain
quotes or semicolons. Pass the values to psql with -v (psql -v
db_user="$db_user" -v db_pass="$db_pass" -v db_name="$db_name") and, in the
script that psql reads from stdin, use :"db_user" and :"db_name" for identifiers
and :'db_pass' for the password literal. psql does not interpolate variables
inside quoted or dollar-quoted text, so these forms cannot be used inside the
DO $$ block; move the role check out of the DO block, for example by building
the CREATE ROLE statement with format() (%I for identifiers, %L for literals)
and running it with \gexec. Apply the same change to the later GRANT statements
so that every occurrence of ${db_user}, ${db_pass}, and ${db_name} is replaced
with the appropriate psql substitution.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce79731 and 93e129a.

📒 Files selected for processing (6)
  • .env.example
  • CLAUDE.md
  • README.md
  • docker/docker-compose/vchord/docker-compose.multi.yaml
  • docker/docker-compose/vchord/docker-compose.yaml
  • docker/docker-compose/vchord/init-vchord.sh

Comment on lines +25 to +26
ports:
- "5436:5432"

⚠️ Potential issue | 🟡 Minor

Potential host-port conflict with single-instance compose.

Both docker-compose.yaml and docker-compose.multi.yaml bind host port 5436 for PostgreSQL. Running both simultaneously will cause a bind error. Consider using a different host port (e.g., 5437) or documenting that they are mutually exclusive.
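
A quick way to spot this kind of clash is to list the host side of each mapping and look for repeats. The sketch below assumes mappings are supplied by hand in the short "HOST:CONTAINER" form (no IP prefix); duplicate_host_ports is a hypothetical helper.

```shell
#!/bin/sh
# Read "HOST:CONTAINER" port mappings on stdin and print every host port
# that occurs more than once. Hypothetical helper, not part of the PR.
duplicate_host_ports() {
  cut -d: -f1 | sort | uniq -d
}

# The PostgreSQL mapping from each of the two compose files:
printf '%s\n' "5436:5432" "5436:5432" | duplicate_host_ports
```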

Comment on lines +27 to +31
environment:
  POSTGRES_USER: hindsight_admin
  POSTGRES_PASSWORD: hindsight_admin_pass
  POSTGRES_DB: hindsight_default
  EXTRA_DATABASES: hindsight_agent1:agent1_user:agent1_pass,hindsight_agent2:agent2_user:agent2_pass

⚠️ Potential issue | 🟠 Major

Hardcoded credentials — consider using environment variable substitution.

Unlike docker-compose.yaml which uses ${VAR:-default} substitution for credentials, this file hardcodes all passwords (hindsight_admin_pass, agent1_pass, agent2_pass) directly. This creates two problems:

  1. Credentials must be kept in sync between EXTRA_DATABASES (line 31) and DATABASE_URL (lines 53, 83) — a mismatch silently breaks connectivity.
  2. Users who deploy this without modification are running with well-known passwords.

Consider parameterizing at least the passwords via env vars (e.g., ${AGENT1_DB_PASS:-agent1_pass}) to make it easy to override while keeping the current values as convenient defaults.

Sketch of parameterized credentials
     environment:
       POSTGRES_USER: hindsight_admin
-      POSTGRES_PASSWORD: hindsight_admin_pass
+      POSTGRES_PASSWORD: ${HINDSIGHT_ADMIN_DB_PASSWORD:-hindsight_admin_pass}
       POSTGRES_DB: hindsight_default
-      EXTRA_DATABASES: hindsight_agent1:agent1_user:agent1_pass,hindsight_agent2:agent2_user:agent2_pass
+      EXTRA_DATABASES: "hindsight_agent1:agent1_user:${AGENT1_DB_PASS:-agent1_pass},hindsight_agent2:agent2_user:${AGENT2_DB_PASS:-agent2_pass}"

And correspondingly for the DATABASE_URL lines:

-      HINDSIGHT_API_DATABASE_URL: postgresql://agent1_user:agent1_pass@db:5432/hindsight_agent1
+      HINDSIGHT_API_DATABASE_URL: postgresql://agent1_user:${AGENT1_DB_PASS:-agent1_pass}@db:5432/hindsight_agent1
-      HINDSIGHT_API_DATABASE_URL: postgresql://agent2_user:agent2_pass@db:5432/hindsight_agent2
+      HINDSIGHT_API_DATABASE_URL: postgresql://agent2_user:${AGENT2_DB_PASS:-agent2_pass}@db:5432/hindsight_agent2

Also applies to: 53-53, 83-83
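
The sync problem can also be illustrated outside Compose: if both strings are derived from the same three values, they cannot drift apart. A rough sketch under that assumption follows; extra_db_entry and agent_db_url are hypothetical helpers, while the host db, port 5432, and the variable AGENT1_DB_PASS come from the snippets above.

```shell
#!/bin/sh
# Hypothetical helpers: build both strings from the same three values so the
# EXTRA_DATABASES entry and the connection URL stay consistent.
extra_db_entry() {   # args: dbname user password
  printf '%s:%s:%s' "$1" "$2" "$3"
}
agent_db_url() {     # args: dbname user password (host "db", port 5432)
  printf 'postgresql://%s:%s@db:5432/%s' "$2" "$3" "$1"
}

pass1=${AGENT1_DB_PASS:-agent1_pass}     # same fallback as the suggested default
echo "EXTRA_DATABASES=$(extra_db_entry hindsight_agent1 agent1_user "$pass1")"
echo "HINDSIGHT_API_DATABASE_URL=$(agent_db_url hindsight_agent1 agent1_user "$pass1")"
```

Passwords containing characters such as @, / or : would need percent-encoding in the URL, which this sketch does not do.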

Comment on lines +7 to +14
 # Required environment variables (set in .env or shell):
 # HINDSIGHT_DB_PASSWORD - Password for the PostgreSQL user
 # Configure LLM provider variables as needed (see hindsight service below)
 #
 # Optional environment variables with defaults:
-# - HINDSIGHT_VERSION: Hindsight application version (default: latest)
-# - HINDSIGHT_DB_USER: PostgreSQL user (default: hindsight_user)
-# - HINDSIGHT_DB_NAME: PostgreSQL database name (default: hindsight_db)
-# - HINDSIGHT_DB_VERSION: PostgreSQL version (default: 18)
+# HINDSIGHT_VERSION - Hindsight image tag (default: latest-slim)
+# HINDSIGHT_DB_USER - PostgreSQL user (default: hindsight_user)
+# HINDSIGHT_DB_NAME - PostgreSQL database name (default: hindsight_db)

⚠️ Potential issue | 🟡 Minor

Documentation says HINDSIGHT_DB_PASSWORD is required, but line 25 provides a default.

The header comments list HINDSIGHT_DB_PASSWORD as a required environment variable, but the compose file provides hindsight_password as a default fallback (line 25). Either remove it from the "Required" section or drop the default so the container fails explicitly when the password isn't set.
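
For the "fail explicitly" option, the shell form of the idea is the ${VAR:?message} expansion, and Compose interpolation accepts the same ${VAR:?message} syntax. A minimal shell sketch follows; check_db_password is a hypothetical function name.

```shell
#!/bin/sh
# Abort with an error when HINDSIGHT_DB_PASSWORD is unset or empty.
# In a non-interactive shell, ${VAR:?msg} prints msg to stderr and exits.
check_db_password() {
  : "${HINDSIGHT_DB_PASSWORD:?HINDSIGHT_DB_PASSWORD must be set}"
}

# Demonstration in a subshell so the failure does not end this script.
( unset HINDSIGHT_DB_PASSWORD; check_db_password ) 2>/dev/null \
  || echo "refused to continue without a password"
```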

Comment on lines +42 to +51
        psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
            DO \$\$
            BEGIN
                IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
                    CREATE ROLE ${db_user} WITH LOGIN PASSWORD '${db_pass}';
                END IF;
            END
            \$\$;
            CREATE DATABASE ${db_name} OWNER ${db_user};
EOSQL

⚠️ Potential issue | 🟡 Minor

SQL injection / special-character breakage in interpolated SQL.

db_user, db_pass, and db_name are interpolated directly into SQL without quoting or escaping. While these values come from the docker-compose environment (not end-user input), a password or name containing a single quote (') or semicolon will break or corrupt the SQL statements. Identifiers should be double-quoted, and passwords should use escaped literals or be set separately.

Proposed safer approach (wrap identifiers, use separate password setting)
         psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
             DO \$\$
             BEGIN
                 IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
-                    CREATE ROLE ${db_user} WITH LOGIN PASSWORD '${db_pass}';
+                    CREATE ROLE "${db_user}" WITH LOGIN;
                 END IF;
             END
             \$\$;
-            CREATE DATABASE ${db_name} OWNER ${db_user};
+            CREATE DATABASE "${db_name}" OWNER "${db_user}";
 EOSQL
+
+        # Set the password through psql variables (printf '%q' would apply shell quoting, which is not valid SQL).
+        # psql expands :"db_user" as a quoted identifier and :'db_pass' as a quoted literal when reading from stdin.
+        psql -v ON_ERROR_STOP=1 -v db_user="$db_user" -v db_pass="$db_pass" \
+            --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<'EOPW'
+ALTER ROLE :"db_user" WITH PASSWORD :'db_pass';
+EOPW

Note: The same unquoted interpolation pattern applies to the GRANT statements on lines 57–62.
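
One way to avoid shell interpolation altogether is to keep the SQL static and let the server do the quoting with format() and psql's \gexec. The sketch below is an illustration under those assumptions, not the PR's code; provision_sql is a hypothetical helper, and it expects the psql variables db_user, db_pass, and db_name to be supplied with -v.

```shell
#!/bin/sh
# Emit a psql script that creates the role and the database only when missing.
# The heredoc delimiter is quoted, so the shell expands nothing in the SQL.
# format() quotes identifiers with %I and literals with %L; \gexec then runs
# each generated statement.
provision_sql() {
  cat <<'EOSQL'
SELECT format('CREATE ROLE %I WITH LOGIN PASSWORD %L', :'db_user', :'db_pass')
WHERE NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = :'db_user') \gexec
SELECT format('CREATE DATABASE %I OWNER %I', :'db_name', :'db_user')
WHERE NOT EXISTS (SELECT FROM pg_catalog.pg_database WHERE datname = :'db_name') \gexec
EOSQL
}

# Usage inside an init script (not executed here):
#   provision_sql | psql -v ON_ERROR_STOP=1 \
#     -v db_user="$db_user" -v db_pass="$db_pass" -v db_name="$db_name" \
#     --username "$POSTGRES_USER" --dbname "$POSTGRES_DB"
provision_sql
```

Because the existence checks are ordinary queries rather than a DO block, the psql variables are interpolated normally.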

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
-            DO \$\$
-            BEGIN
-                IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
-                    CREATE ROLE ${db_user} WITH LOGIN PASSWORD '${db_pass}';
-                END IF;
-            END
-            \$\$;
-            CREATE DATABASE ${db_name} OWNER ${db_user};
-EOSQL
+        psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
+            DO \$\$
+            BEGIN
+                IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
+                    CREATE ROLE "${db_user}" WITH LOGIN;
+                END IF;
+            END
+            \$\$;
+            CREATE DATABASE "${db_name}" OWNER "${db_user}";
+EOSQL
+        # Set the password through psql variables (printf '%q' would apply shell quoting, which is not valid SQL).
+        psql -v ON_ERROR_STOP=1 -v db_user="$db_user" -v db_pass="$db_pass" \
+            --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<'EOPW'
+ALTER ROLE :"db_user" WITH PASSWORD :'db_pass';
+EOPW