feat: multi-instance VectorChord docker-compose with shared PostgreSQL #6
Replace init-vchord.sql with init-vchord.sh shell script that handles extension installation (vchord, pg_tokenizer, vchord_bm25) and llmlingua2 tokenizer creation for both the default database and additional databases via EXTRA_DATABASES env var.
- Add docker-compose.multi.yaml: two Hindsight API instances sharing one VectorChord PostgreSQL, each with isolated database/user/password, using DeepInfra embeddings/reranker via litellm-sdk
- Update docker-compose.yaml: switch to fork image (ghcr.io/franchb/hindsight-api), remove vectorchord-init sidecar, add healthcheck with service_healthy dependency
- Update .env.example with litellm-sdk provider examples for DeepInfra
- Update CLAUDE.md with correct Qwen3 model names and routing prefixes
- Add VectorChord deployment section to README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
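As an illustration of the `EXTRA_DATABASES` format (`dbname:user:password,...`, per the PR summary), the following is a minimal sketch of how such a value could be split in POSIX shell. The function name `parse_extra_databases` and the tab-separated output are assumptions made for this example; the PR's actual `init-vchord.sh` may parse the variable differently.

```shell
#!/bin/sh
# Hypothetical helper (not taken from init-vchord.sh): split
# "db:user:pass,db:user:pass" into one tab-separated line per entry.
# The body runs in a subshell so IFS and globbing changes do not leak.
parse_extra_databases() (
    IFS=','
    set -f    # disable globbing while the unquoted value is split on commas
    for entry in $1; do
        case $entry in
            *:*:*) ;;    # needs at least two colons
            *) printf 'invalid entry: %s\n' "$entry" >&2; exit 1 ;;
        esac
        db=${entry%%:*}      # text before the first colon
        rest=${entry#*:}     # text after the first colon
        user=${rest%%:*}     # text before the second colon
        pass=${rest#*:}      # everything after it (may itself contain colons)
        printf '%s\t%s\t%s\n' "$db" "$user" "$pass"
    done
)

# Example loop; prints nothing when EXTRA_DATABASES is unset or empty.
parse_extra_databases "${EXTRA_DATABASES:-}" |
while IFS="$(printf '\t')" read -r db user pass; do
    echo "would create database $db owned by $user"
done
```

With this splitting, a password may contain `:` but not `,`, since the comma is the entry separator.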
📝 Walkthrough
The pull request extends VectorChord support by refactoring Docker Compose configurations, introducing a multi-instance deployment setup, adding a PostgreSQL initialization script for VectorChord extensions, and updating environment documentation to support multiple embedding and reranker providers including litellm-sdk integration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (1)
docker/docker-compose/vchord/docker-compose.multi.yaml (1)
43-71: Consider using YAML anchors to reduce duplication between agent services.
The two agent service definitions are nearly identical, differing only in container name, port, and database URL. YAML anchors (`&`/`*`/`<<:`) could reduce the duplication, though readability is a valid trade-off for a demo compose file.
Also applies to: 73-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/docker-compose/vchord/docker-compose.multi.yaml` around lines 43 - 71, Create a YAML anchor for the shared service configuration (e.g., an anchor named hindsight_agent_base) that contains the common keys used by hindsight-agent1 (image, environment entries except the instance-specific ones, networks, depends_on) and then update the hindsight-agent1 service to merge in that anchor using <<: *hindsight_agent_base while overriding the unique fields (container_name, ports, and HINDSIGHT_API_DATABASE_URL). Repeat the same pattern for the second agent service(s) (the block that differs only by port/container/database) so they each inherit the anchor and only specify their instance-specific overrides.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .env.example
- CLAUDE.md
- README.md
- docker/docker-compose/vchord/docker-compose.multi.yaml
- docker/docker-compose/vchord/docker-compose.yaml
- docker/docker-compose/vchord/init-vchord.sh
    ports:
      - "5436:5432"
Potential host-port conflict with single-instance compose.
Both docker-compose.yaml and docker-compose.multi.yaml bind host port 5436 for PostgreSQL. Running both simultaneously will cause a bind error. Consider using a different host port (e.g., 5437) or documenting that they are mutually exclusive.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/docker-compose/vchord/docker-compose.multi.yaml` around lines 25 - 26,
The compose service exposes PostgreSQL with the ports mapping "5436:5432" which
conflicts with the same host port used in the single-instance compose, so change
the ports entry under the service (the ports: mapping currently set to
"5436:5432") to a non-conflicting host port such as "5437:5432" or add a
comment/docs note that the multi and single compose files are mutually exclusive
to avoid concurrent binding; update only the ports value in
docker-compose.multi.yaml (ports: -> "5437:5432") or add documentation
accordingly.
    environment:
      POSTGRES_USER: hindsight_admin
      POSTGRES_PASSWORD: hindsight_admin_pass
      POSTGRES_DB: hindsight_default
      EXTRA_DATABASES: hindsight_agent1:agent1_user:agent1_pass,hindsight_agent2:agent2_user:agent2_pass
Hardcoded credentials — consider using environment variable substitution.
Unlike docker-compose.yaml which uses ${VAR:-default} substitution for credentials, this file hardcodes all passwords (hindsight_admin_pass, agent1_pass, agent2_pass) directly. This creates two problems:
- Credentials must be kept in sync between `EXTRA_DATABASES` (line 31) and `DATABASE_URL` (lines 53, 83); a mismatch silently breaks connectivity.
- Users who deploy this without modification are running with well-known passwords.
Consider parameterizing at least the passwords via env vars (e.g., ${AGENT1_DB_PASS:-agent1_pass}) to make it easy to override while keeping the current values as convenient defaults.
Sketch of parameterized credentials:

     environment:
       POSTGRES_USER: hindsight_admin
    -  POSTGRES_PASSWORD: hindsight_admin_pass
    +  POSTGRES_PASSWORD: ${HINDSIGHT_ADMIN_DB_PASSWORD:-hindsight_admin_pass}
       POSTGRES_DB: hindsight_default
    -  EXTRA_DATABASES: hindsight_agent1:agent1_user:agent1_pass,hindsight_agent2:agent2_user:agent2_pass
    +  EXTRA_DATABASES: "hindsight_agent1:agent1_user:${AGENT1_DB_PASS:-agent1_pass},hindsight_agent2:agent2_user:${AGENT2_DB_PASS:-agent2_pass}"

And correspondingly for the DATABASE_URL lines:

    -  HINDSIGHT_API_DATABASE_URL: postgresql://agent1_user:agent1_pass@db:5432/hindsight_agent1
    +  HINDSIGHT_API_DATABASE_URL: postgresql://agent1_user:${AGENT1_DB_PASS:-agent1_pass}@db:5432/hindsight_agent1

    -  HINDSIGHT_API_DATABASE_URL: postgresql://agent2_user:agent2_pass@db:5432/hindsight_agent2
    +  HINDSIGHT_API_DATABASE_URL: postgresql://agent2_user:${AGENT2_DB_PASS:-agent2_pass}@db:5432/hindsight_agent2

Also applies to: 53-53, 83-83
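The sync concern can also be addressed outside the compose file. The sketch below derives both the `EXTRA_DATABASES` string and the per-agent database URLs from one password variable each, so the two cannot drift apart. The helper names `extra_db_entry` and `database_url` and the `AGENT1_DATABASE_URL`/`AGENT2_DATABASE_URL` output names are hypothetical; the database, user, and host values are the ones quoted in the review.

```shell
#!/bin/sh
# Single source of truth per password; defaults mirror the review sketch.
AGENT1_DB_PASS="${AGENT1_DB_PASS:-agent1_pass}"
AGENT2_DB_PASS="${AGENT2_DB_PASS:-agent2_pass}"

# Hypothetical helpers. Arguments: database, user, password.
extra_db_entry() { printf '%s:%s:%s' "$1" "$2" "$3"; }
database_url()   { printf 'postgresql://%s:%s@db:5432/%s' "$2" "$3" "$1"; }

# Both strings are built from the same variables.
EXTRA_DATABASES="$(extra_db_entry hindsight_agent1 agent1_user "$AGENT1_DB_PASS"),$(extra_db_entry hindsight_agent2 agent2_user "$AGENT2_DB_PASS")"
AGENT1_DATABASE_URL="$(database_url hindsight_agent1 agent1_user "$AGENT1_DB_PASS")"
AGENT2_DATABASE_URL="$(database_url hindsight_agent2 agent2_user "$AGENT2_DB_PASS")"

# Emit env-file style lines that a wrapper script could write to a file.
printf 'EXTRA_DATABASES=%s\n' "$EXTRA_DATABASES"
printf 'AGENT1_DATABASE_URL=%s\n' "$AGENT1_DATABASE_URL"
printf 'AGENT2_DATABASE_URL=%s\n' "$AGENT2_DATABASE_URL"
```

This sketch does no escaping: a password containing characters such as `@`, `/`, or `,` would need percent-encoding in the URL and would break the comma-separated list.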
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/docker-compose/vchord/docker-compose.multi.yaml` around lines 27 - 31,
The docker-compose snippet hardcodes passwords in POSTGRES_PASSWORD and in the
EXTRA_DATABASES string (and corresponding DATABASE_URL entries), so change these
to use environment variable substitution (e.g., replace hindsight_admin_pass,
agent1_pass, agent2_pass with ${POSTGRES_PASSWORD_ADMIN:-hindsight_admin_pass},
${AGENT1_DB_PASS:-agent1_pass}, ${AGENT2_DB_PASS:-agent2_pass} respectively) and
update the EXTRA_DATABASES value and the DATABASE_URL entries (the DATABASE_URL
occurrences referenced in the review) to reference the same env vars so
credentials stay in sync and can be overridden at runtime; ensure you only
change the credential tokens (POSTGRES_PASSWORD, EXTRA_DATABASES, DATABASE_URL
values) and keep the current defaults as fallbacks using the ${VAR:-default}
form.
    # Required environment variables (set in .env or shell):
    # HINDSIGHT_DB_PASSWORD - Password for the PostgreSQL user
    # Configure LLM provider variables as needed (see hindsight service below)
    #
    # Optional environment variables with defaults:
    # - HINDSIGHT_VERSION: Hindsight application version (default: latest)
    # - HINDSIGHT_DB_USER: PostgreSQL user (default: hindsight_user)
    # - HINDSIGHT_DB_NAME: PostgreSQL database name (default: hindsight_db)
    # - HINDSIGHT_DB_VERSION: PostgreSQL version (default: 18)
    # HINDSIGHT_VERSION - Hindsight image tag (default: latest-slim)
    # HINDSIGHT_DB_USER - PostgreSQL user (default: hindsight_user)
    # HINDSIGHT_DB_NAME - PostgreSQL database name (default: hindsight_db)
Documentation says HINDSIGHT_DB_PASSWORD is required, but line 25 provides a default.
The header comments list HINDSIGHT_DB_PASSWORD as a required environment variable, but the compose file provides hindsight_password as a default fallback (line 25). Either remove it from the "Required" section or drop the default so the container fails explicitly when the password isn't set.
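The two options correspond to two parameter-expansion forms, which Compose's variable interpolation shares with the shell: `${VAR:-default}` falls back silently, while `${VAR:?message}` aborts when the variable is unset or empty. A minimal shell sketch of the difference follows; the function names are hypothetical and the error message wording is an assumption.

```shell
#!/bin/sh
# "Optional" style: fall back silently when the variable is unset or empty.
resolve_optional() {
    printf '%s' "${HINDSIGHT_DB_PASSWORD:-hindsight_password}"
}

# "Required" style: fail with a message when the variable is unset or empty.
# The body is a subshell because a failed ${VAR:?} exits a non-interactive shell.
resolve_required() (
    printf '%s' "${HINDSIGHT_DB_PASSWORD:?HINDSIGHT_DB_PASSWORD must be set}"
)
```

In the compose file, the "required" variant would mean writing the password as `${HINDSIGHT_DB_PASSWORD:?...}` instead of `${HINDSIGHT_DB_PASSWORD:-hindsight_password}`, so that `docker compose config` fails early when the variable is missing.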
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/docker-compose/vchord/docker-compose.yaml` around lines 7 - 14, The
docs state HINDSIGHT_DB_PASSWORD is required but the compose file supplies a
fallback default (hindsight_password); make them consistent by removing the
fallback default from the docker-compose service environment (remove the
hindsight_password default variable) so the container fails when
HINDSIGHT_DB_PASSWORD is unset, or alternatively remove HINDSIGHT_DB_PASSWORD
from the "Required" header and list it as optional with its default; locate the
environment block that sets HINDSIGHT_DB_PASSWORD / hindsight_password to change
the default behavior accordingly.
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
        DO \$\$
        BEGIN
            IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
                CREATE ROLE ${db_user} WITH LOGIN PASSWORD '${db_pass}';
            END IF;
        END
        \$\$;
        CREATE DATABASE ${db_name} OWNER ${db_user};
    EOSQL
SQL injection / special-character breakage in interpolated SQL.
db_user, db_pass, and db_name are interpolated directly into SQL without quoting or escaping. While these values come from the docker-compose environment (not end-user input), a password or name containing a single quote (') or semicolon will break or corrupt the SQL statements. Identifiers should be double-quoted, and passwords should use escaped literals or be set separately.
Proposed safer approach (wrap identifiers, use separate password setting):

     psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
         DO \$\$
         BEGIN
    -        IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
    -            CREATE ROLE ${db_user} WITH LOGIN PASSWORD '${db_pass}';
    +        IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
    +            CREATE ROLE "${db_user}" WITH LOGIN;
             END IF;
         END
         \$\$;
    -    CREATE DATABASE ${db_name} OWNER ${db_user};
    +    CREATE DATABASE "${db_name}" OWNER "${db_user}";
     EOSQL
    +
    +# Set password separately to avoid SQL quoting issues with special characters
    +psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" \
    +    -c "ALTER ROLE \"${db_user}\" WITH PASSWORD $(printf '%q' "'${db_pass}'")"

Note: The same unquoted interpolation pattern applies to the GRANT statements on lines 57–62.
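One caveat on the sketch above: `printf '%q'` applies shell quoting (and is a bash extension), not SQL quoting, so it does not produce a valid SQL string literal. A minimal alternative, assuming the script keeps building SQL text in the shell, is to escape values by SQL rules: double embedded `"` inside identifiers and embedded `'` inside literals. The helper names below are hypothetical and are not part of `init-vchord.sh`.

```shell
#!/bin/sh
# Hypothetical helper: quote a value as a SQL identifier
# (wrap in double quotes, double any embedded double quote).
sql_ident() {
    printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Hypothetical helper: quote a value as a SQL string literal
# (wrap in single quotes, double any embedded single quote).
# Assumes standard_conforming_strings=on, the PostgreSQL default,
# so backslashes need no extra escaping.
sql_literal() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Render the statement as text; arguments: role name, password.
render_role_sql() {
    printf 'CREATE ROLE %s WITH LOGIN PASSWORD %s;\n' \
        "$(sql_ident "$1")" "$(sql_literal "$2")"
}
```

For example, `render_role_sql agent1_user "p'w"` yields `CREATE ROLE "agent1_user" WITH LOGIN PASSWORD 'p''w';`. Note that quoted identifiers become case-sensitive in PostgreSQL, which is harmless for the all-lowercase names used here.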
📝 Committable suggestion

    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
        DO \$\$
        BEGIN
            IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '${db_user}') THEN
                CREATE ROLE "${db_user}" WITH LOGIN;
            END IF;
        END
        \$\$;
        CREATE DATABASE "${db_name}" OWNER "${db_user}";
    EOSQL

    # Set password separately to avoid SQL quoting issues with special characters
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" \
        -c "ALTER ROLE \"${db_user}\" WITH PASSWORD $(printf '%q' "'${db_pass}'")"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/docker-compose/vchord/init-vchord.sh` around lines 42 - 51, The SQL is
vulnerable to breaking when db_user, db_pass, or db_name contain
quotes/semicolons; instead pass them into psql via -v and use psql substitution
forms so identifiers are double-quoted and passwords are safe: call psql -v
db_user="$db_user" -v db_pass="$db_pass" -v db_name="$db_name" and inside the
heredoc use :"db_user" and :"db_name" for identifiers and :'db_pass' (or
dollar-quoting via $$...$$) for the password literal in the DO $$ block and
CREATE ROLE statement; apply the same change to the later GRANT statements so
all occurrences of ${db_user}, ${db_pass}, ${db_name} are replaced with the
appropriate psql substitutions.
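A caveat when following this prompt: psql does not interpolate variables inside quoted strings, and a dollar-quoted `DO` body is one, so `:'db_pass'` placed inside the `DO $$ ... $$` block would not be substituted. One common workaround, sketched below under that assumption, is to build each statement with `format()` and run it with `\gexec`. The function names are hypothetical, and passing the password with `-v` exposes it on the psql command line, which may or may not be acceptable inside the init container.

```shell
#!/bin/sh
# Hypothetical sketch; not the PR's actual init-vchord.sh.
# The quoted heredoc delimiter ('EOSQL') stops the shell from expanding
# anything, so psql itself performs the :'name' substitutions.
render_create_sql() {
    cat <<'EOSQL'
SELECT format('CREATE ROLE %I WITH LOGIN PASSWORD %L', :'db_user', :'db_pass')
WHERE NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = :'db_user') \gexec
SELECT format('CREATE DATABASE %I OWNER %I', :'db_name', :'db_user') \gexec
EOSQL
}

# Arguments: role name, password, database name.
# Defined but not invoked here; it needs psql and a running server.
create_role_and_db() {
    render_create_sql | psql -v ON_ERROR_STOP=1 \
        -v db_user="$1" -v db_pass="$2" -v db_name="$3" \
        --username "$POSTGRES_USER" --dbname "$POSTGRES_DB"
}
```

Here `%I` quotes a value as an identifier and `%L` as a literal on the server side, so no shell-level escaping is involved. The script is fed on stdin because `psql -c` does not process psql-specific features such as variables.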
Summary
- `init-vchord.sh`: Replaces `init-vchord.sql` with a shell script that installs VectorChord extensions (`vchord`, `pg_tokenizer`, `vchord_bm25`) and creates the `llmlingua2` tokenizer in both the default database and any additional databases defined via `EXTRA_DATABASES` env var (`dbname:user:password,...`)
- `docker-compose.multi.yaml`: Two Hindsight API instances sharing one VectorChord PostgreSQL container, each with its own isolated database/user/password, using DeepInfra embeddings and reranker via `litellm-sdk`
- `docker-compose.yaml`: Switched to fork image (`ghcr.io/franchb/hindsight-api:latest-slim`), removed `vectorchord-init` sidecar, added DB healthcheck with `service_healthy` dependency, removed port 9999 mapping
- `.env.example`: Added `litellm-sdk` provider examples for DeepInfra embeddings and reranker
- `CLAUDE.md`: Corrected Qwen3 model names and `openai/` routing prefix for embeddings
- `README.md`: Added "Docker with VectorChord" section to Quick Start

Test plan
- `docker compose -f docker/docker-compose/vchord/docker-compose.yaml config` validates without errors
- `docker compose -f docker/docker-compose/vchord/docker-compose.multi.yaml config` validates without errors
- `docker compose -f docker/docker-compose/vchord/docker-compose.yaml up -d` starts DB + Hindsight, extensions installed via init script
- `docker compose -f docker/docker-compose/vchord/docker-compose.multi.yaml up -d` starts DB + 2 agents on ports 8881/8882
- `docker exec hindsight-multi-db psql -U hindsight_admin -d hindsight_agent1 -c "SELECT extname FROM pg_extension;"`
- `curl localhost:8881/health` and `curl localhost:8882/health`

🤖 Generated with Claude Code
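Because the agents may need some time after `up -d` before they answer, the health checks in the test plan can be wrapped in a small retry loop. The `retry` helper below is a hypothetical sketch, not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Succeeds as soon as the command succeeds.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}
```

With both agents started, `retry 30 2 curl -fsS localhost:8881/health` followed by the same call for port 8882 would poll each endpoint for up to about a minute.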