16 changes: 16 additions & 0 deletions docker/docker-compose/timescale/.dockerignore
@@ -0,0 +1,16 @@
# Git
.git
.gitignore
.gitattributes

# Docker
docker-compose.yaml
.dockerignore

# Documentation
README.md
*.md

# Environment
.env
.env.example
25 changes: 25 additions & 0 deletions docker/docker-compose/timescale/.env.example
@@ -0,0 +1,25 @@
# PostgreSQL Configuration
HINDSIGHT_DB_USER=hindsight_user
HINDSIGHT_DB_PASSWORD=change-me-to-secure-password
HINDSIGHT_DB_NAME=hindsight_db

# Hindsight Version
HINDSIGHT_VERSION=latest

# LLM Configuration
HINDSIGHT_API_LLM_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key-here

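# Alternative LLM providers (uncomment and configure as needed).
# Note: docker-compose.yaml only forwards OPENAI_API_KEY to the app (as HINDSIGHT_API_LLM_API_KEY),
# so the provider-specific keys below are not passed through unless you add them to that file.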
# HINDSIGHT_API_LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=your-anthropic-api-key

# HINDSIGHT_API_LLM_PROVIDER=gemini
# GEMINI_API_KEY=your-gemini-api-key

# HINDSIGHT_API_LLM_PROVIDER=groq
# GROQ_API_KEY=your-groq-api-key

# Vector and Text Search (already configured in docker-compose.yaml)
# HINDSIGHT_API_VECTOR_EXTENSION=pgvectorscale
# HINDSIGHT_API_TEXT_SEARCH_EXTENSION=pg_textsearch
55 changes: 55 additions & 0 deletions docker/docker-compose/timescale/Dockerfile
@@ -0,0 +1,55 @@
# PostgreSQL with pgvector, pgvectorscale, and pg_textsearch extensions
# pgvector is from the pgvector project; pgvectorscale (DiskANN vector search) and pg_textsearch (BM25 text search) are from Timescale
# Note: Requires PostgreSQL 16+
FROM postgres:17

# Install build dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-17 \
    libpq-dev \
    cmake \
    curl \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Rust toolchain (required for pgvectorscale)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

# Install pgvector (required by pgvectorscale)
RUN cd /tmp && \
    git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git && \
    cd pgvector && \
    make && \
    make install && \
    rm -rf /tmp/pgvector

# Install cargo-pgrx (PostgreSQL extension framework for Rust)
RUN cargo install cargo-pgrx --version 0.12.5 --locked && \
    cargo pgrx init --pg17 /usr/bin/pg_config

# Install pgvectorscale (DiskANN index support)
RUN cd /tmp && \
    git clone --branch 0.5.1 https://github.com/timescale/pgvectorscale.git && \
    cd pgvectorscale/pgvectorscale && \
    cargo pgrx install --release && \
    rm -rf /tmp/pgvectorscale

# Install pg_textsearch (BM25 text search)
RUN cd /tmp && \
    git clone https://github.com/timescale/pg_textsearch.git && \
    cd pg_textsearch && \
    make && \
    make install && \
    rm -rf /tmp/pg_textsearch

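# Remove git, cmake, curl, and the Cargo caches (other packages stay installed).
# Note: deleting files in a separate RUN layer does not reduce the final image size.
RUN apt-get purge -y --auto-remove git cmake curl && \
    rm -rf /root/.cargo/registry /root/.cargo/git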

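# Preload pg_textsearch, which must be listed in shared_preload_libraries.
# initdb copies this sample file, so the setting applies only to newly initialized data directories.
RUN echo "shared_preload_libraries = 'pg_textsearch'" >> /usr/share/postgresql/postgresql.conf.sample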

101 changes: 101 additions & 0 deletions docker/docker-compose/timescale/README.md
@@ -0,0 +1,101 @@
# Hindsight with Timescale Extensions

This Docker Compose setup provides a complete Hindsight deployment with **Timescale extensions**:
- **pgvectorscale** - DiskANN algorithm for disk-based scalable vector search
- **pg_textsearch** - High-performance BM25 text search

Both extensions are developed by [Timescale](https://github.com/timescale). The image also installs pgvector, which pgvectorscale builds on.

## Prerequisites

- Docker and Docker Compose installed
- An OpenAI API key (or an API key for another supported LLM provider)

## Quick Start

```bash
# Set environment variables
export HINDSIGHT_DB_PASSWORD="your-secure-password"
export OPENAI_API_KEY="your-openai-api-key"

# Build and start
docker compose -f docker/docker-compose/timescale/docker-compose.yaml up -d --build

# Check logs
docker compose -f docker/docker-compose/timescale/docker-compose.yaml logs -f
```

**Access:**
- API: http://localhost:8888
- Control Plane: http://localhost:9999

## Stop and Clean Up

```bash
# Stop services
docker compose -f docker/docker-compose/timescale/docker-compose.yaml down

# Remove volumes (deletes all data)
docker compose -f docker/docker-compose/timescale/docker-compose.yaml down -v
```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `HINDSIGHT_DB_PASSWORD` | PostgreSQL password | `hindsight_password` |
| `HINDSIGHT_DB_USER` | PostgreSQL username | `hindsight_user` |
| `HINDSIGHT_DB_NAME` | Database name | `hindsight_db` |
| `HINDSIGHT_VERSION` | Hindsight Docker image version | `latest` |
| `OPENAI_API_KEY` | API key for the configured LLM provider (forwarded as `HINDSIGHT_API_LLM_API_KEY`) | (required) |
| `HINDSIGHT_API_LLM_PROVIDER` | LLM provider | `openai` |
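
The `${VAR:-default}` references in docker-compose.yaml substitute the default when a variable is unset or empty. As a minimal sketch, the Python below mirrors that rule to show which `HINDSIGHT_API_DATABASE_URL` the `hindsight` service receives. The helper names `compose_default` and `database_url` are hypothetical, and percent-encoding the password is this sketch's own addition: Compose substitutes the raw value, so a password containing characters such as `@` or `:` would produce a malformed URL.

```python
from urllib.parse import quote

# Defaults copied from docker-compose.yaml
DEFAULTS = {
    "HINDSIGHT_DB_USER": "hindsight_user",
    "HINDSIGHT_DB_PASSWORD": "hindsight_password",
    "HINDSIGHT_DB_NAME": "hindsight_db",
}


def compose_default(env: dict[str, str], name: str) -> str:
    """Mimic Compose's ${NAME:-default}: use the default if NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else DEFAULTS[name]


def database_url(env: dict[str, str]) -> str:
    """Build the URL the hindsight service gets (host `db` is the Compose service name)."""
    user = compose_default(env, "HINDSIGHT_DB_USER")
    # Percent-encode the password so '@', ':' or '/' cannot break the URL.
    # Compose itself does NOT do this; it inserts the raw value.
    password = quote(compose_default(env, "HINDSIGHT_DB_PASSWORD"), safe="")
    name = compose_default(env, "HINDSIGHT_DB_NAME")
    return f"postgresql://{user}:{password}@db:5432/{name}"


if __name__ == "__main__":
    print(database_url({}))
    print(database_url({"HINDSIGHT_DB_PASSWORD": "p@ss:word"}))
```

With no variables set, this yields `postgresql://hindsight_user:hindsight_password@db:5432/hindsight_db`, which is why the password should be overridden for anything beyond local testing.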

### Why Timescale Extensions?

**pgvectorscale (DiskANN):**
- 28x lower p95 latency than dedicated vector databases (Timescale's published benchmark)
- 16x higher query throughput at 99% recall (same benchmark)
- 60-75% cost reduction (disk is cheaper than RAM)
- Best for large datasets (10M+ vectors)
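
DiskANN answers a query by walking a proximity graph in which each vector links to a bounded number of neighbors (the `num_neighbors` index option), keeping a short list of the closest candidates seen so far. The Python below is a simplified sketch of that greedy best-first search using cosine distance on a toy in-memory graph. It is not pgvectorscale's implementation: it builds the graph by brute-force nearest neighbors instead of DiskANN's pruned Vamana graph, keeps everything in RAM, and the names `build_graph` and `greedy_search` are hypothetical.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (the measure behind vector_cosine_ops)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def build_graph(vectors: list[list[float]], degree: int) -> list[list[int]]:
    """Toy graph: link each vector to its `degree` nearest other vectors.

    Brute force, for illustration only; DiskANN builds a pruned Vamana graph instead.
    """
    graph = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine_distance(v, vectors[j]))
        graph.append(others[:degree])
    return graph


def greedy_search(vectors, graph, query, k: int, beam: int, start: int = 0) -> list[int]:
    """Best-first search over the graph, keeping at most `beam` candidates.

    Repeatedly expands the closest candidate not yet expanded; stops when every
    kept candidate has been expanded, then returns the `k` closest.
    """

    def dist(i: int) -> float:
        return cosine_distance(query, vectors[i])

    candidates = {start: dist(start)}  # node -> distance to query
    expanded = set()
    while True:
        frontier = [n for n in candidates if n not in expanded]
        if not frontier:
            break
        node = min(frontier, key=candidates.__getitem__)
        expanded.add(node)
        for nbr in graph[node]:
            if nbr not in candidates:
                candidates[nbr] = dist(nbr)
        # Trim the candidate list to the `beam` closest nodes
        candidates = dict(heapq.nsmallest(beam, candidates.items(), key=lambda kv: kv[1]))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1])
    return [node for node, _ in ranked[:k]]
```

Only the nodes on the search path are ever read, which is what lets DiskANN keep the full graph on disk and fetch a small part of it per query.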

**pg_textsearch (BM25):**
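- High-performance keyword retrieval for full-text search
- Native BM25 ranking, which combines term frequency, inverse document frequency, and document-length normalization (PostgreSQL's built-in `ts_rank` does not implement BM25)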
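
BM25 scores a document by summing, over the query terms it contains, an inverse-document-frequency weight times a term-frequency factor that saturates and is normalized by document length. The Python below is a minimal sketch of that formula with the common defaults `k1 = 1.2` and `b = 0.75` and the Lucene-style IDF. Those constants, the whitespace tokenizer, and the function name `bm25_scores` are assumptions of this sketch; pg_textsearch's own tokenization, parameters, and score sign convention may differ.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document (higher = more relevant). `docs` must be non-empty."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] == 0:
                continue
            # Rare terms get a larger weight
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps the benefit of repeated terms; b scales the length penalty
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["postgres vector search", "bm25 keyword search in postgres", "cooking recipes"]
    print(bm25_scores(docs, "keyword search"))
```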

## Troubleshooting

### Extensions not installed

Check if extensions are available:

```bash
docker exec -it hindsight-db-timescale psql -U hindsight_user -d hindsight_db -c "\dx"
```

You should see:
- `vector` (pgvector)
- `vectorscale` (pgvectorscale/DiskANN)
- `pg_textsearch` (BM25 search)

### Build fails

If the Docker build fails during pgvectorscale compilation:

1. Make sure Docker has at least 4 GB of memory available (on Docker Desktop, raise it under Settings > Resources)
2. Check the Docker build logs for Rust compilation errors
3. Rebuild without the cache: `docker compose -f docker/docker-compose/timescale/docker-compose.yaml build --no-cache`

### Port conflicts

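If port 5438 (PostgreSQL), 8888 (API), or 9999 (Control Plane) is already in use, change the host-side port (the number before the colon) in the corresponding `ports` entry in docker-compose.yaml.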
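
To find out which of these host ports already has a listener before starting the stack, you can try a TCP connection to each one. The Python below is a minimal sketch of that check; the helper name `port_in_use` is hypothetical, and a successful connection only shows that something is listening, not which process owns the port.

```python
import socket

# Host ports published by docker-compose.yaml
PORTS = {5438: "PostgreSQL", 8888: "Hindsight API", 9999: "Control Plane"}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, service in PORTS.items():
        status = "in use" if port_in_use(port) else "free"
        print(f"{port} ({service}): {status}")
```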

## Learn More

- [pgvectorscale GitHub](https://github.com/timescale/pgvectorscale)
- [pg_textsearch GitHub](https://github.com/timescale/pg_textsearch)
- [HNSW vs DiskANN](https://www.tigerdata.com/learn/hnsw-vs-diskann)
- [Hindsight Documentation](https://hindsight.dev)
108 changes: 108 additions & 0 deletions docker/docker-compose/timescale/docker-compose.yaml
@@ -0,0 +1,108 @@
name: hindsight
# Docker Compose file for Hindsight with Timescale extensions
# - pgvectorscale: DiskANN vector search (disk-based, scalable)
# - pg_textsearch: BM25 text search (high-performance keyword retrieval)
#
# Quick start:
# docker compose -f docker/docker-compose/timescale/docker-compose.yaml up -d --build
#
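# Environment variables you should set:
# - HINDSIGHT_DB_PASSWORD: Password for the PostgreSQL user (falls back to the insecure default "hindsight_password" if unset)
# - OPENAI_API_KEY: API key forwarded to the app as HINDSIGHT_API_LLM_API_KEY (this file passes no other provider key)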
#
# Optional environment variables with defaults:
# - HINDSIGHT_VERSION: Hindsight application version (default: latest)
# - HINDSIGHT_DB_USER: PostgreSQL user (default: hindsight_user)
# - HINDSIGHT_DB_NAME: PostgreSQL database name (default: hindsight_db)

services:
  db:
    # Custom PostgreSQL image with Timescale extensions (pgvectorscale + pg_textsearch)
    build:
      context: .
      dockerfile: Dockerfile
    container_name: hindsight-db-timescale
    restart: always
    # Expose PostgreSQL port (using 5438 to avoid conflicts with other setups)
    ports:
      - "5438:5432"
    environment:
      POSTGRES_USER: ${HINDSIGHT_DB_USER:-hindsight_user}
      POSTGRES_PASSWORD: ${HINDSIGHT_DB_PASSWORD:-hindsight_password}
      POSTGRES_DB: ${HINDSIGHT_DB_NAME:-hindsight_db}
    volumes:
      - pg_data:/var/lib/postgresql/data
    networks:
      - hindsight-net
    # Health check to ensure database is ready
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U hindsight_user"]
      interval: 5s
      timeout: 5s
      retries: 5

  timescale-init:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      db:
        condition: service_healthy
    environment:
      - PGPASSWORD=${HINDSIGHT_DB_PASSWORD:-hindsight_password}
    command: >
      bash -c "
      echo 'PostgreSQL is ready - creating hindsight_db database';
      psql -h hindsight-db-timescale -p 5432 -U hindsight_user -c 'CREATE DATABASE hindsight_db;' 2>/dev/null || echo 'Database already exists';
      echo 'Installing Timescale extensions...';
      echo '1/3: Installing pgvector (required by pgvectorscale)...';
      psql -h hindsight-db-timescale -p 5432 -U hindsight_user -d hindsight_db -c 'CREATE EXTENSION IF NOT EXISTS vector CASCADE;';
      echo '2/3: Installing pgvectorscale (DiskANN vector search)...';
      psql -h hindsight-db-timescale -p 5432 -U hindsight_user -d hindsight_db -c 'CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;';
      echo '3/3: Installing pg_textsearch (BM25 text search)...';
      psql -h hindsight-db-timescale -p 5432 -U hindsight_user -d hindsight_db -c 'CREATE EXTENSION IF NOT EXISTS pg_textsearch CASCADE;';
      echo '';
      echo '✅ Timescale extensions installed successfully';
      echo '';
      echo 'Installed extensions:';
      psql -h hindsight-db-timescale -p 5432 -U hindsight_user -d hindsight_db -c \"\\dx\" | grep -E '(vector|vectorscale|pg_textsearch)';
      "
    restart: "no"
    networks:
      - hindsight-net

  hindsight:
    image: ghcr.io/vectorize-io/hindsight:${HINDSIGHT_VERSION:-latest}
    container_name: hindsight-app-timescale
    ports:
      - "8888:8888"
      - "9999:9999"
    environment:
      # LLM Configuration
      HINDSIGHT_API_LLM_PROVIDER: ${HINDSIGHT_API_LLM_PROVIDER:-openai}
      HINDSIGHT_API_LLM_API_KEY: ${OPENAI_API_KEY:-your-api-key}

      # Database Configuration
      HINDSIGHT_API_DATABASE_URL: postgresql://${HINDSIGHT_DB_USER:-hindsight_user}:${HINDSIGHT_DB_PASSWORD:-hindsight_password}@db:5432/${HINDSIGHT_DB_NAME:-hindsight_db}

      # Timescale Extensions
      # pgvectorscale: DiskANN algorithm for disk-based scalable vector search
      HINDSIGHT_API_VECTOR_EXTENSION: pgvectorscale
      # pg_textsearch: High-performance BM25 text search
      HINDSIGHT_API_TEXT_SEARCH_EXTENSION: pg_textsearch

    depends_on:
      db:
        condition: service_healthy
      timescale-init:
        condition: service_completed_successfully
    networks:
      - hindsight-net


networks:
  hindsight-net:
    driver: bridge

volumes:
  pg_data:
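
The `timescale-init` service creates the extensions in dependency order, with `vector` before `vectorscale` because pgvectorscale requires pgvector. With `CASCADE`, PostgreSQL would also create any extension the named one depends on, so the explicit order mainly makes the dependency visible. The Python below is a minimal sketch that derives such an order with a topological sort and emits the same `CREATE EXTENSION IF NOT EXISTS ... CASCADE;` statements the service runs. The dependency map is an assumption based on the comments in this PR (pg_textsearch is assumed to have no extension dependencies), and `init_statements` is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Extension -> extensions it requires (assumed from the comments in this PR)
REQUIRES = {
    "vector": set(),
    "vectorscale": {"vector"},  # pgvectorscale builds on pgvector
    "pg_textsearch": set(),     # assumed: no extension dependencies
}


def init_statements(requires: dict[str, set[str]]) -> list[str]:
    """Return CREATE EXTENSION statements ordered so dependencies come first."""
    # TopologicalSorter takes node -> predecessors and yields predecessors first
    order = TopologicalSorter(requires).static_order()
    return [f"CREATE EXTENSION IF NOT EXISTS {ext} CASCADE;" for ext in order]


if __name__ == "__main__":
    for stmt in init_statements(REQUIRES):
        print(stmt)
```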
@@ -24,14 +24,27 @@
 
 def _detect_vector_extension() -> str:
     """
-    Detect or validate vector extension: 'vchord' or 'pgvector'.
+    Detect or validate vector extension: 'pgvector', 'vchord', or 'pgvectorscale'.
     Respects HINDSIGHT_API_VECTOR_EXTENSION env var if set.
     """
     conn = op.get_bind()
     vector_extension = os.getenv("HINDSIGHT_API_VECTOR_EXTENSION", "pgvector").lower()
 
     # Validate configured extension is installed
-    if vector_extension == "vchord":
+    if vector_extension == "pgvectorscale":
+        # pgvectorscale requires pgvector
+        pgvector_check = conn.execute(text("SELECT 1 FROM pg_extension WHERE extname = 'vector'")).scalar()
+        if not pgvector_check:
+            raise RuntimeError(
+                "pgvectorscale requires pgvector. Install with: CREATE EXTENSION vector; CREATE EXTENSION vectorscale CASCADE;"
+            )
+        vectorscale_check = conn.execute(text("SELECT 1 FROM pg_extension WHERE extname = 'vectorscale'")).scalar()
+        if not vectorscale_check:
+            raise RuntimeError(
+                "Configured vector extension 'pgvectorscale' not found. Install it with: CREATE EXTENSION vectorscale CASCADE;"
+            )
+        return "pgvectorscale"
+    elif vector_extension == "vchord":
         vchord_check = conn.execute(text("SELECT 1 FROM pg_extension WHERE extname = 'vchord'")).scalar()
         if not vchord_check:
             raise RuntimeError(
@@ -46,7 +59,9 @@ def _detect_vector_extension() -> str:
             )
         return "pgvector"
     else:
-        raise ValueError(f"Invalid HINDSIGHT_API_VECTOR_EXTENSION: {vector_extension}. Must be 'pgvector' or 'vchord'")
+        raise ValueError(
+            f"Invalid HINDSIGHT_API_VECTOR_EXTENSION: {vector_extension}. Must be 'pgvector', 'vchord', or 'pgvectorscale'"
+        )
 
 
 def _detect_text_search_extension() -> str:
@@ -289,7 +304,14 @@ def upgrade() -> None:
     # Create vector index - conditional based on available extension
     vector_ext = _detect_vector_extension()
 
-    if vector_ext == "vchord":
+    if vector_ext == "pgvectorscale":
+        # Use DiskANN index for pgvectorscale (disk-based, scalable)
+        op.execute("""
+            CREATE INDEX idx_memory_units_embedding ON memory_units
+            USING diskann (embedding vector_cosine_ops)
+            WITH (num_neighbors = 50)
+        """)
+    elif vector_ext == "vchord":
         # Use vchordrq index for vchord (supports high-dimensional embeddings)
         op.execute("""
             CREATE INDEX idx_memory_units_embedding ON memory_units
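
The validation added to `_detect_vector_extension` can be read as a pure function of the configured name and the set of installed extensions (the `extname` values from `pg_extension`). The Python below is a minimal sketch that restates it that way so the branching can be tested without a database. The `pgvectorscale` branch follows the diff above; the bodies of the `pgvector` and `vchord` branches are partly elided in the diff, so their checks and return values here are assumptions, and `validate_vector_extension` is a hypothetical name.

```python
from typing import Optional


def validate_vector_extension(configured: Optional[str], installed: set[str]) -> str:
    """Mirror the migration's checks. `configured` is None when the env var is unset."""
    ext = ("pgvector" if configured is None else configured).lower()
    if ext == "pgvectorscale":
        # pgvectorscale requires pgvector (extname 'vector') plus its own 'vectorscale'
        if "vector" not in installed:
            raise RuntimeError("pgvectorscale requires pgvector")
        if "vectorscale" not in installed:
            raise RuntimeError("Configured vector extension 'pgvectorscale' not found")
        return "pgvectorscale"
    elif ext == "vchord":
        # Assumption: returns 'vchord' once the extension is found
        if "vchord" not in installed:
            raise RuntimeError("Configured vector extension 'vchord' not found")
        return "vchord"
    elif ext == "pgvector":
        # Assumption: this branch's check is elided in the diff; assumed to look for 'vector'
        if "vector" not in installed:
            raise RuntimeError("Configured vector extension 'pgvector' not found")
        return "pgvector"
    raise ValueError(
        f"Invalid HINDSIGHT_API_VECTOR_EXTENSION: {ext}. Must be 'pgvector', 'vchord', or 'pgvectorscale'"
    )
```

Because the env var is lowercased, a value such as `PGVECTORSCALE` is accepted, while an empty string (set but blank) is rejected rather than falling back to `pgvector`.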