
Auto-validate MinIO/S3 staging accessibility FROM ClickHouse during Peer validation and Mirror creation #4152

Description

@nareshntr

Summary

When setting up a ClickHouse peer and creating a CDC mirror (Postgres → ClickHouse),
PeerDB currently only validates that:

  • PeerDB server can connect to ClickHouse ✅
  • PeerDB server can write to MinIO/S3 bucket ✅

But it does NOT validate that:

  • ClickHouse can reach and read from MinIO/S3 ❌
  • ClickHouse user has the required s3 grant ❌
  • ClickHouse user has the required CREATE TEMPORARY TABLE grant ❌

This missing validation causes mirrors to silently fail — the mirror creates
the table in ClickHouse but the snapshot stays stuck at 0 rows indefinitely
with no actionable error message anywhere in the UI.


Environment

  • PeerDB version: stable-v0.36.16
  • Deployment: Docker Compose (OSS self-hosted)
  • Source: PostgreSQL 17 (Primary + Replica)
  • Destination: ClickHouse (external VM, outside Docker network)
  • Staging: MinIO (bundled inside PeerDB Docker stack)
  • MinIO endpoint: http://<MINIO_HOST_IP>:9001

Current Behavior

  1. Create ClickHouse peer → Validation passes ✅
  2. Click "Validate Mirror" → Passes ✅
  3. Click "Create Mirror" → Mirror created successfully ✅
  4. Initial Copy tab shows table stuck at 0 / 1 partitions,
    0 rows, status Syncing forever ❌
  5. No error shown anywhere in PeerDB UI ❌
  6. Overview tab shows status: Snapshot indefinitely ❌
  7. Worker logs only show a downstream symptom:
{
  "msg": "failed to get interval since last normalize",
  "error": "no rows in result set"
}
{
  "msg": "interval since last normalize is nil"
}

These log messages are misleading — they are a downstream symptom of
ClickHouse never receiving any data from MinIO, not the root cause.

Additionally, after a PeerDB version upgrade performed without pausing
mirrors first, the snapshot workflow shows Temporal non-determinism panics.

Root Cause

PeerDB validates the PeerDB → MinIO and PeerDB → ClickHouse legs,
but NEVER validates the critical ClickHouse → MinIO leg.

When ClickHouse runs on an external host outside the Docker network
(the most common OSS production setup), it requires:

  1. Network access: ClickHouse host must reach http://<MINIO_IP>:9001

    • Firewall must allow ClickHouse host → PeerDB host on port 9001
    • ENDPOINT_URL_S3 in .env must be a real IP reachable by ClickHouse,
      NOT localhost, 127.0.0.1, or a Docker-internal hostname like minio
  2. ClickHouse user privileges:

   GRANT CREATE TEMPORARY TABLE, s3 ON *.* TO peerdb_user;
   GRANT ALTER ADD COLUMN ON <database>.* TO peerdb_user;
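As a cheap first check, PeerDB could reject ENDPOINT_URL_S3 values that an external ClickHouse host cannot possibly reach, before running any query at all. The Go sketch below is a heuristic under stated assumptions: `checkStagingEndpoint` is a hypothetical name, not part of PeerDB, and treating every single-label hostname as a Docker-internal service name is an assumption. Passing it does not prove that ClickHouse can reach MinIO; only a query executed on ClickHouse can show that.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// checkStagingEndpoint is a hypothetical pre-check (not a PeerDB API). It flags
// ENDPOINT_URL_S3 values that an external ClickHouse host almost certainly
// cannot reach. It is a heuristic only: a nil result does not prove reachability.
func checkStagingEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("ENDPOINT_URL_S3 %q is not a valid URL: %w", raw, err)
	}
	host := u.Hostname()
	if host == "" {
		return fmt.Errorf("ENDPOINT_URL_S3 %q has no host", raw)
	}
	if strings.EqualFold(host, "localhost") {
		return fmt.Errorf("ENDPOINT_URL_S3 uses %q, which ClickHouse would resolve to its own host", host)
	}
	if ip := net.ParseIP(host); ip != nil {
		// 127.0.0.1, ::1 and 0.0.0.0 never point at the PeerDB/MinIO host
		// from ClickHouse's point of view.
		if ip.IsLoopback() || ip.IsUnspecified() {
			return fmt.Errorf("ENDPOINT_URL_S3 uses loopback/unspecified address %s", ip)
		}
		return nil
	}
	// Assumption: single-label names such as "minio" are Docker Compose
	// service names that only resolve inside the Docker network.
	if !strings.Contains(host, ".") {
		return fmt.Errorf("ENDPOINT_URL_S3 host %q looks like a Docker-internal service name", host)
	}
	return nil
}

func main() {
	for _, ep := range []string{
		"http://localhost:9001",
		"http://127.0.0.1:9001",
		"http://minio:9000",
		"http://10.0.0.5:9001",
	} {
		if err := checkStagingEndpoint(ep); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:  ", ep)
		}
	}
}
```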

If either of these is missing, the mirror silently stalls at 0 rows
with no actionable feedback. This is extremely difficult to diagnose
and has caused multiple hours of debugging for OSS users.


Impact

  • Affects all OSS self-hosted users with ClickHouse on a separate VM
    (the most common production deployment)
  • Completely silent failure — no error in UI, no clear error in logs
  • The "Validate Mirror" button passes even when ClickHouse cannot reach
    MinIO, giving a false sense of confidence
  • Users waste hours debugging Temporal workflow logs, PostgreSQL
    replication settings, and ClickHouse configs before finding the
    real cause

Proposed Fix

1. During ClickHouse Peer Creation — "Validate" button

When the user clicks "Validate" on the ClickHouse peer creation form,
add a connectivity test that executes FROM ClickHouse TO MinIO/S3:

-- Execute this on ClickHouse during peer validation
SELECT * FROM s3(
  'http://<MINIO_ENDPOINT>/peerdbbucket/peerdb_peer_preflight_check.csv',
  '<ACCESS_KEY>',
  '<SECRET_KEY>',
  'CSV'
) LIMIT 0;
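One caveat with this probe: the `s3` table function raises an error when the object key does not exist, and without an explicit structure argument ClickHouse infers the schema by reading the file, which can fail on an empty file. The probe therefore presumably isolates network and credential problems only if PeerDB first uploads a small, non-empty CSV at that key (a step that also exercises the PeerDB → MinIO leg). The Go sketch below renders the query with the URL and credentials quoted as ClickHouse string literals; `buildS3Probe` and `chQuote` are hypothetical helpers, not PeerDB functions.

```go
package main

import (
	"fmt"
	"strings"
)

// chEscaper escapes the two characters that are special inside a
// ClickHouse single-quoted string literal: backslash and single quote.
var chEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// chQuote renders s as a ClickHouse single-quoted string literal.
func chQuote(s string) string {
	return "'" + chEscaper.Replace(s) + "'"
}

// buildS3Probe is a hypothetical helper that renders the preflight query
// for a given endpoint, bucket, object key and credentials.
func buildS3Probe(endpoint, bucket, key, accessKey, secretKey string) string {
	objectURL := strings.TrimRight(endpoint, "/") + "/" + bucket + "/" + key
	return fmt.Sprintf("SELECT * FROM s3(%s, %s, %s, 'CSV') LIMIT 0",
		chQuote(objectURL), chQuote(accessKey), chQuote(secretKey))
}

func main() {
	q := buildS3Probe("http://10.0.0.5:9001/", "peerdbbucket",
		"peerdb_peer_preflight_check.csv", "AK", "SK")
	fmt.Println(q)
}
```

Quoting matters here because the secret key is user-supplied; an unescaped quote in it would otherwise turn a credential problem into a confusing syntax error.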

Also check required privileges exist on the ClickHouse user:

-- Check s3 privilege
SELECT count(*) FROM system.grants
WHERE user_name = '<configured_user>'
AND access_type = 'S3';

-- Check CREATE TEMPORARY TABLE privilege
SELECT count(*) FROM system.grants
WHERE user_name = '<configured_user>'
AND access_type = 'CREATE TEMPORARY TABLE';

If any check fails, surface a clear, actionable error immediately:

❌ Peer validation failed: ClickHouse cannot reach MinIO/S3 staging
at http://<ENDPOINT>:9001.
Fix: Ensure firewall allows ClickHouse host → MinIO host on port 9001,
and set ENDPOINT_URL_S3 in .env to an IP reachable by ClickHouse.

❌ Peer validation failed: ClickHouse user <user> is missing the
S3 privilege required to read staged Avro files.
Fix: Run on ClickHouse:
GRANT CREATE TEMPORARY TABLE, s3 ON *.* TO <user>;


2. During Mirror Creation — "Validate Mirror" button

The existing "Validate Mirror" button should additionally run all
ClickHouse → MinIO checks described above before reporting success.
Currently it passes even when ClickHouse cannot reach MinIO.


3. During Mirror Creation — "Create Mirror" button (most important)

Regardless of whether the user clicked "Validate Mirror" or not,
clicking "Create Mirror" must automatically run a full preflight check
BEFORE creating the mirror. Mirror creation should be blocked if any
check fails.

Preflight Check 1 — PeerDB → MinIO write access
