Merged
33 changes: 30 additions & 3 deletions docs/cli/clone.mdx
@@ -1,6 +1,6 @@
---
title: Clone Command
-description: Command for cloning Xata databases and external PostgreSQL databases
+description: "Commands for managing database streaming (logical replication) operations"
---

The `clone` command helps you create a copy of your Xata database or clone an external PostgreSQL database into Xata. It supports data anonymization and advanced configuration for complex migration scenarios.
@@ -9,7 +9,7 @@ The `clone` command helps you create a copy of your Xata database or clone an ex

### start

-Snapshot performs a snapshot of the configured source Postgres database into the configured target.
+Start performs a snapshot of the configured source Postgres database into the configured target.

```bash
xata clone start [--source-url <url>] [--config <file>] [--log-level <level>] [--dump-file <file>] [--postgres-url <url>] [--profile] [--reset] [--tables <tables>] [--target <type>] [--target-url <url>] [--organization <id>] [--project <id>] [--branch <id>] [--filter-tables <tables>] [--validation-mode <mode>] [--role <role>] [-h|--help]
@@ -35,7 +35,7 @@ xata clone start [--source-url <url>] [--config <file>] [--log-level <level>] [-

### config

-Automatically configure the transforms for the clone command.
+Automatically configure the transformations for the clone command.

```bash
xata clone config [--source-url <url>] [--mode <mode>] [--validation-mode <mode>] [--organization <id>] [--project <id>] [--branch <id>] [-h|--help]
@@ -49,6 +49,33 @@ xata clone config [--source-url <url>] [--mode <mode>] [--validation-mode <mode>
- `--branch`: Branch ID (default: "")
- `-h, --help`: Print help information and exit

### stream

Start a continuous data stream from the configured source to the configured target using Postgres's logical replication.

```bash
xata clone stream --source-url <url> [--config <file>] [--log-level <level>] [--init] [--profile] [--replication-slot <name>] [--reset] [--snapshot-tables <tables>] [--source <type>] [--target <type>] [--target-url <url>] [--organization <id>] [--project <id>] [--branch <id>] [--filter-tables <tables>] [--validation-mode <mode>] [--role <role>] [-h|--help]
```

- `--source-url`: The source URL of the database to stream from (required)
- `--config`: .env or .yaml config file to use with pgstream if any
- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
- `--init`: Whether to initialize pgstream before starting replication
- `--profile`: Whether to expose a /debug/pprof endpoint on localhost:6060
- `--replication-slot`: Name of the postgres replication slot for pgstream to connect to
- `--reset`: Whether to reset the target before snapshotting (only for postgres target)
- `--snapshot-tables`: List of tables to snapshot if an initial snapshot is required, in the format `<schema>.<table>`. If the schema is omitted, `public` is assumed. Wildcards are supported
- `--source`: Source type. One of postgres, kafka
- `--target`: Target type. One of postgres, opensearch, elasticsearch, kafka
- `--target-url`: Target URL
- `--organization`: Organization ID
- `--project`: Project ID
- `--branch`: Branch ID
- `--filter-tables`: Tables to filter (default: `*.*`)
- `--validation-mode`: Anonymization validation mode, strict implies that all tables and columns should be specified (strict|relaxed|prompt, default: prompt)
- `--role`: Postgres role to use for streaming (it should have at least the REPLICATION privilege)
- `-h, --help`: Print help information and exit

## Global Flags

- `-h, --help` - Print help information and exit
28 changes: 28 additions & 0 deletions docs/cli/stream.mdx
@@ -0,0 +1,28 @@
---
title: Stream Command
description: Commands for managing logical streaming replication operations
---

The stream command helps you manage database streaming operations with `pgstream`, using logical replication.

## Subcommands

### destroy

Destroy any pgstream setup, removing the replication slot and all the relevant tables/functions/triggers, along with the internal pgstream schema.

```bash
xata stream destroy --source-url <url> [--config <file>] [--log-level <level>] [--postgres-url <url>] [--replication-slot <name>] [-h|--help]
```

- `--source-url`: The source URL of the database to clone (required)
- `--config`: .env or .yaml config file to use with pgstream if any
- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
- `--postgres-url`: Source postgres URL where pgstream destroy will be run
- `--replication-slot`: Name of the postgres replication slot to be deleted by pgstream from the source url
- `-h, --help`: Print help information and exit

## Global Flags

- `-h, --help` - Print help information and exit
- `--json` - Output in JSON format
10 changes: 10 additions & 0 deletions docs/config.json
@@ -52,6 +52,11 @@
"href": "/tutorials/create-staging-replica",
"file": "docs/tutorials/create-staging-replica.mdx"
},
{
"title": "Set up streaming replication",
"href": "/tutorials/streaming-replication",
"file": "docs/tutorials/streaming-replication.mdx"
},
{
"title": "Schema changes",
"href": "/tutorials/schema-change",
@@ -288,6 +293,11 @@
"href": "/cli/status",
"file": "docs/cli/status.mdx"
},
{
"title": "stream",
"href": "/cli/stream",
"file": "docs/cli/stream.mdx"
},
{
"title": "upgrade",
"href": "/cli/upgrade",
162 changes: 162 additions & 0 deletions docs/tutorials/streaming-replication.mdx
@@ -0,0 +1,162 @@
---
title: Set up a logical streaming replica
description: Use Xata's streaming replication to keep your database continuously synchronized with real-time changes.
---

This guide shows you how to set up continuous logical streaming replication from your production PostgreSQL database to Xata, enabling real-time data synchronization with optional anonymization.

![Setting up streaming replication to Xata](assets/images/xata-streaming-replication.png)

## 1. Prerequisites

- A Xata account ([sign up here](https://console.xata.io))
- The [Xata CLI](/cli) installed:
```bash
curl -fsSL https://xata.io/install.sh | bash
```
- A PostgreSQL database with:
  - Logical replication enabled
  - A role with permission to create a replication slot (the `xata clone stream` command creates the slot automatically)
  - Network connectivity from Xata to your database

## 2. Enable logical replication on source database

First, ensure your source PostgreSQL database has logical replication enabled. You'll need to set these parameters:

```sql
-- Check current settings
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;
```

If not already configured, update your PostgreSQL configuration:

```sql
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
```

Restart your PostgreSQL instance for the changes to take effect.
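
The checks above can be scripted so that a misconfigured source is caught before streaming starts. The following is a minimal sketch, not part of the Xata CLI: the function name `check_replication_settings` is hypothetical, and it assumes the three values have already been read from the source database (for example with `psql`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: validates the three settings discussed above.
# Arguments: wal_level, max_replication_slots, max_wal_senders.
# Prints one line per problem and returns non-zero if any check fails.
check_replication_settings() {
  local wal_level="$1" slots="$2" senders="$3"
  local status=0
  if [ "$wal_level" != "logical" ]; then
    echo "wal_level is '$wal_level', expected 'logical'"
    status=1
  fi
  if [ "$slots" -lt 1 ]; then
    echo "max_replication_slots must be at least 1"
    status=1
  fi
  if [ "$senders" -lt 1 ]; then
    echo "max_wal_senders must be at least 1"
    status=1
  fi
  return "$status"
}

# Example (not run here; assumes psql is installed and CONN_STRING is set):
#   check_replication_settings \
#     "$(psql "$CONN_STRING" -Atc 'SHOW wal_level')" \
#     "$(psql "$CONN_STRING" -Atc 'SHOW max_replication_slots')" \
#     "$(psql "$CONN_STRING" -Atc 'SHOW max_wal_senders')"
```

The thresholds are only a lower bound: the source needs at least one free replication slot and one free WAL sender for the stream, so a server that already hosts other replicas may need higher values.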

## 3. Create a Xata project and branch

In the Console, create a new project and then click the **Create main branch** button to create the PostgreSQL instance.

For streaming replication, consider using at least 1 replica to ensure high availability during continuous synchronization. Select an instance size that can handle your expected write throughput.

> **Note:** Streaming replication maintains a persistent connection to your source database. Ensure your network allows stable, long-lived connections between Xata and your PostgreSQL instance.

## 4. Configure the Xata CLI

Authenticate the CLI by running:

```sh
xata auth login
```

Initialize the project by running:

```sh
xata init
```

## 5. Configure streaming replication

Generate a configuration for the streaming process:

```bash
xata clone config --source-url $CONN_STRING
```

Where `CONN_STRING` is your PostgreSQL connection string with replication permissions.
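
As an illustration, `CONN_STRING` could be set as shown in this sketch. Every value in the URI (role, password, host, port, database name) is a placeholder, and `validate_conn_string` is a hypothetical helper that only checks the URI scheme. It assumes the connection string is given in the standard PostgreSQL URI form.

```bash
#!/usr/bin/env bash
# Placeholder values: replace the role, password, host, port, and database.
# Single quotes stop the shell from expanding characters such as $ in the password.
# Special characters inside the password must be percent-encoded in a URI.
export CONN_STRING='postgresql://replicator:change-me@db.example.com:5432/app'

# Hypothetical sanity check: succeeds only if the value starts with a
# PostgreSQL URI scheme. It does not try to connect to the database.
validate_conn_string() {
  case "$1" in
    postgres://*|postgresql://*) return 0 ;;
    *) echo "value does not look like a PostgreSQL connection URI" >&2; return 1 ;;
  esac
}

validate_conn_string "$CONN_STRING" && echo "CONN_STRING looks like a PostgreSQL URI"
```

The error message deliberately avoids echoing the value back, since the connection string contains a password.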

The configuration prompt will ask you to:

- Select tables to replicate
- Set up transformation pipelines, such as anonymization rules

This creates a configuration file at `.xata/clone.yaml` that you can further customize.

## 6. Initialize and start streaming

```bash
xata clone stream --source-url $CONN_STRING
```

This command will:

- Create an initial snapshot of your specified tables
- Set up the streaming pipeline
- Begin continuous replication

## 7. Advanced configuration

### Filtering specific tables

To stream only specific tables, use the `--filter-tables` flag:

```bash
xata clone stream --source-url $CONN_STRING \
  --filter-tables "users.*,orders.*,products.*"
```

If this option is not specified, it defaults to `*.*`.

### Custom transformations

Edit your `.xata/clone.yaml` file to add custom transformations:

```yaml
transforms:
  - table: users
    columns:
      - name: email
        transformer: mask_email
      - name: phone
        transformer: redact
  - table: orders
    columns:
      - name: credit_card
        transformer: mask_credit_card

### Running with Docker

For production deployments, consider running the streaming process in a containerized environment:

```bash
docker run -d \
  --name xata-stream \
  --restart unless-stopped \
  -v "$(pwd)/.xata:/config" \
  xata/cli clone stream \
  --source-url $CONN_STRING
```

## 8. Handling failures and recovery

If the streaming connection is interrupted, the replication slot ensures no data is lost. Simply restart the streaming command:

```bash
xata clone stream --source-url $CONN_STRING
```

The process will resume from where it left off, catching up with any changes that occurred during the downtime.
However, if too much lag accumulates, the Postgres server might slow down because it has to work through the backlog while also handling its normal operations.
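
Restarting after an interruption can be automated with a small supervisor loop. The sketch below is one possible approach and is not provided by the Xata CLI: `run_with_restart` is a hypothetical wrapper that reruns any command with an exponentially growing delay until it succeeds or an attempt limit is reached.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: reruns a command until it exits successfully or
# the attempt limit is reached, doubling the delay between attempts.
# Usage: run_with_restart <max_attempts> <initial_delay_seconds> <command...>
run_with_restart() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example (not run here):
#   run_with_restart 5 2 xata clone stream --source-url "$CONN_STRING"
```

Capping the number of attempts keeps a permanently failing stream from retrying forever while the replication slot keeps retaining WAL on the source.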

If you terminate the `xata clone stream` process and do not plan to run streaming replication again, clean up the replication slot and
other `pgstream` objects with the `xata stream destroy` command.

If the replication slot is not cleaned up, Postgres keeps accumulating WAL for it, which can eventually fill the disk. Use settings such as `max_slot_wal_keep_size`
to cap the amount of WAL a slot can retain.
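
To see how much WAL a slot is holding back, the source database can be queried directly. This is a sketch under stated assumptions: it assumes PostgreSQL 10 or later, that the query runs on the primary, and that `psql` is installed. The function name `slot_lag_query` is hypothetical and is not part of the Xata CLI or `pgstream`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a query that reports, for each replication
# slot, whether it is active and how much WAL it is currently retaining.
slot_lag_query() {
  cat <<'SQL'
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
SQL
}

# Example (not run here; assumes psql is installed and CONN_STRING is set):
#   psql "$CONN_STRING" -c "$(slot_lag_query)"
#
# Once streaming is retired for good, the cleanup described above would be:
#   xata stream destroy --source-url "$CONN_STRING"
```

An inactive slot whose `retained_wal` keeps growing is the situation that leads to a full disk, so it is a useful value to alert on.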

## Summary

- You now have real-time streaming replication (Postgres's logical replication) from your PostgreSQL database to Xata
- Changes in your source database are automatically synchronized
- Your data can be anonymized in transit using configurable transformers
- The replication slot ensures no data loss during network interruptions

For more details on advanced streaming configurations and monitoring, see the [clone command documentation](/cli/clone).