Commit 8ad43d8

stream: docs and tutorial for xata clone stream (#37)
* stream: docs for xata clone stream
* stream: docs for xata clone stream
* add streaming replication tutorial
* docs feedback
* add streaming replication tutorial
* feedback
* feedback
* feedback
* feedback
* feedback
1 parent ad84ba5 commit 8ad43d8

4 files changed, +230 -3 lines changed

docs/cli/clone.mdx

Lines changed: 30 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
 title: Clone Command
-description: Command for cloning Xata databases and external PostgreSQL databases
+description: "Commands for managing database streaming (logical replication) operations"
 ---

 The `clone` command helps you create a copy of your Xata database or clone an external PostgreSQL database into Xata. It supports data anonymization and advanced configuration for complex migration scenarios.
@@ -9,7 +9,7 @@ The `clone` command helps you create a copy of your Xata database or clone an ex

 ### start

-Snapshot performs a snapshot of the configured source Postgres database into the configured target.
+Start performs a snapshot of the configured source Postgres database into the configured target.

 ```bash
 xata clone start [--source-url <url>] [--config <file>] [--log-level <level>] [--dump-file <file>] [--postgres-url <url>] [--profile] [--reset] [--tables <tables>] [--target <type>] [--target-url <url>] [--organization <id>] [--project <id>] [--branch <id>] [--filter-tables <tables>] [--validation-mode <mode>] [--role <role>] [-h|--help]
@@ -35,7 +35,7 @@ xata clone start [--source-url <url>] [--config <file>] [--log-level <level>] [-

 ### config

-Automatically configure the transforms for the clone command.
+Automatically configure the transformations for the clone command.

 ```bash
 xata clone config [--source-url <url>] [--mode <mode>] [--validation-mode <mode>] [--organization <id>] [--project <id>] [--branch <id>] [-h|--help]
@@ -49,6 +49,33 @@ xata clone config [--source-url <url>] [--mode <mode>] [--validation-mode <mode>
 - `--branch`: Branch ID (default: "")
 - `-h, --help`: Print help information and exit

+### stream
+
+Start a continuous data stream from the configured source to the configured target using Postgres's logical replication.
+
+```bash
+xata clone stream --source-url <url> [--config <file>] [--log-level <level>] [--init] [--profile] [--replication-slot <name>] [--reset] [--snapshot-tables <tables>] [--source <type>] [--target <type>] [--target-url <url>] [--organization <id>] [--project <id>] [--branch <id>] [--filter-tables <tables>] [--validation-mode <mode>] [--role <role>] [-h|--help]
+```
+
+- `--source-url`: The source URL of the database to stream from (required)
+- `--config`: .env or .yaml config file to use with pgstream if any
+- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
+- `--init`: Whether to initialize pgstream before starting replication
+- `--profile`: Whether to expose a /debug/pprof endpoint on localhost:6060
+- `--replication-slot`: Name of the postgres replication slot for pgstream to connect to
+- `--reset`: Whether to reset the target before snapshotting (only for postgres target)
+- `--snapshot-tables`: List of tables to snapshot if initial snapshot is required, in the format `<schema>.<table>`. If not specified, the schema `public` will be assumed. Wildcards are supported.
+- `--source`: Source type. One of postgres, kafka
+- `--target`: Target type. One of postgres, opensearch, elasticsearch, kafka
+- `--target-url`: Target URL
+- `--organization`: Organization ID
+- `--project`: Project ID
+- `--branch`: Branch ID
+- `--filter-tables`: Tables to filter (default: `*.*`)
+- `--validation-mode`: Anonymization validation mode; strict implies that all tables and columns should be specified (strict|relaxed|prompt, default: prompt)
+- `--role`: Postgres role to use for streaming (it should have at least the REPLICATION privilege)
+- `-h, --help`: Print help information and exit
+
 ## Global Flags

 - `-h, --help` - Print help information and exit

docs/cli/stream.mdx

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+---
+title: Stream Command
+description: Commands for managing logical streaming replication operations
+---
+
+The stream command helps you manage database streaming operations with `pgstream`, using logical replication.
+
+## Subcommands
+
+### destroy
+
+Destroy any pgstream setup, removing the replication slot and all the relevant tables/functions/triggers, along with the internal pgstream schema.
+
+```bash
+xata stream destroy --source-url <url> [--config <file>] [--log-level <level>] [--postgres-url <url>] [--replication-slot <name>] [-h|--help]
+```
+
+- `--source-url`: The source URL of the database to clone (required)
+- `--config`: .env or .yaml config file to use with pgstream if any
+- `--log-level`: Log level for pgstream (trace|debug|info|warn|error|fatal|panic, default: info)
+- `--postgres-url`: Source postgres URL where pgstream destroy will be run
+- `--replication-slot`: Name of the postgres replication slot to be deleted by pgstream from the source url
+- `-h, --help`: Print help information and exit
+
+## Global Flags
+
+- `-h, --help` - Print help information and exit
+- `--json` - Output in JSON format

docs/config.json

Lines changed: 10 additions & 0 deletions
@@ -52,6 +52,11 @@
   "href": "/tutorials/create-staging-replica",
   "file": "docs/tutorials/create-staging-replica.mdx"
 },
+{
+  "title": "Set up streaming replication",
+  "href": "/tutorials/streaming-replication",
+  "file": "docs/tutorials/streaming-replication.mdx"
+},
 {
   "title": "Schema changes",
   "href": "/tutorials/schema-change",
@@ -293,6 +298,11 @@
   "href": "/cli/status",
   "file": "docs/cli/status.mdx"
 },
+{
+  "title": "stream",
+  "href": "/cli/stream",
+  "file": "docs/cli/stream.mdx"
+},
 {
   "title": "upgrade",
   "href": "/cli/upgrade",

docs/tutorials/streaming-replication.mdx

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+---
+title: Set up a logical streaming replica
+description: Use Xata's streaming replication to keep your database continuously synchronized with real-time changes.
+---
+
+This guide shows you how to set up continuous logical streaming replication from your production PostgreSQL database to Xata, enabling real-time data synchronization with optional anonymization.
+
+![Setting up streaming replication to Xata](assets/images/xata-streaming-replication.png)
+
+## 1. Prerequisites
+
+- A Xata account ([sign up here](https://console.xata.io))
+- The [Xata CLI](/cli) installed:
+  ```bash
+  curl -fsSL https://xata.io/install.sh | bash
+  ```
+- A PostgreSQL database with:
+  - Logical replication enabled
+  - A role with permission to create a replication slot (the `xata clone stream` command creates the slot automatically)
+  - Network connectivity from Xata to your database
+
+## 2. Enable logical replication on source database
+
+First, ensure your source PostgreSQL database has logical replication enabled. You'll need to set these parameters:
+
+```sql
+-- Check current settings
+SHOW wal_level;
+SHOW max_replication_slots;
+SHOW max_wal_senders;
+```
+
+If not already configured, update your PostgreSQL configuration:
+
+```sql
+ALTER SYSTEM SET wal_level = logical;
+ALTER SYSTEM SET max_replication_slots = 10;
+ALTER SYSTEM SET max_wal_senders = 10;
+```
+
+Restart your PostgreSQL instance for the changes to take effect.
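
After the restart, it can help to confirm the settings before moving on. The sketch below is one possible way to do that from a shell script. `check_replication_settings` is a hypothetical helper, not part of the Xata CLI; it takes the three values reported by `SHOW` and flags anything that would block logical replication. The commented example call assumes `psql` is installed and that `CONN_STRING` holds the source connection string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the values reported by SHOW for logical replication.
# Arguments: wal_level, max_replication_slots, max_wal_senders.
check_replication_settings() {
  local wal_level="$1" slots="$2" senders="$3"
  if [ "$wal_level" != "logical" ]; then
    echo "wal_level is '$wal_level', expected 'logical'"
    return 1
  fi
  if [ "$slots" -lt 1 ] || [ "$senders" -lt 1 ]; then
    echo "max_replication_slots and max_wal_senders must both be at least 1"
    return 1
  fi
  echo "ok"
}

# Example call against a live database (assumes psql is installed):
#   check_replication_settings \
#     "$(psql "$CONN_STRING" -Atc 'SHOW wal_level')" \
#     "$(psql "$CONN_STRING" -Atc 'SHOW max_replication_slots')" \
#     "$(psql "$CONN_STRING" -Atc 'SHOW max_wal_senders')"
```

The helper only inspects the values passed to it, so it can be tried out without a database connection.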
+
+## 3. Create a Xata project and branch
+
+In the Console, create a new project and then click the **Create main branch** button to create the PostgreSQL instance.
+
+For streaming replication, consider using at least 1 replica to ensure high availability during continuous synchronization. Select an instance size that can handle your expected write throughput.
+
+> **Note:** Streaming replication maintains a persistent connection to your source database. Ensure your network allows stable, long-lived connections between Xata and your PostgreSQL instance.
+
+## 4. Configure the Xata CLI
+
+Authenticate the CLI by running:
+
+```sh
+xata auth login
+```
+
+Initialize the project by running:
+
+```sh
+xata init
+```
+
+## 5. Configure streaming replication
+
+Generate a configuration for the streaming process:
+
+```bash
+xata clone config --source-url "$CONN_STRING"
+```
+
+Here, `CONN_STRING` is an environment variable holding your PostgreSQL connection string, for a role with replication permissions.
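
One way to provide `CONN_STRING` is to set it in the shell before running the commands. The sketch below uses a placeholder URI (the user, password, host, and database name are made up) and a hypothetical `is_postgres_uri` helper that only checks for the `postgres://` or `postgresql://` scheme used by PostgreSQL connection URIs. It does not test connectivity or permissions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument looks like a PostgreSQL connection URI.
is_postgres_uri() {
  case "$1" in
    postgres://*|postgresql://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder value for illustration; replace it with your real connection string.
CONN_STRING="postgresql://replicator:secret@db.example.com:5432/app"

if is_postgres_uri "$CONN_STRING"; then
  echo "CONN_STRING looks like a PostgreSQL URI"
else
  echo "CONN_STRING is not a postgres:// or postgresql:// URI" >&2
fi
```

Quoting the variable when passing it to the CLI keeps characters such as `?` and `&` in the URI from being interpreted by the shell.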
+
+The configuration prompt will ask you to:
+
+- Select tables to replicate
+- Set up transformation pipelines, i.e. anonymization rules
+
+This creates a configuration file at `.xata/clone.yaml` that you can further customize.
+
+## 6. Initialize and start streaming
+
+```bash
+xata clone stream --source-url "$CONN_STRING"
+```
+
+This command will:
+
+- Create an initial snapshot of your specified tables
+- Set up the streaming pipeline
+- Begin continuous replication
+
+## 7. Advanced configuration
+
+### Filtering specific tables
+
+To stream only specific tables, use the `--filter-tables` flag:
+
+```bash
+xata clone stream --source-url "$CONN_STRING" \
+  --filter-tables "users.*,orders.*,products.*"
+```
+
+If this option is not specified, it defaults to `*.*`.
+
+### Custom transformations
+
+Edit your `.xata/clone.yaml` file to add custom transformations:
+
+```yaml
+transforms:
+  - table: users
+    columns:
+      - name: email
+        transformer: mask_email
+      - name: phone
+        transformer: redact
+  - table: orders
+    columns:
+      - name: credit_card
+        transformer: mask_credit_card
+```
+
+### Running with Docker
+
+For production deployments, consider running the streaming process in a containerized environment:
+
+```bash
+docker run -d \
+  --name xata-stream \
+  --restart unless-stopped \
+  -v "$(pwd)/.xata:/config" \
+  xata/cli clone stream \
+  --source-url "$CONN_STRING"
+```
+
+## 8. Handling failures and recovery
+
+If the streaming connection is interrupted, the replication slot ensures no data is lost. Simply restart the streaming command:
+
+```bash
+xata clone stream --source-url "$CONN_STRING"
+```
+
+The process will resume from where it left off, catching up with any changes that occurred during the downtime.
+However, if too much lag accumulates, the Postgres server might slow down, because it has to work through the backlog on top of its normal operations.
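
Restarting by hand works for occasional interruptions. For unattended setups, a small wrapper can restart the command automatically. The sketch below shows one possible approach, not something the Xata CLI provides: `run_with_retry` is a hypothetical helper that reruns a command until it succeeds or a maximum number of attempts is reached. It assumes the streaming command exits with a non-zero status when the connection drops.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run_with_retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Reruns COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
run_with_retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example: up to 5 attempts, waiting 10 seconds between them.
#   run_with_retry 5 10 xata clone stream --source-url "$CONN_STRING"
```

Capping the number of attempts keeps a persistent failure (for example, revoked credentials) from looping forever while the replication slot keeps retaining WAL on the source.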
+
+If you terminate the `xata clone stream` process and do not wish to run streaming replication again, clean up the replication slot and
+other `pgstream` objects using the `xata stream destroy` command.
+
+If the replication slot is not cleaned up, Postgres keeps accumulating WAL for it, which can eventually fill the disk. Use settings such as `max_slot_wal_keep_size`
+to cap how much WAL a slot can retain.
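
To see how much WAL a slot is currently holding back, the `pg_replication_slots` view can be queried on the source database. The sketch below stores such a query in a variable and adds a hypothetical `wal_over_limit` helper for comparing the reported byte count against a threshold of your choosing. The query uses `pg_wal_lsn_diff` and `pg_current_wal_lsn`, which are available in PostgreSQL 10 and later; the threshold values in the example are arbitrary.

```bash
#!/usr/bin/env bash
# Query (PostgreSQL 10+) reporting how many bytes of WAL each replication slot is holding back.
SLOT_LAG_SQL="SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes FROM pg_replication_slots;"

# Hypothetical helper: succeed when retained_bytes exceeds limit_bytes.
wal_over_limit() {
  local retained_bytes="$1" limit_bytes="$2"
  [ "$retained_bytes" -gt "$limit_bytes" ]
}

# Example (assumes psql is installed and CONN_STRING points at the source database):
#   psql "$CONN_STRING" -Atc "$SLOT_LAG_SQL"
#   wal_over_limit 2147483648 1073741824 && echo "slot retains more than 1 GiB of WAL"
```

A slot that shows `active` as false while its retained bytes keep growing is a likely candidate for cleanup.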
+
+## Summary
+
+- You now have real-time streaming replication (Postgres's logical replication) from your PostgreSQL database to Xata
+- Changes in your source database are automatically synchronized
+- Your data can be anonymized in transit using configurable transformers
+- The replication slot ensures no data loss during network interruptions
+
+For more details on advanced streaming configurations and monitoring, see the [clone command documentation](/cli/clone).
