DB Pipes: Docs on sync control, resync, initial load and more #4037
### Pull batch size {#pull-batch-size}

The pull batch size is the number of records that the ClickPipe pulls from the source database in one batch. A record here is any insert, update, or delete performed on the tables that are part of the pipe.

The default is **100,000** records.
> **Review comment:** Call out a safe maximum, ~10 million for now
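To illustrate the setting, here is a minimal sketch of how a pull batch size groups a stream of change records into fixed-size batches. The `pull_batches` helper is hypothetical, not ClickPipe's actual implementation:

```python
from itertools import islice

def pull_batches(records, pull_batch_size=100_000):
    """Group an iterable of change records (inserts, updates, deletes)
    into batches of at most pull_batch_size records."""
    it = iter(records)
    while batch := list(islice(it, pull_batch_size)):
        yield batch

# A stream of 12 changes pulled with a batch size of 5 yields
# batches of 5, 5, and 2 records.
sizes = [len(b) for b in pull_batches(range(12), pull_batch_size=5)]
print(sizes)  # [5, 5, 2]
```

A larger batch size means fewer round trips per sync but more memory held per batch, which is why an upper bound on the value matters.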
The MySQL ClickPipe uses a column on your source table to logically partition the source tables. This column is called the **partition key column**. It is used to divide the source table into partitions, which can then be processed in parallel by the ClickPipe.
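As a rough illustration of the idea, the partition key column's value range can be split into contiguous ranges that independent workers scan. This is a hypothetical helper assuming a numeric partition key, not the actual MySQL ClickPipe logic:

```python
def partition_ranges(min_key, max_key, num_partitions):
    """Split the inclusive key range [min_key, max_key] into
    num_partitions contiguous half-open (lo, hi) ranges."""
    total = max_key - min_key + 1
    size = -(-total // num_partitions)  # ceiling division
    ranges = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + size, max_key + 1)
        ranges.append((lo, hi))
        lo = hi
    return ranges

# A table with ids 1..10 split across 4 partitions:
print(partition_ranges(1, 10, 4))  # [(1, 4), (4, 7), (7, 10), (10, 11)]
```

Each worker would then read its partition with a range predicate on the partition key column (conceptually, `WHERE key >= lo AND key < hi`), which is also why an index on that column matters: each partition scan becomes an index range scan instead of a full table scan.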
:::warning
The partition key column must be indexed in the source table to see a good performance boost.
:::
> **Review comment:** Is it possible to validate for an index on the partition column?
<img src={snapshot_params} alt="Snapshot parameters" />

#### Snapshot number of rows per partition {#snapshot-number-of-rows-per-partition}

This setting controls how many rows constitute a partition. The ClickPipe reads the source table in chunks of this size, and chunks are processed in parallel. The default value is **100,000** rows per partition.
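The parallel chunk processing can be sketched with a bounded worker pool. This is a hypothetical example (the worker count of 4 and the `process_chunk` stand-in are assumptions, not ClickPipe internals); note that only as many chunks as there are workers run at once, not all chunks simultaneously:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for reading one chunk of rows and writing it onward."""
    return len(chunk)

rows = list(range(1_000))
rows_per_partition = 300
chunks = [rows[i:i + rows_per_partition]
          for i in range(0, len(rows), rows_per_partition)]

# Chunks are processed in parallel, but at most max_workers at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    processed = list(pool.map(process_chunk, chunks))

print(processed)       # [300, 300, 300, 100]
print(sum(processed))  # 1000
```

A smaller rows-per-partition value creates more, cheaper chunks (finer-grained parallelism, more per-chunk overhead); a larger value does the opposite.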
> **Review comment:** nit: remove "each", as it can give the impression of all chunks processing at once ("chunks will be processed in parallel").
### Monitoring parallel snapshot in Postgres {#monitoring-parallel-snapshot-in-postgres}

You can analyze **pg_stat_activity** to see the parallel snapshot in action. The ClickPipe creates multiple connections to the source database, each reading a different partition of the source table. If you see **FETCH** queries with different CTID ranges, the ClickPipe is reading the source tables. You can also see the COUNT(*) and the partitioning query here.
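For example, a query along these lines surfaces the per-partition reads. This is an illustrative sketch; the column names (`pid`, `state`, `wait_event_type`, `query`) come from the standard `pg_stat_activity` view, and the filter patterns are assumptions about what the ClickPipe's queries look like:

```sql
-- Show snapshot activity: each connection FETCHes a different CTID range.
SELECT pid, state, wait_event_type, query
FROM pg_stat_activity
WHERE query ILIKE 'FETCH%'
   OR query ILIKE '%ctid%';
```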
### Limitations {#limitations}
> **Review comment:** Call out compressed hypertables here mayhaps?
**Summary**

This PR adds more documentation around DB pipes.

**Checklist**