DB Pipes: Docs on sync control, resync, initial load and more #4037


Open
wants to merge 10 commits into main
Conversation

Amogh-Bharadwaj (Contributor)

Summary

This PR adds more documentation around DB ClickPipes.

Checklist

@Amogh-Bharadwaj Amogh-Bharadwaj requested review from a team as code owners July 7, 2025 20:34
@Amogh-Bharadwaj Amogh-Bharadwaj requested a review from mshustov July 7, 2025 20:34
vercel bot commented Jul 7, 2025 (deployment status as of Jul 9, 2025 1:56pm UTC): clickhouse-docs ❌ Failed; clickhouse-docs-jp, clickhouse-docs-ru, and clickhouse-docs-zh ⬜️ Ignored.

### Pull batch size {#pull-batch-size}
The pull batch size is the number of records the ClickPipe pulls from the source database in one batch. Records here means the inserts, updates, and deletes performed on the tables that are part of the pipe.

The default is **100,000** records.
Review comment (Contributor):

Call out a safe maximum, ~10 million for now

The MySQL ClickPipe uses a column on your source table, called the **partition key column**, to logically partition the source table. The resulting partitions can then be processed in parallel by the ClickPipe.

:::warning
The partition key column must be indexed in the source table for partitioned reads to deliver a performance boost.
:::
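
To check the index manually, a query along these lines can help (a sketch, assuming placeholder schema `mydb`, table `orders`, and partition key column `id`; it lists MySQL index entries covering that column):

```sql
-- Sketch: list index entries that cover the partition key column.
-- `mydb`, `orders`, and `id` are placeholder names for illustration.
SELECT index_name, seq_in_index, non_unique
FROM information_schema.statistics
WHERE table_schema = 'mydb'
  AND table_name   = 'orders'
  AND column_name  = 'id';
```

If this returns no rows, the column is not covered by any index.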
Review comment (Contributor):

Is it possible to validate that there is an index on the partition column?

<img src={snapshot_params} alt="Snapshot parameters" />

#### Snapshot number of rows per partition {#snapshot-number-of-rows-per-partition}
This setting controls how many rows constitute a partition. The ClickPipe will read the source table in chunks of this size, and each chunk will be processed in parallel. The default value is 100,000 rows per partition.
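
To get a feel for the arithmetic, you can estimate the partition count up front (a sketch, assuming a source table named `orders` and the default of 100,000 rows per partition):

```sql
-- Sketch: estimate how many partitions the snapshot will produce
-- at the default of 100,000 rows per partition.
SELECT ceil(count(*) / 100000.0) AS estimated_partitions
FROM orders;
```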
Review comment (Contributor):

nit: remove "each" as it can give the impression of all chunks processing at once
"chunks will be processed in parallel"

### Monitoring parallel snapshot in Postgres {#monitoring-parallel-snapshot-in-postgres}
You can analyze **pg_stat_activity** to see the parallel snapshot in action. The ClickPipe creates multiple connections to the source database, each reading a different partition of the source table. If you see **FETCH** queries with different CTID ranges, the ClickPipe is reading the source tables. You can also see the COUNT(*) query and the partitioning query here.
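
A query along these lines surfaces those connections (a sketch; the exact query text issued by the ClickPipe may differ):

```sql
-- Sketch: spot parallel snapshot activity from the ClickPipe.
-- Each FETCH over a distinct ctid range corresponds to one partition read.
SELECT pid, state, left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE query ILIKE '%FETCH%'
   OR query ILIKE '%ctid%';
```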

### Limitations {#limitations}
Review comment (Contributor):

Call out compressed hypertables here mayhaps?
