
[MongoDB] Fix replication batching #271

Merged
merged 4 commits into main from fix-replication-batching on May 28, 2025

Conversation

@rkistner (Contributor) commented May 27, 2025

Background

When replicating from a MongoDB database, we use the _powersync_checkpoints collection for multiple purposes:

  1. To detect the end of a transaction.
  2. To create write checkpoints.
  3. To dynamically "batch" updates from multiple transactions efficiently.

This change specifically concerns the last point - batching. The process works as follows:

  1. Whenever we receive a change and haven't started a "batch" yet, we create a new "checkpoint" document.
  2. We wait for that document to be present in the change stream.
  3. Once we get that document, we flush/commit the changes.

On a mostly-idle database, we get that checkpoint document back almost immediately, triggering a flush as soon as possible. On a very busy database, where replication lag builds up, we only get the document back once replication has caught up to the point where it was created. That increases the number of documents in the batch, increasing throughput and allowing us to catch up faster.
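As a rough illustration, a minimal sketch of this flush-trigger loop is shown below. The stream_id field, the flush() callback, and the database name are assumptions for the sketch, not the actual PowerSync schema; the real implementation lives in ChangeStream.ts and MongoRelation.ts.

```ts
import { MongoClient, ObjectId } from 'mongodb';

// Minimal sketch of the checkpoint-based batching loop described above.
// Names other than _powersync_checkpoints are illustrative.
async function replicateWithBatching(client: MongoClient, flush: () => Promise<void>) {
  const db = client.db('powersync_demo');
  const checkpoints = db.collection('_powersync_checkpoints');
  const streamId = new ObjectId(); // identifies this replication stream
  let pendingCheckpointId: ObjectId | null = null;

  for await (const change of db.watch()) {
    // Sketch only: handle inserts; real code also handles updates, deletes, etc.
    if (change.operationType !== 'insert') continue;

    if (change.ns.coll === '_powersync_checkpoints') {
      // Steps 2 and 3: when the checkpoint document we created comes back on
      // the change stream, the current batch is complete: flush it.
      if (pendingCheckpointId != null && pendingCheckpointId.equals(change.documentKey._id)) {
        await flush();
        pendingCheckpointId = null;
      }
      continue;
    }

    // ...buffer this change into the current batch (omitted)...

    // Step 1: if no checkpoint document is outstanding for this batch, create one.
    if (pendingCheckpointId == null) {
      pendingCheckpointId = new ObjectId();
      await checkpoints.insertOne({ _id: pendingCheckpointId, stream_id: streamId });
    }
  }
}
```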

The issue

The issue arises when we connect multiple PowerSync instances to the same source database: they share the same _powersync_checkpoints collection. This is not a problem under low load, but we've observed the following during initial replication:

  1. Instance A is busy with initial replication, which adds a delay to each flush. Normally, as explained above, that would not be an issue - we'd flush less often, but still maintain high throughput.
  2. Instance B performs normal replication at a high rate. Since it has no significant load, it flushes the changes often, creating a new checkpoint document each time.

Now the issue is that instance A receives all the checkpoint documents from instance B, causing it to attempt a flush at the same rate. It can't keep up at that rate, so it falls behind and builds up a replication lag.

The fix

The fix here is to identify where the checkpoint documents originate, and ignore checkpoint documents from other instances.

For now, we still process all checkpoint documents created for write checkpoints. These are a little more difficult to filter out, due to the checkpoint originating from a different process. If these do become an issue, we can investigate filtering and/or throttling these later.
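In spirit, the filter looks something like the sketch below. This is a hedged illustration: the owner_id field, the STANDALONE_CHECKPOINT_ID value, and the helper function are assumptions, not the merged code, which lives in ChangeStream.ts and MongoRelation.ts.

```ts
import { ObjectId } from 'mongodb';

// Illustrative value; the real constant is defined in MongoRelation.ts.
const STANDALONE_CHECKPOINT_ID = 'standalone';

// Hypothetical shape of a checkpoint document, tagged with the id of the
// replication stream that created it (or STANDALONE_CHECKPOINT_ID for write
// checkpoints created by other processes).
interface CheckpointDoc {
  _id: ObjectId;
  owner_id: ObjectId | string;
}

// Sketch of the fix: process our own batch checkpoints and all standalone
// (write) checkpoints, but ignore batch checkpoints from other instances.
function shouldProcessCheckpoint(doc: CheckpointDoc, ownStreamId: ObjectId): boolean {
  if (doc.owner_id === STANDALONE_CHECKPOINT_ID) {
    return true; // write checkpoints are still processed for now
  }
  return doc.owner_id instanceof ObjectId && ownStreamId.equals(doc.owner_id);
}
```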

Replication Lag Metrics

To help diagnose issues like these, a new replication lag metric is added in #272.


Testing

To test the issue locally, I ran two PowerSync instances against the same source database.

  1. First instance runs normally.
  2. In the second instance, I introduced an artificial delay of 2s in MongoBucketBatch.flushInner() (see the sketch below).
  3. I then created a series of 10 small updates in the source database.
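The artificial delay was of roughly this form (a sketch of the kind of local test hack used, not part of the actual change):

```ts
// Local test hack: delay each flush by 2 seconds to simulate a slow/busy
// instance. In practice this was an await added at the start of
// MongoBucketBatch.flushInner(); expressed here as a standalone wrapper.
async function delayedFlush(flushInner: () => Promise<void>): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 2000)); // artificial 2s delay
  await flushInner();
}
```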

Before this change:

  1. The first instance processes each batch of changes quickly.
  2. The second instance also processes each batch of changes separately, but with the artificial delay. Replication lag keeps growing (measured using the powersync_replication_lag_seconds metric from #272).

With this change:

  1. The first instance processes each batch of changes quickly.
  2. The second instance automatically uses larger batch sizes as the replication lag grows, keeping the replication lag around 2-4s.

changeset-bot (bot) commented May 27, 2025

🦋 Changeset detected

Latest commit: 5b689f8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages
Name Type
@powersync/service-module-mongodb Patch
@powersync/service-core Patch
@powersync/service-image Patch
@powersync/service-schema Patch
@powersync/service-core-tests Patch
@powersync/service-module-core Patch
@powersync/service-module-mongodb-storage Patch
@powersync/service-module-mysql Patch
@powersync/service-module-postgres-storage Patch
@powersync/service-module-postgres Patch
test-client Patch

@rkistner changed the title from "[MongoDB] Fix replication batching / Replication lag metrics" to "[MongoDB] Fix replication batching" on May 28, 2025
@rkistner force-pushed the fix-replication-batching branch from e535382 to 9370f38 on May 28, 2025 08:25
@rkistner marked this pull request as ready for review on May 28, 2025 08:46
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes replication batching issues by adjusting how checkpoint documents are created and processed so that each PowerSync instance only processes its own checkpoints. Key changes include:

  • Removal of the redundant getReplicationHead method in the RouteAPI interface.
  • Updates to the createCheckpoint API across modules to include a new STANDALONE_CHECKPOINT_ID.
  • Improvements in the ChangeStream logic to filter out checkpoint events not originating from the current process.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

File Description
packages/service-core/src/api/RouteAPI.ts Removed the getReplicationHead method to simplify the API.
modules/module-mongodb/test/src/change_stream_utils.ts Updated tests to use the new createCheckpoint signature with STANDALONE_CHECKPOINT_ID.
modules/module-mongodb/src/replication/MongoRelation.ts Modified createCheckpoint to accept an id parameter and added a new constant for standalone checkpoints.
modules/module-mongodb/src/replication/ChangeStream.ts Updated checkpoint handling in the ChangeStream to filter native and batch checkpoints properly.
modules/module-mongodb/src/api/MongoRouteAPIAdapter.ts Adjusted checkpoint creation to use STANDALONE_CHECKPOINT_ID.
.changeset/beige-clouds-cry.md Documented the patch releases for affected modules.
Comments suppressed due to low confidence (2)

modules/module-mongodb/src/replication/MongoRelation.ts:154

  • Typo in documentation: 'immeidately' should be corrected to 'immediately'.
 * Use this for write checkpoints, or any other case where we want to process the checkpoint immeidately, and not wait for batching.

modules/module-mongodb/src/replication/ChangeStream.ts:764

  • The expression 'this.checkpointStreamId.equals(this.checkpointStreamId)' will always return true. It is likely intended to compare 'this.checkpointStreamId' with 'checkpointId'.
if (!(checkpointId == STANDALONE_CHECKPOINT_ID || this.checkpointStreamId.equals(this.checkpointStreamId))) {

@rkistner marked this pull request as draft on May 28, 2025 09:24
@rkistner marked this pull request as ready for review on May 28, 2025 09:59
@stevensJourney (Collaborator) left a comment

Looks good to me :)

@rkistner merged commit b57f938 into main on May 28, 2025
20 checks passed
@rkistner deleted the fix-replication-batching branch on May 28, 2025 14:05