[MongoDB] Fix replication batching #271
🦋 Changeset detected. Latest commit: 5b689f8. The changes in this PR will be included in the next version bump. This PR includes changesets to release 11 packages.
Pull Request Overview
This PR fixes replication batching issues by adjusting how checkpoint documents are created and processed so that each PowerSync instance only processes its own checkpoints. Key changes include:
- Removal of the redundant `getReplicationHead` method in the `RouteAPI` interface.
- Updates to the `createCheckpoint` API across modules to include a new `STANDALONE_CHECKPOINT_ID`.
- Improvements in the `ChangeStream` logic to filter out checkpoint events not originating from the current process.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/service-core/src/api/RouteAPI.ts | Removed the `getReplicationHead` method to simplify the API. |
| modules/module-mongodb/test/src/change_stream_utils.ts | Updated tests to use the new `createCheckpoint` signature with `STANDALONE_CHECKPOINT_ID`. |
| modules/module-mongodb/src/replication/MongoRelation.ts | Modified `createCheckpoint` to accept an id parameter and added a new constant for standalone checkpoints. |
| modules/module-mongodb/src/replication/ChangeStream.ts | Updated checkpoint handling in the `ChangeStream` to filter native and batch checkpoints properly. |
| modules/module-mongodb/src/api/MongoRouteAPIAdapter.ts | Adjusted checkpoint creation to use `STANDALONE_CHECKPOINT_ID`. |
| .changeset/beige-clouds-cry.md | Documented the patch releases for affected modules. |
Comments suppressed due to low confidence (2)
modules/module-mongodb/src/replication/MongoRelation.ts:154
- Typo in documentation: 'immeidately' should be corrected to 'immediately'.
  > `* Use this for write checkpoints, or any other case where we want to process the checkpoint immeidately, and not wait for batching.`

modules/module-mongodb/src/replication/ChangeStream.ts:764
- The expression `this.checkpointStreamId.equals(this.checkpointStreamId)` will always return true. It is likely intended to compare `this.checkpointStreamId` with `checkpointId`.
  > `if (!(checkpointId == STANDALONE_CHECKPOINT_ID || this.checkpointStreamId.equals(this.checkpointStreamId))) {`
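The flagged expression compares a value with itself, so the guard never filters anything. A minimal standalone sketch of the corrected comparison follows; `StreamId` here is a simplified stand-in for the ObjectId-style value with an `equals` method used in the quoted line, and `shouldSkipCheckpoint` is a hypothetical wrapper, not the actual method name:

```typescript
// Simplified stand-in for an ObjectId-like stream identifier with `equals`.
class StreamId {
  constructor(private readonly value: string) {}
  equals(other: StreamId | string): boolean {
    return other instanceof StreamId ? other.value === this.value : other === this.value;
  }
}

const STANDALONE_CHECKPOINT_ID = "standalone";

// Corrected guard: skip checkpoint events that are neither standalone
// nor created by this instance's own checkpoint stream.
function shouldSkipCheckpoint(checkpointId: string, ownStreamId: StreamId): boolean {
  return !(checkpointId === STANDALONE_CHECKPOINT_ID || ownStreamId.equals(checkpointId));
}
```

With this version, a checkpoint created by another instance (a `checkpointId` that matches neither the standalone id nor this instance's stream id) is skipped, which is the behavior the PR description calls for.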
Looks good to me :)
Background
When replicating from a MongoDB database, we use the `_powersync_checkpoints` collection for multiple purposes. This change specifically concerns the last of those purposes, batching. The process works as follows:
On a mostly-idle database, we should get that checkpoint document back almost immediately, triggering a flush ASAP. While on very busy databases where a replication lag builds up, we only get that document back once we've "caught up" on replication since it was created. That will increase the number of documents in the batch, increasing throughput, and allow us to catch up faster.
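The feedback loop described above can be simulated without MongoDB: the replicator buffers incoming change events and only flushes once its own checkpoint document comes back through the change stream. This is an illustrative sketch under assumed names, not the actual PowerSync implementation:

```typescript
// Illustrative simulation of checkpoint-driven flush batching.
type Event = { kind: "data"; doc: string } | { kind: "checkpoint"; id: string };

class BatchingReplicator {
  private buffer: string[] = [];
  readonly flushes: string[][] = [];

  constructor(private readonly instanceId: string) {}

  handle(event: Event): void {
    if (event.kind === "data") {
      this.buffer.push(event.doc);
    } else if (event.id === this.instanceId) {
      // Our own checkpoint document came back through the change stream:
      // everything buffered before it can be flushed as one batch.
      this.flushes.push(this.buffer);
      this.buffer = [];
    }
    // Checkpoint documents from other instances are ignored, so they
    // cannot force this instance into many small flushes.
  }
}
```

On an idle database the checkpoint returns almost immediately, producing a quick flush of a small batch; when replication lag builds up, many data events accumulate before the checkpoint arrives, so the batch (and throughput) grows.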
The issue
Now, the issue comes in when we connect multiple PowerSync databases to the same source database: they share the same `_powersync_checkpoints` collection. This is not a problem under low load, but we've observed an issue during initial replication: instance A receives all the checkpoint documents created by instance B, causing it to attempt a flush at the same rate. It cannot keep up at that rate, so it falls behind and builds up replication lag.
The fix
The fix here is to identify where the checkpoint documents originate, and ignore checkpoint documents from other instances.
For now, we still process all checkpoint documents created for write checkpoints. These are a little more difficult to filter out, due to the checkpoint originating from a different process. If these do become an issue, we can investigate filtering and/or throttling these later.
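Putting the two rules together, the per-checkpoint decision can be sketched as a small predicate: standalone checkpoints (used for write checkpoints) are still processed by every instance, while batching checkpoints are processed only by the instance that created them. The names here are assumptions based on this PR's description, not the exact code:

```typescript
const STANDALONE_CHECKPOINT_ID = "standalone";

// Decide whether this PowerSync instance should act on a checkpoint document.
// `ownId` identifies this instance's checkpoint stream.
function processCheckpoint(checkpointId: string, ownId: string): boolean {
  // Write checkpoints are created as standalone checkpoints and are
  // processed immediately by every instance, without waiting for batching.
  if (checkpointId === STANDALONE_CHECKPOINT_ID) {
    return true;
  }
  // Batching checkpoints from other instances are ignored, so a busy
  // instance B can no longer force instance A into tiny, frequent flushes.
  return checkpointId === ownId;
}
```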
Replication Lag Metrics
To help diagnose issues like these, a new replication lag metric is added in #272
Testing
To test the issue locally, I ran two powersync instances with the same source database and inspected the batches flushed by `MongoBucketBatch.flushInner()`.

Before this change:

With this change: