Add powersync_replication_lag_seconds metric #272
🦋 Changeset detected. Latest commit: 7ef0fbd. The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages.
Pull Request Overview
This pull request introduces a new replication lag metric for PowerSync, tracking the delay in seconds between the source database and PowerSync instance. It adds logging enhancements to include replication lag, updates method names for clarity, and revises the sync rules storage structure by adding an “active” flag.
- Renamed API methods (e.g. getReplicationLag to getReplicationLagBytes) to clarify what the method returns.
- Updated replication lag computation in multiple modules (Postgres, MySQL, MongoDB) with new internal state handling.
- Enhanced logging messages to include replication lag data and improved sync rules tracking.
Reviewed Changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
packages/service-core/src/api/RouteAPI.ts | Renamed lag API method to reflect units (bytes). |
packages/service-core-tests/src/test-utils/general-utils.ts | Added active flag in test sync rule content. |
modules/module-postgres/src/replication/WalStreamReplicator.ts | Added getReplicationLagMillis method with fallback using last commit/keepalive timestamps. |
modules/module-postgres/src/replication/WalStream.ts | Integrated tracking of oldest uncommitted change for lag computation. |
modules/module-postgres/src/replication/WalStreamReplicationJob.ts | Propagated last stream instance for lag reporting. |
modules/module-mysql/src/replication/BinLogStream.ts & BinLogReplicator.ts | Introduced lag tracking methods and state updates in replication flows. |
modules/module-mongodb/src/replication/*.ts | Enhanced ChangeStream logic and lag tracking for MongoDB replication. |
modules/module-/src/storage/ | Extended flush/commit logic to log replication lag and added “active” sync rule states. |
Comments suppressed due to low confidence (3)
packages/service-core/src/api/RouteAPI.ts:50
- Ensure the naming of this method clearly reflects the unit it returns. Since the new metric is reported in seconds, verify that 'Bytes' is the intended descriptor, or consider renaming to avoid any ambiguity.
getReplicationLagBytes(options: ReplicationLagOptions): Promise<number | undefined>;
modules/module-postgres-storage/src/storage/batch/PostgresBucketBatch.ts:316
- Review the change in the return value when no persisted operations exist; ensure that returning true aligns with downstream logic expecting a successful commit.
return true;
modules/module-mongodb/src/replication/ChangeStream.ts:765
- [nitpick] Verify that resetting 'oldestUncommittedChange' and 'isStartingReplication' after a successful commit covers all edge cases, ensuring that lag calculations remain accurate.
const didCommit = await batch.commit(lsn, { oldestUncommittedChange: this.oldestUncommittedChange });
Looks good to me :)
Record replication lag in the logs and as a new metric, to help diagnose and alert on replication delays.
New metric:
New logs on commit:
Note that the logs and the metric differ in some ways:
We generally calculate the lag as the difference between the source database timestamps and the PowerSync instance time. If either clock is off, the calculated replication lag carries a constant offset, which also means the lag can be reported as negative.
If the active sync rules are in an error state, we report the time since a change could last be persisted.
If there are no active sync rules (e.g. sync rules were deployed for the first time), the metric is not reported.
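The three rules above can be illustrated with a minimal sketch. This is not the code from this PR: the `LagState` shape, its field names, and the `replicationLagSeconds` helper are all hypothetical, chosen only to show how clock skew can produce a negative value and when the metric would be left unreported.

```typescript
// Hypothetical state shape for illustration; not PowerSync's actual API.
interface LagState {
  /** Whether any sync rules are currently active. */
  hasActiveSyncRules: boolean;
  /** Whether the active sync rules are in an error state. */
  inErrorState: boolean;
  /** Source-database timestamp of the change being measured, if known. */
  sourceTimestamp?: Date;
  /** Instance time at which a change could last be persisted, if known. */
  lastPersistedAt?: Date;
}

/**
 * Returns the lag in seconds, or undefined when nothing should be reported.
 * `now` is the instance clock, so skew against the source clock shows up
 * as a constant offset and the result may be negative.
 */
function replicationLagSeconds(state: LagState, now: Date): number | undefined {
  if (!state.hasActiveSyncRules) {
    return undefined; // no active sync rules: metric not reported
  }
  // In an error state, measure from the last successful persist instead.
  const reference = state.inErrorState ? state.lastPersistedAt : state.sourceTimestamp;
  if (reference == null) {
    return undefined; // assumption: nothing to measure against yet
  }
  return (now.getTime() - reference.getTime()) / 1000;
}

const now = new Date("2024-01-01T00:00:10Z");

// Healthy replication, source change stamped 3 seconds before instance time.
const healthyLag = replicationLagSeconds(
  { hasActiveSyncRules: true, inErrorState: false, sourceTimestamp: new Date("2024-01-01T00:00:07Z") },
  now
);

// Source clock runs 2 seconds ahead of the instance clock: negative lag.
const skewedLag = replicationLagSeconds(
  { hasActiveSyncRules: true, inErrorState: false, sourceTimestamp: new Date("2024-01-01T00:00:12Z") },
  now
);

// Error state: time since a change could last be persisted.
const errorLag = replicationLagSeconds(
  { hasActiveSyncRules: true, inErrorState: true, lastPersistedAt: new Date("2024-01-01T00:00:00Z") },
  now
);

// No active sync rules: nothing is reported.
const noRulesLag = replicationLagSeconds({ hasActiveSyncRules: false, inErrorState: false }, now);
```

With these inputs the sketch yields 3, -2, 10, and `undefined` respectively.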
The initial implementation here used a normal gauge instead of an observable one (i.e. the value was set at specific points, rather than polled). I extended the metrics implementation to add a plain `Gauge` interface. Since then I've found that the observable gauge approach is just better, and switched to that, removing the `Gauge` interface again. Replication lag is better calculated when you measure it, rather than at specific events, and I think the same applies to most gauges.
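To make the difference concrete, here is a small self-contained sketch. The `PlainGauge` and `ObservableGauge` classes below are hypothetical stand-ins written for this example, not the interfaces from this PR or from any metrics library, and the clock is simulated. It shows why a value set only at events goes stale when replication stalls, while a polled callback keeps growing.

```typescript
// Hypothetical minimal gauge types for illustration only.
class PlainGauge {
  private value: number | undefined;

  /** Record a value at a specific event (e.g. on commit). */
  set(value: number): void {
    this.value = value;
  }

  /** Called by the metrics reader; returns whatever was last set. */
  collect(): number | undefined {
    return this.value;
  }
}

class ObservableGauge {
  private readonly observe: () => number | undefined;

  constructor(observe: () => number | undefined) {
    this.observe = observe;
  }

  /** Called by the metrics reader; computes the value at measurement time. */
  collect(): number | undefined {
    return this.observe();
  }
}

// Simulated instance clock and source timestamp, in milliseconds.
let nowMs = 0;
const oldestChangeMs = 0;
const lagSeconds = (): number => (nowMs - oldestChangeMs) / 1000;

const plain = new PlainGauge();
const observable = new ObservableGauge(lagSeconds);

// An event fires one second in, and the plain gauge is updated.
nowMs = 1_000;
plain.set(lagSeconds());

// Replication then stalls for 30 seconds, so no further events fire.
nowMs = 31_000;

const plainReading = plain.collect(); // stale value from the last event
const observableReading = observable.collect(); // computed when read
```

After the simulated stall, `plainReading` is still 1 while `observableReading` is 31, which is the behaviour you want from a lag metric used for alerting.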