Sync Optimizations #19

rkistner · 2024-08-12T09:36:31Z

- Remove handling of {target: ...} in MOVE operations. This has been removed from the protocol and has no active implementations. See the protocol docs for MOVE.
- Never persist MOVE operations - we only care about the checksum.
- Do not persist REMOVE operations if any of these hold:
  1. This is an initial sync for the bucket (including adding new buckets). However, these do still need to supersede previous PUT operations.
  2. There was no previous operation superseded (a REMOVE for a PUT operation we don't have).
  3. More generally, the only superseded operations were not applied locally yet, meaning there is nothing to remove locally.
- Fix a crash due to wrapping checksums (integer overflow) in debug builds. Release builds did not have this issue.
- When receiving a new operation for a row, delete the previous operation instead of marking it as superseded.
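
As a minimal sketch of the checksum fix: in Rust, plain `+` on integers panics on overflow when overflow checks are enabled (the default for debug builds) and wraps silently in release builds, which matches the debug-only crash described above. `wrapping_add` wraps in every build profile. The helper name `add_checksums` and the `u32` width are assumptions for illustration, not the code from this PR.

```rust
/// Hypothetical helper: combine two operation checksums with explicit wrapping.
/// Writing `a + b` here would panic in a debug build once the sum exceeds u32::MAX;
/// `wrapping_add` gives the same modular result in debug and release builds.
fn add_checksums(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Hypothetical helper: fold a sequence of per-operation checksums into one value.
fn bucket_checksum(op_checksums: &[u32]) -> u32 {
    op_checksums
        .iter()
        .fold(0u32, |acc, &c| add_checksums(acc, c))
}
```

For example, `add_checksums(u32::MAX, 2)` wraps around to `1` instead of panicking.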

Combined, these optimizations could help to significantly speed up initial sync of compacted buckets, where a large percentage of operations are MOVE or REMOVE operations.
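
The three REMOVE conditions listed above can be sketched as a single predicate. This is only an illustration of the rules as stated in this description: the `RemoveContext` struct, its fields, and `should_persist_remove` are hypothetical names and do not mirror the actual implementation.

```rust
/// Hypothetical summary of what is known when a REMOVE operation arrives.
#[derive(Debug, Clone, Copy)]
struct RemoveContext {
    /// True while the bucket is being synced for the first time.
    initial_sync: bool,
    /// Number of earlier operations for the same row that this REMOVE supersedes.
    superseded_count: usize,
    /// True if at least one superseded operation was already applied locally.
    any_superseded_applied: bool,
}

/// Returns true if the REMOVE needs to be persisted.
/// Superseding earlier PUT operations happens regardless of this result.
fn should_persist_remove(ctx: RemoveContext) -> bool {
    // Rule 1: initial sync, so there is no local row to remove yet.
    if ctx.initial_sync {
        return false;
    }
    // Rule 2: REMOVE for a PUT operation we never had.
    if ctx.superseded_count == 0 {
        return false;
    }
    // Rule 3: only persist if something superseded was applied locally.
    ctx.any_superseded_applied
}
```

Rule 3 subsumes rule 2 when nothing was superseded; they are kept separate here only to match the list.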

This will not have a significant performance impact if:

- The bucket is not compacted at all (meaning no MOVE operations).
- The bucket is fully defragmented (meaning only PUT operations).

Benchmark 1 - many MOVE and REMOVE ops

Test case:

Local powersync-service
122,422 total operations (30MB downloaded), with:
- 2,422 PUT operations (13MB of data)
- 60,000 MOVE operations
- 60,000 REMOVE operations

Dart native, Linux desktop

Before: 5.7s
After: 2.7s

Diagnostics app (web sdk)

Disabled dynamic schema generation.

Before: 74s for saving the data, 490s (!) for compacting
After: 7.8s

The big speedup is likely from not filling ps_oplog with many REMOVE operations. Will need further investigation to determine why it was so slow, since we may still get this kind of performance for other cases with a large number of operations.

Benchmark 2 - many PUT ops for small number of rows

Test case:

Local powersync-service
60k total operations (21MB downloaded), over 20 rows

Dart native, Linux desktop

Before: 2.34s, 800KB db file, 4.6MB WAL
After: 2.07s, 4KB db file, 2.6MB WAL

This gives a performance improvement of around 10%, with significantly reduced storage usage.

Diagnostics app (web sdk)

Before: 77s, 2.2MB storage
After: 37s, 4.5MB storage

It's not clear why the storage increased in this case.

Future Work

Remove superseded column

We can completely remove the superseded column - it is now always 0. This will require some semi-tricky migrations, so we're not doing it just yet (there may be existing data where superseded = 1).
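
A small in-memory model shows why the column is always 0 for newly written data: when a new operation for a row replaces the previous one outright, there is never a stored operation left to flag. The `Oplog` type below is a hypothetical stand-in keyed by (bucket, row), not the real ps_oplog schema.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the oplog: at most one operation per (bucket, row).
#[derive(Default)]
struct Oplog {
    /// (bucket name, row key) -> id of the latest operation for that row.
    ops: HashMap<(String, String), u64>,
}

impl Oplog {
    /// Store a new operation for a row. Any previous operation for the same row is
    /// deleted rather than flagged, so nothing is ever left in a "superseded" state.
    /// Returns the op id that was replaced, if any.
    fn put(&mut self, bucket: &str, row: &str, op_id: u64) -> Option<u64> {
        self.ops.insert((bucket.to_owned(), row.to_owned()), op_id)
    }

    /// Number of operations currently stored.
    fn len(&self) -> usize {
        self.ops.len()
    }
}
```

In this model, many PUT operations over a small number of rows (as in Benchmark 2) leave only one stored entry per row.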

Optimize compacting

Now that superseded operations are immediately deleted, we can also optimize the compact operations (clear_remove_ops). By combining this with the SET last_applied_op = last_op part, we can significantly reduce the number of rows we need to scan for REMOVE operations after incremental updates. This can give us continuous auto-compacting, instead of the current "compact once every 1000 operations".

Normalize bucket names

Bucket names are primarily used when saving and superseding operations. We already store each synced bucket in ps_buckets. We can use those ids in ps_oplog, instead of the full bucket names. This could reduce storage size and increase performance.
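
As a rough sketch of the idea, bucket names can be mapped to small integer ids once, and the ids used everywhere else. In the real schema the ids would presumably come from ps_buckets; the `BucketIds` interner below is a hypothetical in-memory illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical interner mapping full bucket names to small integer ids.
#[derive(Default)]
struct BucketIds {
    by_name: HashMap<String, u32>,
}

impl BucketIds {
    /// Return the id for a bucket name, assigning the next free id on first use.
    fn id_for(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }
        // Ids start at 1 and grow by one for each new bucket name.
        let id = self.by_name.len() as u32 + 1;
        self.by_name.insert(name.to_owned(), id);
        id
    }
}
```

Each stored operation would then carry a 4-byte id instead of repeating the full bucket name.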

stevensJourney

These are some amazing performance improvements.

rkistner · 2024-08-14T14:38:24Z

Will get a new release with these changes out next week.

Note that for the web SDK, the changes in powersync-ja/powersync-js#266 fix the biggest performance issues, but these changes will give further improvements for buckets with many more operations than actual rows.

rkistner added 4 commits August 10, 2024 12:04

Remove "target" processing for MOVE operations.

a6c6e62

This has been removed from the protocol, and has no active implementations.

Avoid writing MOVE operations.

d857d07

Optimize REMOVE operations in initial sync.

f10c822

Safely wrap checksums for debug mode.

e03f440

rkistner mentioned this pull request Aug 12, 2024

Update tests covering powersync-sqlite-core powersync-ja/powersync.dart#141

Merged

rkistner commented Aug 12, 2024

crates/core/src/checkpoint.rs (resolved)

rkistner commented Aug 12, 2024

crates/core/src/operations.rs (resolved)

Skip REMOVE operation in additional cases.

5b59319

rkistner force-pushed the protocol-cleanup branch from a195430 to 5b59319 Compare August 12, 2024 15:08

Make the REMOVE optimization more general.

b8103bf

rkistner requested a review from stevensJourney August 13, 2024 08:51

rkistner marked this pull request as ready for review August 13, 2024 08:51

stevensJourney approved these changes Aug 13, 2024

rkistner merged commit 1504864 into main Aug 13, 2024
11 checks passed

rkistner deleted the protocol-cleanup branch August 13, 2024 13:56

rkistner self-assigned this Aug 14, 2024

This was referenced Aug 20, 2024

v0.1.9 #20

Closed

v0.2.0 #22

Merged

Release Checklist v0.2.0 #23

Closed

rkistner mentioned this pull request Nov 7, 2024

Revert optimization breaking deletes #45

Merged
