Sync Optimizations #19
These are some amazing performance improvements.
Will get a new release with these changes out next week. Note that for the web SDK, the changes in powersync-ja/powersync-js#266 fix the biggest performance issues, but these changes will give further improvements for buckets with many more operations than actual rows.
`{target: ...}` in MOVE operations. This has been removed from the protocol, and has no active implementations. See protocol docs for MOVE.
Combined, these optimizations could help to significantly speed up initial sync of compacted buckets, where a large percentage of operations are MOVE or REMOVE operations.
This will not have a significant performance impact if:
Benchmark 1 - many MOVE and REMOVE ops
Test case:
Dart native, Linux desktop
Before: 5.7s
After: 2.7s
Diagnostics app (web sdk)
Disabled dynamic schema generation.
Before: 74s for saving the data, 490s (!) for compacting
After: 7.8s
The big speedup is likely from not filling ps_oplog with many REMOVE operations. Will need further investigation to determine why it was so slow, since we may still get this kind of performance for other cases with a large number of operations.
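As a rough illustration of why not storing REMOVE operations shrinks the oplog, here is a minimal Python sketch. It is not the actual implementation: the table `oplog_sketch`, its columns, and the `save_ops` helper are hypothetical, and bookkeeping such as checksums and applying deletes to local tables is left out.

```python
import sqlite3

def save_ops(db, bucket, ops):
    """Persist a batch of sync operations, dropping superseded ones right away."""
    for op in ops:
        kind = op["op"]
        if kind in ("PUT", "REMOVE"):
            # Both kinds supersede any earlier stored operation for the same row.
            db.execute(
                "DELETE FROM oplog_sketch WHERE bucket = ? AND row_key = ?",
                (bucket, op["key"]),
            )
        if kind == "PUT":
            db.execute(
                "INSERT INTO oplog_sketch (bucket, op_id, row_key, data) "
                "VALUES (?, ?, ?, ?)",
                (bucket, op["op_id"], op["key"], op["data"]),
            )
        # In this sketch, nothing is stored for MOVE and REMOVE operations.

db = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
db.execute(
    "CREATE TABLE oplog_sketch (bucket TEXT, op_id INTEGER, row_key TEXT, data TEXT)"
)
db.execute("CREATE INDEX oplog_sketch_row ON oplog_sketch (bucket, row_key)")

ops = [
    {"op_id": 1, "op": "PUT", "key": "a", "data": "v1"},
    {"op_id": 2, "op": "MOVE"},
    {"op_id": 3, "op": "PUT", "key": "a", "data": "v2"},
    {"op_id": 4, "op": "PUT", "key": "b", "data": "v1"},
    {"op_id": 5, "op": "REMOVE", "key": "b"},
]
save_ops(db, "bucket1", ops)

stored = db.execute(
    "SELECT op_id, row_key, data FROM oplog_sketch ORDER BY op_id"
).fetchall()
print(stored)
```

Five incoming operations leave a single stored row here (the latest PUT for row `a`), which is the kind of reduction that would matter for a compacted bucket made up mostly of MOVE and REMOVE operations.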
Benchmark 2 - many PUT ops for small number of rows
Test case:
Dart native, Linux desktop
Before: 2.34s, 800KB db file, 4.6MB WAL
After: 2.07s, 4KB db file, 2.6MB WAL
This gives a performance improvement of around 10%, along with significantly reduced storage usage.
Diagnostics app (web sdk)
Before: 77s, 2.2MB storage
After: 37s, 4.5MB storage
It's not clear why the storage increased in this case.
Future Work
Remove superseded column
We can completely remove the `superseded` column - it is now always 0. This will require some semi-tricky migrations, so we're not doing it just yet (there may be existing data where `superseded = 1`).
Optimize compacting
Now that superseded operations are immediately deleted, we can also optimize the compact operations (`clear_remove_ops`). By combining this with the `SET last_applied_op = last_op` part, we can significantly reduce the number of rows we need to scan for REMOVE operations after incremental updates. This can give us continuous auto-compacting, instead of the current "compact once every 1000 operations".
Normalize bucket names
Bucket names are primarily used when saving and superseding operations. We already store each synced bucket in `ps_buckets`. We can use those ids in `ps_oplog`, instead of the full bucket names. This could reduce storage size and increase performance.
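A minimal sketch of what the normalized layout could look like, using Python's `sqlite3` module. The tables `buckets_sketch` and `oplog_sketch`, their columns, the helper functions, and the example bucket names are all hypothetical stand-ins; the real `ps_buckets` and `ps_oplog` schemas may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: each bucket name is stored once, and oplog rows
# reference it by integer id instead of repeating the full name.
db.executescript("""
    CREATE TABLE buckets_sketch (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE oplog_sketch (
        bucket_id INTEGER NOT NULL REFERENCES buckets_sketch (id),
        op_id     INTEGER NOT NULL,
        row_key   TEXT,
        data      TEXT
    );
    CREATE INDEX oplog_sketch_bucket ON oplog_sketch (bucket_id, op_id);
""")

def bucket_id(db, name):
    """Return the id for a bucket name, creating the bucket row if needed."""
    db.execute("INSERT OR IGNORE INTO buckets_sketch (name) VALUES (?)", (name,))
    row = db.execute(
        "SELECT id FROM buckets_sketch WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

def save_put(db, bucket_name, op_id, row_key, data):
    """Store a PUT operation keyed by bucket id rather than bucket name."""
    db.execute(
        "INSERT INTO oplog_sketch (bucket_id, op_id, row_key, data) "
        "VALUES (?, ?, ?, ?)",
        (bucket_id(db, bucket_name), op_id, row_key, data),
    )

def ops_for_bucket(db, bucket_name):
    """Look up operations by bucket name via a join on the id."""
    return db.execute(
        "SELECT o.op_id, o.row_key FROM oplog_sketch o "
        "JOIN buckets_sketch b ON b.id = o.bucket_id "
        "WHERE b.name = ? ORDER BY o.op_id",
        (bucket_name,),
    ).fetchall()

# Made-up bucket names, chosen only to show a long name being stored once.
long_name = 'by_user["2f1c9a6e-0000-4000-8000-000000000001"]'
save_put(db, long_name, 1, "a", "{}")
save_put(db, long_name, 2, "b", "{}")
save_put(db, "global[]", 3, "c", "{}")

print(ops_for_bucket(db, long_name))
```

The trade-off in this sketch is an extra lookup or join when going from a bucket name to its operations, in exchange for a small integer per oplog row and a narrower index.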