RCORE-2209 Treat completing a client reset as receiving a MARK message #7921

tgoyne · 2024-07-23T19:39:14Z

Client resets which did not recover any changes (either because changes were discarded, there was nothing to recover, or the recovered changesets became empty after merging) don't need to wait for a server round-trip to mark the reset as complete, as that round-trip merely consisted of sending a MARK to the server and waiting for a response. This partially reverts #6196 and fixes that bug by immediately removing the client reset tracker as part of the diff commit if there was nothing recovered.

Performing a client reset involves waiting for download completion and bringing the Realm file into the state it would have been in if it had completed downloading, so it should fire download completion handlers. Previously we did everything we would do on download completion except for this. Since we performed a wait for download completion after applying a client reset diff, the handlers would eventually get called, but the exact timing depended on server behavior, which is changing in QBSv2 (and the wait for download completion is removed by the above change).

coveralls-official · 2024-07-23T21:20:22Z

Pull Request Test Coverage Report for Build thomas.goyne_494

Details

428 of 471 (90.87%) changed or added relevant lines in 17 files are covered.
117 unchanged lines in 22 files lost coverage.
Overall coverage decreased (-0.02%) to 91.106%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/realm/sync/noinst/client_history_impl.cpp	7	8	87.5%
test/object-store/util/sync/sync_test_utils.cpp	37	39	94.87%
src/realm/sync/noinst/pending_reset_store.cpp	73	89	82.02%
test/object-store/sync/flx_sync.cpp	178	202	88.12%

Files with Coverage Reduction	New Missed Lines	%
src/realm/array_backlink.cpp	1	91.38%
src/realm/dictionary.cpp	1	85.16%
src/realm/query_engine.hpp	1	93.94%
src/realm/sync/network/websocket.cpp	1	72.43%
src/realm/sync/noinst/client_impl_base.cpp	1	83.34%
src/realm/util/serializer.cpp	1	90.43%
src/realm/uuid.cpp	1	98.48%
test/test_dictionary.cpp	1	99.83%
test/test_query2.cpp	1	98.73%
src/realm/db.cpp	2	92.63%

Totals
Change from base Build 2555:	-0.02%
Covered Lines:	217435
Relevant Lines:	238662

💛 - Coveralls

tgoyne · 2024-07-31T20:40:32Z

src/realm/sync/noinst/client_impl_base.cpp

+    m_sending_session = sess;
+    m_sending = true;


These need to be set before calling async_write_binary() to support the completion handler being called synchronously.

this is a good catch, but i can't think of any issues it could have caused - was it causing some failures in your testing or something?

The newly added test socket provider crashes without this change because it synchronously calls the completion handler.
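To illustrate the ordering issue, here is a minimal sketch with made-up types. `FakeSocket`, `FakeConnection`, and all of their members are hypothetical stand-ins, not the real sync client classes. It only shows why the "in flight" state has to be recorded before starting a write whose completion handler may run synchronously.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket whose write may complete synchronously.
struct FakeSocket {
    bool complete_synchronously = true;
    std::function<void()> pending;

    void async_write(const std::string&, std::function<void()> on_done)
    {
        if (complete_synchronously)
            on_done(); // handler runs before async_write() returns
        else
            pending = std::move(on_done); // handler runs later, when invoked
    }
};

// Hypothetical connection tracking a single in-flight write.
struct FakeConnection {
    FakeSocket socket;
    bool sending = false;
    int completed_writes = 0;

    void handle_write_done()
    {
        // The handler expects a write to be in flight.
        assert(sending);
        sending = false;
        ++completed_writes;
    }

    void send(const std::string& msg)
    {
        // Record the in-flight state first: the handler may fire inside
        // async_write(), before this function gets a chance to do so.
        sending = true;
        socket.async_write(msg, [this] { handle_write_done(); });
    }
};

// Returns true if the state is consistent after a synchronous completion.
inline bool demo_synchronous_completion()
{
    FakeConnection conn;
    conn.send("MARK");
    return !conn.sending && conn.completed_writes == 1;
}
```

If `sending = true` were moved below the `async_write()` call, the synchronous path would trip the assertion in the handler and then leave `sending` stuck at `true` afterwards.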

tgoyne · 2024-07-31T20:43:47Z

src/realm/sync/noinst/migration_store.cpp

@@ -53,14 +53,14 @@ bool MigrationStore::load_data(bool read_only)

    auto tr = m_db->start_read();
    // Start with a reader so it doesn't try to write until we are ready
-    SyncMetadataSchemaVersionsReader schema_versions_reader(tr);
+    SyncMetadataSchemaVersionsReader schema_versions_reader(*tr);


All of the changes to migration store, pending bootstrap store, sync metadata schema, and subscriptions are just secondary effects of making PendingResetStore::has_pending_reset() take a Group rather than a TransactionRef.

aside from passing around a ref vs const ref to a shared_ptr, is there a reason why this changed from a Transaction to a Group?

The root change is enabling PendingResetStore::has_pending_reset(realm->read_group()), which previously didn't work because the function expected a Transaction even though it didn't do anything which required a Transaction. All of these functions should have been taking a Group the whole time as they don't change the transaction state.
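A rough sketch of the signature change, using simplified stand-in types. The `Group`, `Transaction`, and `has_pending_reset` below are hypothetical miniatures written for illustration (assuming only that a Transaction is a kind of Group), not the real realm-core declarations.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, heavily simplified stand-ins for the real types.
struct Group {
    bool has_reset_table = false;
    virtual ~Group() = default;
};
struct Transaction : Group {};
using TransactionRef = std::shared_ptr<Transaction>;

// A read-only check only needs what Group provides, so it takes a Group.
inline bool has_pending_reset(const Group& group)
{
    return group.has_reset_table;
}

inline bool demo_group_parameter()
{
    // A caller holding a TransactionRef just dereferences it...
    TransactionRef tr = std::make_shared<Transaction>();
    tr->has_reset_table = true;
    bool from_transaction = has_pending_reset(*tr);

    // ...and a caller which only has a Group can use the same function.
    Group plain;
    bool from_group = has_pending_reset(plain);

    return from_transaction && !from_group;
}
```

With the parameter typed as a `TransactionRef`, the second call would not compile, which is roughly the situation `realm->read_group()` was in.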

tgoyne · 2024-07-31T20:45:40Z

test/object-store/util/sync/sync_test_utils.cpp

@@ -308,6 +308,9 @@ StatusWith<std::shared_ptr<Realm>> async_open_realm(const Realm::Config& config)
 std::shared_ptr<Realm> successfully_async_open_realm(const Realm::Config& config)
 {
    auto status = async_open_realm(config);
+    if (!status.is_ok()) {


Unexpected errors here previously didn't log the error so it was annoying to debug.

tgoyne · 2024-07-31T20:47:33Z

test/object-store/sync/flx_sync.cpp

@@ -1196,6 +1269,7 @@ TEST_CASE("flx: client reset", "[sync][flx][client reset][baas]") {
        auto subs = realm->get_latest_subscription_set();
        auto result = subs.get_state_change_notification(sync::SubscriptionSet::State::Complete).get();
        CHECK(result == sync::SubscriptionSet::State::Complete);
+        SyncSession::OnlyForTesting::pause_async(*realm->sync_session()).get();


If the server was sufficiently fast it could theoretically send the client reset we trigger below to the current sync session, which isn't what these tests want. Probably never actually happened in practice.

FYI - in case you weren't aware, the client reset triggered by the command below just invalidates the file ident and the reset is not initiated until the session reconnects.

tgoyne · 2024-07-31T23:51:17Z

src/realm/sync/noinst/sync_metadata_schema.cpp

-    else
-        tr->commit_and_continue_writing();


I couldn't find anywhere that this was being used where this commit would do anything useful and it made the client reset tests which verified that exactly two commits were made more complicated.

I'd also assume this is fine; if the caller had opened a write transaction, they will also commit it at some point.

tgoyne · 2024-07-31T23:52:05Z

src/realm/sync/noinst/pending_reset_store.cpp

 {
-    // Write transaction required
-    REALM_ASSERT(wr_tr->get_transact_stage() == DB::TransactStage::transact_Writing);
-    auto reset_store = PendingResetStore::load_or_create_schema(wr_tr);


Loading the schema here was kinda slow and it wasn't actually being used for anything.

ironage

Great improvements! LGTM.

ironage · 2024-08-02T17:00:44Z

CHANGELOG.md

@@ -6,6 +6,8 @@
 ### Fixed
 * <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
 * Sync client may report duplicate compensating write errors ([#7708](https://github.com/realm/realm-core/issues/7708), since v14.8.0).


this fix was released yesterday so the following additions will have to be updated to go to the new section

ironage · 2024-08-02T17:44:53Z

src/realm/sync/noinst/sync_metadata_schema.cpp

-    else
-        tr->commit_and_continue_writing();


I'd also assume this is fine; if the caller had opened a write transaction, they will also commit it at some point.

ironage · 2024-08-02T17:46:12Z

src/realm/sync/noinst/sync_metadata_schema.cpp

 {
    std::vector<SyncMetadataTable> unified_schema_version_table_def{
        {&m_table,
         c_sync_internal_schemas_table,
         {&m_schema_group_field, c_meta_schema_schema_group_field, type_String},
         {{&m_version_field, c_meta_schema_version_field, type_Int}}}};

-    // Any type of transaction is allowed, including frozen and write, as long as it supports reading
-    REALM_ASSERT_EX(tr->get_transact_stage() != DB::transact_Ready, tr->get_transact_stage());


Using a group everywhere makes the intent much more clear. 💯

ironage · 2024-08-02T18:07:29Z

test/object-store/util/sync/sync_test_utils.cpp

+    return this;
+}
+
+TestClientReset* TestClientReset::expect_reset_error(std::optional<SyncError>& err)


nice simplification 👍

jbreams · 2024-08-05T15:47:13Z

src/realm/sync/noinst/client_impl_base.cpp

+    m_sending_session = sess;
+    m_sending = true;


this is a good catch, but i can't think of any issues it could have caused - was it causing some failures in your testing or something?

jbreams · 2024-08-05T17:18:02Z

src/realm/sync/noinst/migration_store.cpp

@@ -53,14 +53,14 @@ bool MigrationStore::load_data(bool read_only)

    auto tr = m_db->start_read();
    // Start with a reader so it doesn't try to write until we are ready
-    SyncMetadataSchemaVersionsReader schema_versions_reader(tr);
+    SyncMetadataSchemaVersionsReader schema_versions_reader(*tr);


aside from passing around a ref vs const ref to a shared_ptr, is there a reason why this changed from a Transaction to a Group?

jbreams · 2024-08-05T17:29:08Z

test/object-store/sync/flx_sync.cpp

+// A socket provider which claims to always work, but when `disconnect = true`
+// will actually drop all incoming and outgoing messages. This enables testing
+// going offline at very specific points.
+struct DisconnectingSocketProvider : sync::websocket::DefaultSocketProvider {


more of a DiscardAllTrafficSocketProvider or something rather than a DisconnectingSocketProvider since it never actually disconnects you. Is there anything in this PR that would behave differently if there was an actual disconnect where we reset all the protocol state rather than just discarding messages?

I don't love the name, but it's "disconnecting" in the sense of disconnecting a network cable between you and the server.

Resetting the protocol state would require tying the tests to implementation details of sync connections, while this approach lets us test it via the public API.

more like disconnecting in the sense that an intermediate hop is dropping packets, but yeah. by "reset all the protocol state" i just meant calling websocket_closed_handler() to signal to the sync client that the connection has been closed. i think without some reworking of how the sync client handles closed connections this could be kinda tough though. I think the answer to my question is that there aren't any changes that depend on actually disconnecting the session since client_reset_if_needed() doesn't depend on having any of the Session's previous state be correct.

For these tests I do just want to be absolutely sure that no synchronization has happened after the download of the fresh realm has completed (until it's time to allow sync to happen again) and the exact details are unimportant.

I've reworked this type to call websocket_closed_handler() (or defer the call to DefaultSocketProvider::connect()) as it turned out that dropping packets really didn't work outside of simple cases.

jbreams · 2024-08-05T17:30:29Z

src/realm/sync/noinst/client_history_impl.cpp

@@ -947,6 +948,19 @@ void ClientHistory::update_sync_progress(const SyncProgress& progress, Downloada
    root.set(s_progress_uploaded_bytes_iip,
             RefOrTagged::make_tagged(uploaded_bytes)); // Throws

+    if (previous_upload_client_version < progress.upload.client_version) {


does this assume that if we make any upload progress that we'll have fully uploaded all changes and we know for sure we aren't going to get another client reset from any recovered changesets? maybe now that we have compensating writes that doesn't really matter as much?

Uploading changesets should either result in the server acknowledging the upload or sending a client reset and not both, so once our UPLOAD is acked the window for getting a client reset due to those changesets being invalid has ended.

I think this check is probably wrong and it needs to actually be checking if we've reached the client version at the time of the client reset (or at time of opening).

It took a while to figure out how to test it but this is indeed incorrect; it marks the client reset as complete as soon as any changesets are acked rather than when all of the recovered changesets are.

tgoyne · 2024-08-09T16:17:05Z

I've updated this to track which client version was the last one recovered by a client reset and mark the client reset as complete once that version has been uploaded (and acked). This is sort of a fake bug fix relative to the behavior in practice with PBS and QBSv1. There are some extreme edge cases that now work better (e.g. recovered changesets are successfully uploaded, then the device goes offline before receiving the MARK and stays offline until the client file ident expires on the server), but the primary benefit is that the behavior of an async open that triggers a client reset no longer depends on how the server handles a MARK sent while uploading changesets. With PBS/QBSv1 the MARK doesn't wait for the server to have processed those UPLOADs, and with QBSv2 it (sometimes?) does. By not relying on MARK for marking client resets complete, we preserve the existing visible behavior for client resets.
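The completion rule described above can be sketched roughly as follows. `PendingResetTracker` and its members are hypothetical names made up for this illustration and do not reflect how the pending reset store is actually laid out. The sketch assumes that acked upload progress is reported as a client version.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using version_type = std::uint64_t;

// Hypothetical tracker, not the actual realm-core data structure.
struct PendingResetTracker {
    // Last client version produced by recovery; nullopt if nothing was recovered.
    std::optional<version_type> last_recovered_version;
    bool complete = false;

    // Called when the client reset diff is committed.
    void on_reset_committed(std::optional<version_type> recovered_up_to)
    {
        last_recovered_version = recovered_up_to;
        // Nothing recovered means there is no upload to wait for.
        complete = !recovered_up_to.has_value();
    }

    // Called when the server acks uploads up to acked_client_version.
    void on_upload_progress(version_type acked_client_version)
    {
        // Acking only some of the recovered changesets is not enough; the
        // last recovered version itself has to be acked.
        if (!complete && last_recovered_version && acked_client_version >= *last_recovered_version)
            complete = true;
    }
};
```

The earlier version of the check corresponds to flipping `complete` on any increase in the acked version, which is the behavior identified as incorrect in the thread above.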

tgoyne self-assigned this Jul 23, 2024

cla-bot bot added the cla: yes label Jul 23, 2024

tgoyne mentioned this pull request Jul 23, 2024

RCORE-2209: Loosen illegal additive schema change test assertions #7914

Closed

4 tasks

tgoyne force-pushed the tg/download-completion-on-client-reset branch 4 times, most recently from 4cc2831 to a4cbf2f Compare July 31, 2024 23:47

tgoyne commented Jul 31, 2024

tgoyne force-pushed the tg/download-completion-on-client-reset branch 2 times, most recently from 9204cfc to 65330eb Compare August 1, 2024 17:44

tgoyne marked this pull request as ready for review August 2, 2024 02:47

tgoyne requested review from jbreams and ironage August 2, 2024 02:47

ironage approved these changes Aug 2, 2024

tgoyne force-pushed the tg/download-completion-on-client-reset branch 3 times, most recently from 329c1c6 to 7e20b9f Compare August 5, 2024 16:27

jbreams reviewed Aug 5, 2024

michael-wb reviewed Aug 7, 2024

tgoyne mentioned this pull request Aug 7, 2024

RCORE-2232 Actually check for unuploaded changes in no_pending_local_changes() #7967

Merged

tgoyne force-pushed the tg/download-completion-on-client-reset branch 2 times, most recently from bf89359 to c78df18 Compare August 9, 2024 03:58

tgoyne changed the base branch from master to tg/unuploaded-changesets August 9, 2024 03:59

tgoyne force-pushed the tg/download-completion-on-client-reset branch from c78df18 to edc1174 Compare August 9, 2024 16:05

tgoyne force-pushed the tg/unuploaded-changesets branch from 9e43725 to 9252ccc Compare August 9, 2024 17:56

Base automatically changed from tg/unuploaded-changesets to master August 9, 2024 18:49

tgoyne added 2 commits August 9, 2024 11:49

Treat completing a client reset as receiving a MARK message

bb32aef

Do more precise checking of when all recovered changesets have been uploaded

80604f8

tgoyne force-pushed the tg/download-completion-on-client-reset branch 2 times, most recently from 0857d0e to 80604f8 Compare August 9, 2024 19:31

tgoyne closed this Oct 16, 2024

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2024
