With replication enabled, RecoveryManager spins uselessly on primary instead of waiting for events. #1537

lmwnshn · 2021-04-10T21:35:20Z

Feature Request

Summary

With replication enabled, RecoveryManager spins uselessly on primary instead of waiting for events.

Solution

Look at RecoveryManager::RunTask() and get rid of the while-loop hack that busy-waits for new logs.
ReplicationLogProvider::WaitForEvent() was added much later and might be helpful here if modified accordingly.
Alternatively (or additionally), look at AbstractLogProvider.

In general, the RecoveryManager needs to be able to distinguish between "this log provider will never get more logs" (wal.log) and "this log provider may get more logs" (replication).
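One way to encode that distinction, sketched under assumptions: each provider reports through a hypothetical MayReceiveMoreLogs() whether its log can still grow, and the recovery loop consults it only once the buffer is empty, stopping for a finite file and blocking in WaitForEvent() for a stream. LogProviderSketch, TryNextRecord(), RunRecoveryTask() and the two concrete classes are illustrative names, not the real AbstractLogProvider or RecoveryManager API. The scripted stream is a single-threaded stand-in for a thread-driven provider so the control flow can be run deterministically.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (not NoisePage's AbstractLogProvider): every provider
// states whether its log can still grow after the buffer runs dry.
class LogProviderSketch {
 public:
  virtual ~LogProviderSketch() = default;
  // false for a finished file such as wal.log, true for a replication stream.
  virtual bool MayReceiveMoreLogs() const = 0;
  // Non-blocking: the next buffered record, or std::nullopt if none is buffered.
  virtual std::optional<std::string> TryNextRecord() = 0;
  // Blocks until more records may be available; returns false on shutdown.
  virtual bool WaitForEvent() = 0;
};

// Finite provider: replays a fixed list of records, then is exhausted for good.
class FileLogProviderSketch : public LogProviderSketch {
 public:
  explicit FileLogProviderSketch(std::vector<std::string> records)
      : records_(records.begin(), records.end()) {}
  bool MayReceiveMoreLogs() const override { return false; }
  std::optional<std::string> TryNextRecord() override {
    if (records_.empty()) return std::nullopt;
    std::string record = std::move(records_.front());
    records_.pop_front();
    return record;
  }
  bool WaitForEvent() override { return false; }  // a finished file never gets events

 protected:
  std::deque<std::string> records_;
};

// Single-threaded stand-in for a replication stream: each WaitForEvent() call
// "receives" the next scripted batch. A real provider would block on a
// condition variable here; running out of batches models shutdown.
class ScriptedStreamProviderSketch : public FileLogProviderSketch {
 public:
  explicit ScriptedStreamProviderSketch(std::vector<std::vector<std::string>> batches)
      : FileLogProviderSketch(std::vector<std::string>{}), batches_(batches.begin(), batches.end()) {}
  bool MayReceiveMoreLogs() const override { return true; }
  bool WaitForEvent() override {
    if (batches_.empty()) return false;  // stream shut down
    for (auto &record : batches_.front()) records_.push_back(std::move(record));
    batches_.pop_front();
    return true;
  }

 private:
  std::deque<std::vector<std::string>> batches_;
};

// Recovery loop without a spin: drain the buffer, then either stop (finite log)
// or sleep in WaitForEvent() (log that may still grow).
std::size_t RunRecoveryTask(LogProviderSketch &provider,
                            const std::function<void(const std::string &)> &replay) {
  std::size_t replayed = 0;
  while (true) {
    if (auto record = provider.TryNextRecord()) {
      replay(*record);
      ++replayed;
      continue;
    }
    if (!provider.MayReceiveMoreLogs()) break;  // e.g. wal.log fully replayed
    if (!provider.WaitForEvent()) break;        // stream shut down
  }
  return replayed;
}
```

With this split, the "never more logs" case terminates as soon as the file is drained, and the "may get more logs" case never polls: the only place it waits is inside the provider's blocking call.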


jkosh44 · 2021-06-19T17:18:00Z

Why does the primary node need an instance of the recovery manager at all? Could we just check in DBMain whether this is the primary instance and, if so, not start the recovery manager?

jkosh44 · 2021-06-19T17:20:29Z

We could change

      if (use_replication_) {
        auto log_provider = replication_manager->IsPrimary()
                                ? nullptr
                                : replication_manager->GetAsReplica()
                                      ->GetReplicationLogProvider()
                                      .CastManagedPointerTo<storage::AbstractLogProvider>();
        recovery_manager = std::make_unique<storage::RecoveryManager>(
            log_provider, catalog_layer->GetCatalog(), txn_layer->GetTransactionManager(),
            txn_layer->GetDeferredActionManager(), common::ManagedPointer(replication_manager),
            common::ManagedPointer(thread_registry), common::ManagedPointer(storage_layer->GetBlockStore()));
        recovery_manager->StartRecovery();
      }

to

      if (use_replication_ && replication_manager->IsReplica()) {
        auto log_provider = replication_manager->GetAsReplica()
                                ->GetReplicationLogProvider()
                                .CastManagedPointerTo<storage::AbstractLogProvider>();
        recovery_manager = std::make_unique<storage::RecoveryManager>(
            log_provider, catalog_layer->GetCatalog(), txn_layer->GetTransactionManager(),
            txn_layer->GetDeferredActionManager(), common::ManagedPointer(replication_manager),
            common::ManagedPointer(thread_registry), common::ManagedPointer(storage_layer->GetBlockStore()));
        recovery_manager->StartRecovery();
      }

lmwnshn added the "performance" label (Performance related issues or changes.) on Apr 10, 2021.