Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

With replication enabled, RecoveryManager spins uselessly on primary instead of waiting for events. #1537

Open
lmwnshn opened this issue Apr 10, 2021 · 2 comments
Labels
performance Performance related issues or changes.

Comments

@lmwnshn
Copy link
Contributor

lmwnshn commented Apr 10, 2021

Feature Request

Summary

With replication enabled, RecoveryManager spins uselessly on primary instead of waiting for events.

Solution

Look at RecoveryManager::RunTask() and get rid of the while loop hack.
ReplicationLogProvider::WaitForEvent() was added much later and might be helpful here if you can modify it accordingly.
And/or, look at AbstractLogProvider.

In general, the RecoveryManager needs to be able to distinguish between "this log provider will never get more logs" (wal.log) and "this log provider may get more logs" (replication).

@lmwnshn lmwnshn added the performance Performance related issues or changes. label Apr 10, 2021
@jkosh44
Copy link
Contributor

jkosh44 commented Jun 19, 2021

Why does the primary node even need an instance of the recovery manager? Could we just check in DB main if it's the primary instance and if so not start the recovery manager?

@jkosh44
Copy link
Contributor

jkosh44 commented Jun 19, 2021

We could change

      if (use_replication_) {
        auto log_provider = replication_manager->IsPrimary()
                                ? nullptr
                                : replication_manager->GetAsReplica()
                                      ->GetReplicationLogProvider()
                                      .CastManagedPointerTo<storage::AbstractLogProvider>();
        recovery_manager = std::make_unique<storage::RecoveryManager>(
            log_provider, catalog_layer->GetCatalog(), txn_layer->GetTransactionManager(),
            txn_layer->GetDeferredActionManager(), common::ManagedPointer(replication_manager),
            common::ManagedPointer(thread_registry), common::ManagedPointer(storage_layer->GetBlockStore()));
        recovery_manager->StartRecovery();
      }

to

      if (use_replication_ && replication_manager->IsReplica()) {
        auto log_provider = replication_manager->GetAsReplica()
                                      ->GetReplicationLogProvider()
                                      .CastManagedPointerTo<storage::AbstractLogProvider>();
        recovery_manager = std::make_unique<storage::RecoveryManager>(
            log_provider, catalog_layer->GetCatalog(), txn_layer->GetTransactionManager(),
            txn_layer->GetDeferredActionManager(), common::ManagedPointer(replication_manager),
            common::ManagedPointer(thread_registry), common::ManagedPointer(storage_layer->GetBlockStore()));
        recovery_manager->StartRecovery();
      }

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Performance related issues or changes.
Projects
None yet
Development

No branches or pull requests

2 participants