
storage: avoid indexing batches in Replica.applySnapshot #10783

@petermattis


Most (all?) excessive raftMu lock hold times stem from replica deletion and applying a snapshot to a replica. Instrumenting Replica.applySnapshot shows the following breakdown of times when applying a 56 MB snapshot:

```
clear old range data     0 ms
apply batches          274 ms
write raft log entries   0 ms
commit                 366 ms
other                    1 ms
total                  641 ms
```
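The breakdowns above come from per-stage instrumentation of `Replica.applySnapshot`. A minimal sketch of that kind of stage timer is below; the `stopwatch` type and its method names are hypothetical, not the actual cockroachdb code.

```go
package main

import (
	"fmt"
	"time"
)

// stopwatch accumulates elapsed wall time per named stage. Calling record
// attributes the time since the previous call to the given stage name.
type stopwatch struct {
	last   time.Time
	stages []string
	times  map[string]time.Duration
}

func newStopwatch() *stopwatch {
	return &stopwatch{last: time.Now(), times: map[string]time.Duration{}}
}

func (s *stopwatch) record(stage string) {
	now := time.Now()
	s.stages = append(s.stages, stage)
	s.times[stage] += now.Sub(s.last)
	s.last = now
}

// report prints one line per stage plus a total, in the same shape as the
// breakdowns quoted in this issue.
func (s *stopwatch) report() {
	var total time.Duration
	for _, st := range s.stages {
		fmt.Printf("%-24s %d ms\n", st, s.times[st]/time.Millisecond)
		total += s.times[st]
	}
	fmt.Printf("%-24s %d ms\n", "total", total/time.Millisecond)
}

func main() {
	sw := newStopwatch()
	// ... clear old range data ...
	sw.record("clear old range data")
	// ... apply batches ...
	sw.record("apply batches")
	sw.report()
}
```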

Another instance when applying a 54 MB snapshot showed:

```
clear old range data     0 ms
apply batches          646 ms
write raft log entries   0 ms
commit                 222 ms
other                    4 ms
total                  872 ms
```

Notice that `apply batches` takes a significant amount of time in both of these snapshot applications. This is interesting because the operation is primarily sending a batch repr from Go to C++. Unfortunately, in addition to moving the data from Go to C++, that operation indexes the batch on the C++ side so that we can later perform a read to retrieve the replica state (via a call to `loadState`).
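To see why the batch must be indexed at all: a batch repr is just an ordered log of write operations, so answering a point read against it without replaying the whole log requires building a key-to-latest-write index as each entry is added. That per-entry work is what shows up in `apply batches`. A toy illustration, with illustrative types rather than the actual engine API:

```go
package main

import "fmt"

// op is one entry in a write batch's log.
type op struct {
	key, value string
	deleted    bool
}

// indexedBatch keeps the raw write log plus a key -> latest-op index.
// Maintaining the index on every put is the per-entry indexing cost
// described above; a read-only apply path could skip it entirely.
type indexedBatch struct {
	log   []op
	index map[string]int
}

func (b *indexedBatch) put(key, value string) {
	if b.index == nil {
		b.index = map[string]int{}
	}
	b.log = append(b.log, op{key: key, value: value})
	b.index[key] = len(b.log) - 1 // the indexing work done per write
}

// get serves a point read from the batch: O(1) with the index, but it
// would be O(len(log)) per read without one.
func (b *indexedBatch) get(key string) (string, bool) {
	i, ok := b.index[key]
	if !ok || b.log[i].deleted {
		return "", false
	}
	return b.log[i].value, true
}

func main() {
	var b indexedBatch
	b.put("range-state", "v1")
	b.put("range-state", "v2")
	if v, ok := b.get("range-state"); ok {
		fmt.Println(v) // → v2 (latest write wins)
	}
}
```

The index exists only so that a handful of reads can see the batch's own writes; if those reads go away, so does the reason to build it.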

So we index all of the batch data in memory in order to retrieve a handful of keys. It seems feasible to send the replica state explicitly in the snapshot in order to eliminate this read. If we can avoid reading from the C++ batch, we could add a mechanism to skip indexing the batches, cutting a significant chunk of time from `applySnapshot`.
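The proposed shape of that change can be sketched as follows: carry the replica state in the snapshot message itself, so the receiver applies each batch unindexed and never reads it back. The `Snapshot`/`ReplicaState` types and field names here are hypothetical stand-ins, not the actual protobufs:

```go
package main

import "fmt"

// ReplicaState stands in for the handful of keys that loadState would
// otherwise read back out of the indexed batch (fields are illustrative).
type ReplicaState struct {
	RaftAppliedIndex uint64
	LeaseHolder      string
}

// Snapshot carries the raw batch reprs plus, per the proposal above, the
// replica state explicitly, eliminating the post-apply read.
type Snapshot struct {
	Batches [][]byte
	State   ReplicaState
}

// applySnapshot applies each batch repr via an unindexed apply function
// and takes the replica state straight from the snapshot instead of
// reading it back through loadState.
func applySnapshot(snap Snapshot, applyUnindexed func([]byte) error) (ReplicaState, error) {
	for _, repr := range snap.Batches {
		if err := applyUnindexed(repr); err != nil {
			return ReplicaState{}, err
		}
	}
	return snap.State, nil // no read from the batch required
}

func main() {
	snap := Snapshot{
		Batches: [][]byte{[]byte("batch-1"), []byte("batch-2")},
		State:   ReplicaState{RaftAppliedIndex: 42, LeaseHolder: "n1"},
	}
	state, err := applySnapshot(snap, func([]byte) error { return nil })
	if err != nil {
		panic(err)
	}
	fmt.Println(state.RaftAppliedIndex) // → 42
}
```

With the state delivered alongside the data, the C++ side only needs a write-only, unindexed batch apply, which is exactly the expensive path this issue wants to cheapen.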
