Most (all?) excessive `raftMu` lock hold times stem from replica deletion and from applying a snapshot to a replica. Instrumenting `Replica.applySnapshot` shows the following breakdown of times when applying a 56 MB snapshot:
| phase | time |
| --- | --- |
| clear old range data | 0 ms |
| apply batches | 274 ms |
| write raft log entries | 0 ms |
| commit | 366 ms |
| other | 1 ms |
| total | 641 ms |
Another instance, applying a 54 MB snapshot, showed:
| phase | time |
| --- | --- |
| clear old range data | 0 ms |
| apply batches | 646 ms |
| write raft log entries | 0 ms |
| commit | 222 ms |
| other | 4 ms |
| total | 872 ms |
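For reference, here is a minimal sketch of how a per-phase breakdown like the ones above can be collected; the phase names, sleeps, and timer type are illustrative stand-ins, not the actual cockroach instrumentation:

```go
package main

import (
	"fmt"
	"time"
)

// phaseTimer accumulates a per-phase timing breakdown like the tables above.
type phaseTimer struct {
	last  time.Time
	total time.Duration
	rows  []string
}

func newPhaseTimer() *phaseTimer { return &phaseTimer{last: time.Now()} }

// mark closes the current phase under the given name and starts the next.
func (t *phaseTimer) mark(name string) {
	now := time.Now()
	d := now.Sub(t.last)
	t.last = now
	t.total += d
	t.rows = append(t.rows, fmt.Sprintf("| %-22s | %d ms |", name, d.Milliseconds()))
}

func (t *phaseTimer) report() {
	for _, r := range t.rows {
		fmt.Println(r)
	}
	fmt.Printf("| %-22s | %d ms |\n", "total", t.total.Milliseconds())
}

func main() {
	t := newPhaseTimer()
	time.Sleep(1 * time.Millisecond) // stand-in for clearing old range data
	t.mark("clear old range data")
	time.Sleep(270 * time.Millisecond) // stand-in for applying batch reprs
	t.mark("apply batches")
	t.mark("write raft log entries") // near-zero in the traces above
	time.Sleep(220 * time.Millisecond) // stand-in for the batch commit
	t.mark("commit")
	t.report()
}
```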
Notice that `apply batches` takes a significant amount of time in both snapshot applications. This is interesting because the operation is primarily sending a batch repr from Go to C++. Unfortunately, in addition to moving the data from Go to C++, that operation indexes the batch on the C++ side so that we can later perform a read to retrieve the replica state (via a call to `loadState`).
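To make that cost concrete, here is a toy model of an indexed batch, where a plain Go map stands in for the index the C++ side builds over the batch contents; the type and method names are hypothetical, not the engine's actual API:

```go
package main

import "fmt"

// memBatch stands in for the C++ batch: applying a repr both buffers the
// writes and indexes them so later reads can be served from the batch.
type memBatch struct {
	index map[string][]byte // in-memory index built while applying
}

// applyBatchRepr ingests a batch of writes; a map stands in for the
// serialized repr. Indexing every entry here is the cost that shows up
// under "apply batches" above.
func (b *memBatch) applyBatchRepr(repr map[string][]byte) {
	for k, v := range repr {
		b.index[k] = v
	}
}

// get serves a read from the batch's index, as loadState does today.
func (b *memBatch) get(key string) ([]byte, bool) {
	v, ok := b.index[key]
	return v, ok
}

func main() {
	b := &memBatch{index: map[string][]byte{}}
	b.applyBatchRepr(map[string][]byte{
		"range-lease":   []byte("n1"),
		"applied-index": []byte("42"),
	})
	// A loadState-style read, answered from the batch before it commits.
	if v, ok := b.get("applied-index"); ok {
		fmt.Printf("applied index = %s\n", v)
	}
}
```

Every key written through `applyBatchRepr` pays the indexing cost, even though only a handful of keys are ever read back.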
So we index all of the batch data in memory in order to retrieve a handful of keys. It seems feasible to send the replica state explicitly in the snapshot and eliminate this read entirely. If we could avoid reading from the C++ batch, we could add a mechanism to skip indexing the batches, cutting a significant chunk of time from `applySnapshot`.
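A hedged sketch of that direction, assuming a hypothetical `SnapshotHeader` that carries the replica state so the apply path can remain write-only; none of these types or names come from the actual codebase:

```go
package main

import "fmt"

// ReplicaState stands in for the handful of keys loadState retrieves today.
type ReplicaState struct {
	Lease        string
	AppliedIndex uint64
}

// SnapshotHeader carries the replica state explicitly alongside the data.
type SnapshotHeader struct {
	State ReplicaState
}

// writeOnlyBatch applies batch reprs without building an index, which is
// only safe because nothing reads from the batch before it commits.
type writeOnlyBatch struct{ reprs [][]byte }

func (b *writeOnlyBatch) applyBatchRepr(repr []byte) {
	b.reprs = append(b.reprs, repr) // buffer only; no per-key indexing
}

// applySnapshot takes the replica state from the header instead of reading
// it back out of the batch, so the batch never needs to be indexed.
func applySnapshot(h SnapshotHeader, reprs [][]byte) ReplicaState {
	b := &writeOnlyBatch{}
	for _, r := range reprs {
		b.applyBatchRepr(r)
	}
	// ... commit b ...
	return h.State
}

func main() {
	state := applySnapshot(
		SnapshotHeader{State: ReplicaState{Lease: "n1", AppliedIndex: 42}},
		[][]byte{[]byte("repr1"), []byte("repr2")},
	)
	fmt.Printf("replica state from header: %+v\n", state)
}
```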