Skip to content

Commit

Permalink
[RFC] Simplify checkpoint executor (MystenLabs#7646)
Browse files Browse the repository at this point in the history
This PR does a few simplifications to the checkpoint executor:
1. We don't need `cold_start` as a member field of the executor. The
logic of `run_epoch` should be always the same regardless of how/when we
run it because it's always focusing on the current epoch specified by
the epoch_store. Specifically, we always enter the schedule loop, and if
we have executed the last checkpoint of the epoch, we terminate the loop
and return the next epoch committee.
2. We don't need `highest_scheduled_seq_num` as a member field.
Sepcifically, `highest_scheduled_seq_num` can be updated each time we
finish scheduling a series of checkpoints, and passed around as a
variable. There is no need for a global field member.
3. We don't need `latest_synced_checkpoint`. This is part of a larger
simplification: a lot of the exiting code complexity comes from the
desire to optimize db reads to get the latest synced checkpoint. The
overhead of one read per checkpoint is negligible at the grand scheme.
We don't need to rely on the channel between state_sync and
checkpoint_executor to get the latest synced checkpoint. Instead we
could use the channel purely as a notification channel, and when we wake
up, we read from the db to get the latest synced checkpoint. Now, if we
really really want to optimize this, a cleaner way would be that we read
every synced checkpoint into memory, and schedule them as we go.

Let me know if this simplification makes sense.
  • Loading branch information
lxfind authored Jan 25, 2023
1 parent 5f46e5a commit d515c54
Show file tree
Hide file tree
Showing 4 changed files with 139 additions and 275 deletions.
17 changes: 5 additions & 12 deletions crates/sui-core/src/checkpoints/checkpoint_executor/metrics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,7 @@ use std::sync::Arc;
pub struct CheckpointExecutorMetrics {
pub last_executed_checkpoint: IntGauge,
pub checkpoint_exec_errors: IntCounter,
pub checkpoint_exec_recv_channel_overflow: IntCounter,
pub current_local_epoch: IntGauge,
pub checkpoint_exec_epoch: IntGauge,
pub checkpoint_transaction_count: Histogram,
}

Expand All @@ -31,23 +30,17 @@ impl CheckpointExecutorMetrics {
registry
)
.unwrap(),
checkpoint_exec_recv_channel_overflow: register_int_counter_with_registry!(
"checkpoint_exec_recv_channel_overflow",
"Count of the number of times the recv channel from StateSync to CheckpointExecutor has been overflowed",
registry
)
.unwrap(),
current_local_epoch: register_int_gauge_with_registry!(
checkpoint_exec_epoch: register_int_gauge_with_registry!(
"current_local_epoch",
"Current local epoch sequence number",
"Current epoch number in the checkpoint executor",
registry
)
.unwrap(),
checkpoint_transaction_count: Histogram::new_in_registry(
"checkpoint_transaction_count",
"Number of transactions in the checkpoint",
registry
)
registry,
),
};
Arc::new(this)
}
Expand Down
Loading

0 comments on commit d515c54

Please sign in to comment.