[Consensus 2.0] Additional metrics & logs #18075
    .subscriber_connections
    .with_label_values(&[peer_hostname, "inbound"])
    .set(0);
}
I would like to test how reliable this approach is, but overall it would be nice to have metrics for the subscribers to "our" node
Good idea. I think we should use separate metrics for inbound and outbound though so the metric as a whole makes sense.
Splitting them makes sense 👍
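A minimal sketch of what the split could look like, using only the standard library. `PeerGauge`, `SubscriptionMetrics`, `subscribed_by` and `subscribed_to` are hypothetical names for illustration, not the metrics registered in this PR. The point is that each gauge has a single meaning, so summing it across peers gives a sensible total.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an integer gauge labelled by peer hostname.
#[derive(Default)]
pub struct PeerGauge {
    values: HashMap<String, i64>,
}

impl PeerGauge {
    /// Sets the gauge value for one peer.
    pub fn set(&mut self, peer: &str, value: i64) {
        self.values.insert(peer.to_string(), value);
    }

    /// Returns the value for one peer, or 0 if it was never set.
    pub fn get(&self, peer: &str) -> i64 {
        self.values.get(peer).copied().unwrap_or(0)
    }

    /// Sums the gauge across all peers.
    pub fn total(&self) -> i64 {
        self.values.values().sum()
    }
}

/// Two separate gauges instead of one gauge with a direction label
/// (names are assumptions for this sketch).
#[derive(Default)]
pub struct SubscriptionMetrics {
    /// Inbound: peers that are subscribed to "our" node.
    pub subscribed_by: PeerGauge,
    /// Outbound: peers that our node is subscribed to.
    pub subscribed_to: PeerGauge,
}
```

With a single gauge carrying an "inbound"/"outbound" label, an aggregate over the whole metric would mix the two directions; with two gauges, `subscribed_by.total()` and `subscribed_to.total()` can be read independently.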
consensus/core/src/block_manager.rs
Outdated
@@ -57,6 +58,9 @@ pub(crate) struct BlockManager {
    /// Keeps all the blocks that we actually miss and haven't fetched them yet. That set will basically contain all the
    /// keys from the `missing_ancestors` minus any keys that exist in `suspended_blocks`.
    missing_blocks: BTreeSet<BlockRef>,
    /// A vector that holds a tuple of (lowest_round, highest_round) of received blocks per authority.
    /// This is used for metrics reporting purposes and resets during restarts.
    received_block_rounds: Vec<(Round, Round)>,
Updated to use `Vec<Option<(Round, Round)>>` here. Initializing lowest to 0 disables the field, and initializing highest to 0 can lead to metric anomalies as well.
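A minimal sketch of the difference, assuming `Round` is a `u32` alias. `ReceivedRounds`, `record` and `update_zero_initialized` are hypothetical helpers for illustration, not the code in `block_manager.rs`. With a `(0, 0)` starting value the lowest round can never move above 0, whereas `None` lets the first received block define both bounds.

```rust
/// Assumption for this sketch: rounds are plain u32 values.
type Round = u32;

/// Tracks (lowest_round, highest_round) of received blocks per authority.
/// `None` means no block has been received from that authority yet.
pub struct ReceivedRounds {
    per_authority: Vec<Option<(Round, Round)>>,
}

impl ReceivedRounds {
    pub fn new(num_authorities: usize) -> Self {
        Self {
            per_authority: vec![None; num_authorities],
        }
    }

    /// Records a received block round for the authority at `authority`.
    pub fn record(&mut self, authority: usize, round: Round) {
        let slot = &mut self.per_authority[authority];
        *slot = Some(match *slot {
            // The first block seen sets both bounds.
            None => (round, round),
            Some((lowest, highest)) => (lowest.min(round), highest.max(round)),
        });
    }

    pub fn get(&self, authority: usize) -> Option<(Round, Round)> {
        self.per_authority[authority]
    }
}

/// The zero-initialized variant, shown for contrast: starting from (0, 0),
/// `min` keeps the lowest round stuck at 0 no matter what is received.
pub fn update_zero_initialized(range: (Round, Round), round: Round) -> (Round, Round) {
    (range.0.min(round), range.1.max(round))
}
```

A reporter can then skip authorities whose entry is still `None` instead of exporting a misleading 0.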
let block = match self.try_accept_one_block(block) {
    TryAcceptResult::Accepted(block) => block,
    TryAcceptResult::Suspended(ancestors_to_fetch) => {
        debug!(
We may need to check the log volume in private testnet. Maybe this can be `trace!` instead.
Agreed
self.context
    .metrics
    .node_metrics
    .last_committed_authority_round
Switching to use last committed round per authority from DagState. That should report more consistent data across restarts, especially for validators that stop creating blocks.
self.context
    .metrics
    .node_metrics
    .committed_blocks
`consensus_committed_messages` should report the same data and it is consistent in an epoch across restarts.
@@ -119,6 +112,13 @@ impl<C: NetworkClient, S: NetworkService> Subscriber<C, S> {
    let mut retries: i64 = 0;
    let mut delay = INITIAL_RETRY_INTERVAL;
    'subscription: loop {
        context
Good fix.
373e833 to ff2ac36
ff2ac36 to 36252ad
36252ad to 837f4a5
synchronizer_fetched_blocks_by_peer: register_int_counter_vec_with_registry!(
    "synchronizer_fetched_blocks_by_peer",
    "Number of fetched blocks per peer authority via the synchronizer and also by block authority",
    &["peer", "type"],
The cardinality of # peers * # authorities could be just too large (>10000). We can test in private testnet but for now I'm splitting them into two metrics.
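The arithmetic behind this concern can be sketched as follows. The function names and the committee size of 101 are illustrative assumptions, not values from the PR. One counter labelled by both peer and block authority produces a time series per pair, while two counters labelled by one dimension each produce only the sum.

```rust
/// Series count for one counter labelled by (peer, block authority):
/// every pair can become its own time series.
pub fn combined_series(peers: u64, authorities: u64) -> u64 {
    peers * authorities
}

/// Series count for two counters, one labelled by peer and one by
/// block authority.
pub fn split_series(peers: u64, authorities: u64) -> u64 {
    peers + authorities
}

/// Hypothetical committee size used only for this example.
pub const EXAMPLE_COMMITTEE_SIZE: u64 = 101;
```

With 101 peers and 101 authorities, the combined metric could reach 10,201 series per node, while the split metrics stay at 202. The cost of splitting is that the per-pair breakdown is no longer available.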
837f4a5 to 0bfd05b
More metrics and logs related to block receive/acceptance/commit , subscription etc CI --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK: --------- Co-authored-by: MW Tian <mingwei@mystenlabs.com>
Description
More metrics and logs related to block receive/acceptance/commit, subscription, etc.
Test plan
CI
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.