Skip to content

tx/metrics: Add metrics for the RPC v2 transactionWatch_v1_submitAndWatch #8345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Apr 25, 2025

This PR adds metrics for the following RPC subscription: transactionWatch_v1_submitAndWatch

Metrics are exposed in two ways:

  • simple counters of how many events we've seen globally
  • a histogram vector of execution times, which is labeled by initial event -> final event
    • This helps us identify how long it takes the transaction pool to advance the state of the events, and further debug issues

Part of: #8336

(outdated) PoC Dashboards

Screenshot 2025-04-28 at 17 50 48

Next steps

  • initial dashboards with a live node
  • adjust testing

lexnv added 12 commits April 25, 2025 18:52
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv requested a review from a team April 25, 2025 15:56
@lexnv lexnv self-assigned this Apr 25, 2025
@michalkucharczyk
Copy link
Contributor

Just quickly looked into this.

How to compute the latency between submit -> finalized?

This would require us to know what was transcation lifecycle? (e.g. submitted -> inblock -> retracted -> inblock -> retracted -> inblock -> finalized.

We have all the intermediate steps, but don't have a way to compute submitted -> finalized?

But I am be missing something at the moment, need to have a second look.

@lexnv
Copy link
Contributor Author

lexnv commented Apr 28, 2025

This would require us to know what was transcation lifecycle? (e.g. submitted -> inblock -> retracted -> inblock -> retracted -> inblock -> finalized.
We have all the intermediate steps, but don't have a way to compute submitted -> finalized?

Yep exactly, and since the submitted -> finalized is an important metric, I'll add it as well thanks for the suggestion 🙏 Believe it would be difficult otherwise to create an accurate dashboard if the transaction state switches between inblock <-> retracted multiple times

lexnv added 3 commits April 28, 2025 13:16
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@michalkucharczyk
Copy link
Contributor

I think that what we need is time to event (since submission) for every event type.
It may also make sense to deduplicated inblock / retracted so only first matters.

lexnv and others added 6 commits April 28, 2025 15:54
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@paritytech-workflow-stopper
Copy link

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/14712873864
Failed job name: build-rustdoc

lexnv added 2 commits April 28, 2025 17:15
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lexnv added 2 commits April 29, 2025 09:52
control

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Copy link
Contributor

@iulianbarbu iulianbarbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


impl Metrics {
/// Creates a new [`Metrics`] instance.
pub fn new(registry: &Registry) -> Result<Self, PrometheusError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would help a tiny bit mentioning the time unit in the metrics descriptions, while reading the code.

Comment on lines +521 to +525
let rpc_v2_metrics = if let Some(registry) = config.prometheus_registry() {
Some(sc_rpc_spec_v2::transaction::TransactionMetrics::new(registry)?)
} else {
None
};
Copy link
Contributor

@iulianbarbu iulianbarbu May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let rpc_v2_metrics = if let Some(registry) = config.prometheus_registry() {
Some(sc_rpc_spec_v2::transaction::TransactionMetrics::new(registry)?)
} else {
None
};
let rpc_v2_metrics = config.prometheus_registry().map(|registry| sc_rpc_spec_v2::transaction::TransactionMetrics::new(registry)).transpose()?;

@@ -529,6 +537,7 @@ where
config.blocks_pruning,
backend.clone(),
&*rpc_builder,
rpc_v2_metrics.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the clone?

Comment on lines +132 to +136
let event = handle_event(event);

if let Some(ref event) = event {
metrics.register_event(event);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inspect maybe could be used?

@michalkucharczyk
Copy link
Contributor

@olliecorbisiero mentioned you would like that some adjustment of buckets maybe required. We can wait with merge to confirm that buckets are fine.

"rpc_transaction_in_block_time",
"RPC Transaction in block time",
)
.buckets(linear_buckets(0.0, 3.0, 20).expect("Valid buckets; qed")),
Copy link
Contributor

@michalkucharczyk michalkucharczyk May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olliecorbisiero here are new buckets for in-block.

Buckets for internal pool events (they you are using now in dashboard) were exactly the same:

in_block: register(
Histogram::with_opts(histogram_opts!(
"substrate_sub_txpool_timing_event_in_block",
"Histogram of timings for reporting InBlock event",
linear_buckets(0.0, 3.0, 20).unwrap()
))?,
registry,

Let us know what would work for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants