-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RPC Collator: Introduce Useful Metrics #5409
Comments
github-merge-queue bot
pushed a commit
that referenced
this issue
Sep 19, 2024
) # Description When we start a node with connections to external RPC servers (as a minimal node), we lack metrics around how many individual calls we're doing to the remote RPC servers and their duration. This PR adds metrics that measure durations of each RPC call made by the minimal nodes, and implicitly how many calls there are. Closes #5409 Closes #5689 ## Integration Node operators should be able to track minimal node metrics and decide appropriate actions according to how the metrics are interpreted/felt. The added metrics can be observed by curl'ing the prometheus metrics endpoint for the ~relaychain~ parachain (it was changed based on the review). The metrics are represented by ~`polkadot_parachain_relay_chain_rpc_interface`~ `relay_chain_rpc_interface` namespace (I realized lining up `parachain_relay_chain` in the same metric might be confusing :). Excerpt from the curl: ``` relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.001"} 15 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.004"} 23 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.016"} 23 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.064"} 23 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.256"} 24 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="1.024"} 24 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="4.096"} 24 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="16.384"} 24 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="65.536"} 24 relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="+Inf"} 24 relay_chain_rpc_interface_sum{method="chain_getBlockHash",chain="rococo_local_testnet"} 0.11719075 relay_chain_rpc_interface_count{method="chain_getBlockHash",chain="rococo_local_testnet"} 24 ``` ## Review Notes The way we measure durations/hits is based on `HistogramVec` struct which allows us to collect timings for each RPC client method called from the minimal node., It can be extended to measure the RPCs against other dimensions too (status codes, response sizes, etc). The timing measuring is done at the level of the `relay-chain-rpc-interface`, in the `RelayChainRpcClient` struct's method 'request_tracing'. A single entry point for all RPC requests done through the relay-chain-rpc-interface. The requests durations will fall under exponential buckets described by start `0.001`, factor `4` and count `9`. --------- Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When starting a collator node there is an option to connect to external relay chain nodes via
--relay-chain-rpc-urls
.The spawned
RelayChainRpcInterface
performs the RPC calls to the relay chain throughRelayChainRpcClient
. It is used internally by the collator as well as in the minimal-nodes subsystems.In the past we had instances where a huge amount of requests in combination with longer answer times led to complications in the subsystems. To allow better monitoring, we need some metrics on the
RelayChainRpcClient
.We should expose metrics for the average timing information as well as a aggregates of the counts.
A nice starting point is to spin up a network using this zombienet test: https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/zombienet/tests/0006-rpc_collator_builds_blocks.toml
From there you can discover the prometheus metrics that are exposed.
The text was updated successfully, but these errors were encountered: