Add new metrics requested by operators #2182

Merged 31 commits on May 17, 2022
31 commits
b2bf490
Add metric for number of WS reconnections
romac May 3, 2022
bbd4197
Reformatting
romac May 3, 2022
fe54a90
Add metric for number of IBC events received over WebSocket
romac May 3, 2022
5f35129
Add missing doc comments
romac May 3, 2022
08fe032
Add metric for number of messages submitted to a chain
romac May 3, 2022
c885d1e
Add facility for querying an account's balance via the bank module
romac May 4, 2022
a4db8e5
Code reorganization
romac May 4, 2022
0d15e95
Move `query_balance` to `ChainEndpoint`
romac May 4, 2022
2afafd4
Add `wallet_balance` metric
romac May 4, 2022
71471a4
Add `query_balance` to the chain handle
romac May 4, 2022
6892327
Update parameter name
romac May 5, 2022
f516c9d
Add wallet worker to monitor wallet balance and populate correspondin…
romac May 10, 2022
71f5420
Update bank proto
romac May 11, 2022
76adc3a
Implement `tx_latency` metric
romac May 11, 2022
263eb5d
Use `moka::sync::Cache` to capture in-flight txs in the telemetry sta…
romac May 12, 2022
670a187
Use short UUID
romac May 12, 2022
a6eb59f
Rename `tx_latency` to `tx_latency_confirmed` and add `tx_latency_sub…
romac May 12, 2022
787431e
Remove useless clone
romac May 12, 2022
7be0dc1
Add more labels to `tx_latency_*` metrics
romac May 12, 2022
e525576
Fix for rebase on master
romac May 12, 2022
b51bb46
Update guide with new metrics
romac May 13, 2022
60933be
Add changelog entry
romac May 13, 2022
66ad0c2
Do not add tracking id as label to tx_latency metrics to avoid blowin…
romac May 13, 2022
800f73b
Use custom aggregator to properly report wallet balance metric
romac May 16, 2022
c2b52ea
Update guide metrics
romac May 16, 2022
907d41a
Merge branch 'master' into romac/2112-telemetry
romac May 16, 2022
a07d836
Document `TrackingId`
romac May 17, 2022
a35a79a
Rename constructors of `TrackingId`
romac May 17, 2022
2d3a34d
Turn comments into doc comments
romac May 17, 2022
ef9db39
Merge branch 'master' into romac/2112-telemetry
romac May 17, 2022
684d1c3
Rename `chain::tx` module to `chain::tracking`
romac May 17, 2022
@@ -0,0 +1,3 @@
- Add six new metrics: `wallet_balance`, `ws_events`, `ws_reconnect`,
`tx_latency_submitted`, `tx_latency_confirmed`, `msg_num`
([#2112](https://github.com/informalsystems/ibc-rs/issues/2112))
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
# Generated by Cargo
# will have compiled files and executables
/target/
target/

# These are backup files generated by rustfmt
**/*.rs.bk
16 changes: 14 additions & 2 deletions Cargo.lock

136 changes: 122 additions & 14 deletions guide/src/telemetry.md
@@ -38,6 +38,12 @@ The following table describes the metrics currently tracked by the telemetry ser
| `ibc_receive_packets` | Number of receive packets relayed per channel | `u64` Counter |
| `ibc_acknowledgment_packets` | Number of acknowledgment packets relayed per channel | `u64` Counter |
| `ibc_timeout_packets` | Number of timeout packets relayed per channel | `u64` Counter |
| `wallet_balance` | Balance (in coins) left in each wallet key that Hermes is using. | `u64` ValueRecorder |
| `ws_events` | Number of IBC events Hermes received via the WebSocket subscription since starting up, per chain. | Counter |
| `ws_reconnect` | Number of times Hermes had to reconnect to the WebSocket endpoint. | Counter |
| `tx_latency_submitted` | Latency, in milliseconds, for all transactions submitted to a chain (i.e., the time between the moment Hermes received an event and the moment the corresponding transaction(s) were submitted). | `u64` ValueRecorder |
| `tx_latency_confirmed` | Latency, in milliseconds, for all transactions confirmed by a chain (i.e., the time between the moment Hermes received an event and the moment the corresponding transaction(s) were confirmed). Requires `tx_confirmation = true`. | `u64` ValueRecorder |
| `msg_num` | Number of messages Hermes submitted to each chain. | `u64` Counter |
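
Both latency metrics share the same starting point (the moment an event batch is received) and differ only in the end point. The following is a minimal sketch of that bookkeeping, not Hermes's implementation: it assumes millisecond timestamps are passed in explicitly and uses a hypothetical `BatchId` key. The commit list mentions a `TrackingId` type and a `moka::sync::Cache` for in-flight transactions; this sketch swaps in a plain `HashMap` to stay dependency-free.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a batch of events being relayed
/// (an illustrative stand-in, not Hermes's `TrackingId`).
type BatchId = u64;

/// Remembers when each event batch was received so that submission and
/// confirmation latencies can be computed, in milliseconds.
#[derive(Default)]
struct LatencyTracker {
    received_at_ms: HashMap<BatchId, u64>,
}

impl LatencyTracker {
    /// Record the moment a batch of events was received.
    fn on_events_received(&mut self, id: BatchId, now_ms: u64) {
        self.received_at_ms.insert(id, now_ms);
    }

    /// Latency from reception to submission. The entry is kept, because the
    /// confirmation latency is measured from the same starting point.
    fn on_tx_submitted(&self, id: BatchId, now_ms: u64) -> Option<u64> {
        self.received_at_ms
            .get(&id)
            .map(|&start| now_ms.saturating_sub(start))
    }

    /// Latency from reception to confirmation. The entry is removed, since
    /// the batch is no longer in flight.
    fn on_tx_confirmed(&mut self, id: BatchId, now_ms: u64) -> Option<u64> {
        self.received_at_ms
            .remove(&id)
            .map(|start| now_ms.saturating_sub(start))
    }

    /// Number of batches still waiting for confirmation.
    fn in_flight(&self) -> usize {
        self.received_at_ms.len()
    }
}

fn main() {
    let mut tracker = LatencyTracker::default();
    tracker.on_events_received(1, 1_000);
    let submitted = tracker.on_tx_submitted(1, 1_729);
    let confirmed = tracker.on_tx_confirmed(1, 5_256);
    println!("submitted: {:?} ms, confirmed: {:?} ms", submitted, confirmed);
}
```

A real tracker would also need to evict entries whose confirmation never arrives (for example, when confirmations are not being tracked); an expiring cache is one way to bound that memory.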

## Integration with Prometheus

@@ -48,35 +54,137 @@ After starting Hermes with `hermes start`, and letting it run for a while to rel
open [`http://localhost:3001/metrics`](http://localhost:3001/metrics) in a browser, you should
see Prometheus-encoded metrics.
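
Every line of that output that is not a `# HELP` or `# TYPE` comment is a single sample: a metric name, an optional set of `key="value"` labels in braces, and a value. A simplified parser for such lines, handy for quick scripted checks against the endpoint, could look like the sketch below. It assumes label values contain no escaped quotes, commas or braces and that no timestamp follows the value. Both assumptions hold for the sample output in this guide, but not for the full exposition format.

```rust
use std::collections::BTreeMap;

/// One sample from the Prometheus text format, e.g. `msg_num{chain="ibc-0"} 168`.
#[derive(Debug, PartialEq)]
struct Sample {
    name: String,
    labels: BTreeMap<String, String>,
    value: f64,
}

/// Parses a single sample line. Returns `None` for blank lines, comments
/// (`# HELP`, `# TYPE`) and lines this simplified parser cannot handle.
fn parse_sample(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The value follows the last space (assumes no trailing timestamp).
    let (series, value) = line.rsplit_once(' ')?;
    let value: f64 = value.parse().ok()?;

    let (name, labels) = match series.split_once('{') {
        Some((name, rest)) => {
            let body = rest.strip_suffix('}')?;
            let mut labels = BTreeMap::new();
            // Assumes label values contain no commas, braces or escaped quotes.
            for pair in body.split(',').filter(|p| !p.is_empty()) {
                let (key, quoted) = pair.split_once('=')?;
                let val = quoted.strip_prefix('"')?.strip_suffix('"')?;
                labels.insert(key.to_string(), val.to_string());
            }
            (name, labels)
        }
        None => (series, BTreeMap::new()),
    };
    Some(Sample { name: name.to_string(), labels, value })
}

fn main() {
    let line = r#"wallet_balance{account="cosmos1934akx97773lsjjs9x74dr03uuam29hcc9grp3",chain="ibc-0",denom="stake"} 99999970473"#;
    if let Some(s) = parse_sample(line) {
        println!("{} {:?} {}", s.name, s.labels, s.value);
    }
}
```

For anything beyond spot checks, querying the metrics through Prometheus itself (or a complete exposition-format parser) is the more robust option.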

For example, with 3 channels and after transferring some tokens between the chains:
For example, with two channels and after transferring some tokens between the chains:

```text
# HELP cache_hits Number of cache hits for queries emitted by the relayer, per chain and query type
# TYPE cache_hits counter
cache_hits{chain="ibc-0",query_type="query_channel"} 276
cache_hits{chain="ibc-0",query_type="query_client_state"} 177
cache_hits{chain="ibc-0",query_type="query_connection"} 160
cache_hits{chain="ibc-1",query_type="query_channel"} 240
cache_hits{chain="ibc-1",query_type="query_client_state"} 173
cache_hits{chain="ibc-1",query_type="query_connection"} 160
# HELP ibc_acknowledgment_packets Number of acknowledgment packets relayed per channel
# TYPE ibc_acknowledgment_packets counter
ibc_acknowledgment_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 300
ibc_acknowledgment_packets{src_chain="ibc-0",src_channel="channel-1",src_port="transfer"} 100
ibc_acknowledgment_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 48
ibc_acknowledgment_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 0
ibc_acknowledgment_packets{src_chain="ibc-0",src_channel="channel-1",src_port="transfer"} 42
ibc_acknowledgment_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 110
ibc_acknowledgment_packets{src_chain="ibc-1",src_channel="channel-1",src_port="transfer"} 0
# HELP ibc_receive_packets Number of receive packets relayed per channel
# TYPE ibc_receive_packets counter
ibc_receive_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 48
ibc_receive_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 110
ibc_receive_packets{src_chain="ibc-0",src_channel="channel-1",src_port="transfer"} 0
ibc_receive_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 300
ibc_receive_packets{src_chain="ibc-1",src_channel="channel-1",src_port="transfer"} 100
ibc_receive_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 0
ibc_receive_packets{src_chain="ibc-1",src_channel="channel-1",src_port="transfer"} 42
# HELP ibc_timeout_packets Number of timeout packets relayed per channel
# TYPE ibc_timeout_packets counter
ibc_timeout_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 1
ibc_timeout_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 0
ibc_timeout_packets{src_chain="ibc-0",src_channel="channel-1",src_port="transfer"} 0
ibc_timeout_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 0
ibc_timeout_packets{src_chain="ibc-1",src_channel="channel-1",src_port="transfer"} 0
# HELP msg_num How many messages Hermes submitted to the chain, per chain
# TYPE msg_num counter
msg_num{chain="ibc-0"} 168
msg_num{chain="ibc-1"} 156
# HELP queries Number of queries emitted by the relayer, per chain and query type
# TYPE queries counter
queries{chain="ibc-0",query_type="query_application_status"} 23
queries{chain="ibc-0",query_type="query_channel"} 88
queries{chain="ibc-0",query_type="query_client_connections"} 2
queries{chain="ibc-0",query_type="query_client_state"} 383
queries{chain="ibc-0",query_type="query_clients"} 1
queries{chain="ibc-0",query_type="query_connection"} 2
queries{chain="ibc-0",query_type="query_connection_channels"} 2
queries{chain="ibc-0",query_type="query_consensus_state"} 392
queries{chain="ibc-0",query_type="query_consensus_states"} 2
queries{chain="ibc-0",query_type="query_latest_height"} 1
queries{chain="ibc-0",query_type="query_packet_acknowledgements"} 5
queries{chain="ibc-0",query_type="query_packet_commitments"} 10
queries{chain="ibc-0",query_type="query_staking_params"} 2
queries{chain="ibc-0",query_type="query_txs"} 76
queries{chain="ibc-0",query_type="query_unreceived_acknowledgements"} 241
queries{chain="ibc-0",query_type="query_unreceived_packets"} 127
queries{chain="ibc-1",query_type="query_application_status"} 20
queries{chain="ibc-1",query_type="query_channel"} 224
queries{chain="ibc-1",query_type="query_client_connections"} 2
queries{chain="ibc-1",query_type="query_client_state"} 387
queries{chain="ibc-1",query_type="query_clients"} 1
queries{chain="ibc-1",query_type="query_connection"} 2
queries{chain="ibc-1",query_type="query_connection_channels"} 2
queries{chain="ibc-1",query_type="query_consensus_state"} 394
queries{chain="ibc-1",query_type="query_consensus_states"} 3
queries{chain="ibc-1",query_type="query_latest_height"} 1
queries{chain="ibc-1",query_type="query_packet_acknowledgements"} 5
queries{chain="ibc-1",query_type="query_packet_commitments"} 10
queries{chain="ibc-1",query_type="query_staking_params"} 2
queries{chain="ibc-1",query_type="query_txs"} 56
queries{chain="ibc-1",query_type="query_unreceived_acknowledgements"} 127
queries{chain="ibc-1",query_type="query_unreceived_packets"} 292
# HELP tx_latency_confirmed The latency for all transactions submitted to a specific chain, i.e. the difference between the moment when Hermes received a batch of events until the corresponding transaction(s) were confirmed. Milliseconds.
# TYPE tx_latency_confirmed histogram
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.5"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.9"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.99"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="+Inf"} 4
tx_latency_confirmed_sum{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 22466
tx_latency_confirmed_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 4
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.5"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.9"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.99"} 0
tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="+Inf"} 1
tx_latency_confirmed_sum{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 4256
tx_latency_confirmed_count{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 1
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.5"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.9"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.99"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="+Inf"} 2
tx_latency_confirmed_sum{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 9408
tx_latency_confirmed_count{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 2
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.5"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.9"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.99"} 0
tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="+Inf"} 1
tx_latency_confirmed_sum{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer"} 3173
tx_latency_confirmed_count{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer"} 1
# HELP tx_latency_submitted The latency for all transactions submitted to a specific chain, i.e. the difference between the moment when Hermes received a batch of events and when it submitted the corresponding transaction(s). Milliseconds.
# TYPE tx_latency_submitted histogram
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.5"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.9"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="0.99"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="+Inf"} 5
tx_latency_submitted_sum{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 14428
tx_latency_submitted_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 5
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.5"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.9"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="0.99"} 0
tx_latency_submitted_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="+Inf"} 1
tx_latency_submitted_sum{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 729
tx_latency_submitted_count{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 1
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.5"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.9"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="0.99"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="+Inf"} 2
tx_latency_submitted_sum{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 1706
tx_latency_submitted_count{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 2
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.5"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.9"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="0.99"} 0
tx_latency_submitted_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer",le="+Inf"} 1
tx_latency_submitted_sum{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer"} 791
tx_latency_submitted_count{chain="ibc-1",channel="channel-1",counterparty="ibc-0",port="transfer"} 1
# HELP wallet_balance The balance in each wallet that Hermes is using, per wallet, denom and chain
# TYPE wallet_balance gauge
wallet_balance{account="cosmos1934akx97773lsjjs9x74dr03uuam29hcc9grp3",chain="ibc-0",denom="stake"} 99999970473
wallet_balance{account="cosmos1hngzqscyg476nd68qggxps8r2aq56lne45ps8n",chain="ibc-1",denom="stake"} 99999978431
# HELP workers Number of workers per object
# TYPE workers gauge
workers{type="client"} 6
workers{type="client"} 4
workers{type="packet"} 4
workers{type="wallet"} 2
# HELP ws_events How many IBC events did Hermes receive via the WebSocket subscription, per chain
# TYPE ws_events counter
ws_events{chain="ibc-0"} 443
ws_events{chain="ibc-1"} 370
```
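
Each histogram is exported as `_bucket`, `_sum` and `_count` series. In the output above, the bucket bounds (`0.5`, `0.9`, `0.99`) sit far below the millisecond values being recorded, so every observation lands in the `+Inf` bucket. The mean latency per label set, `_sum / _count`, is therefore the more informative figure: for `chain="ibc-0",channel="channel-0"`, `tx_latency_confirmed` gives 22466 / 4 = 5616.5 ms. The sketch below computes per-series means and a pooled mean across series, using the `_sum`/`_count` pairs copied from that output.

```rust
/// Mean of a histogram series: `_sum / _count`.
/// Returns `None` when no observations have been recorded.
fn histogram_mean(sum: f64, count: u64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

/// Pooled mean across several series: add up the sums and the counts first,
/// so busier channels weigh more than quiet ones.
fn pooled_mean(series: &[(f64, u64)]) -> Option<f64> {
    let (sum, count) = series
        .iter()
        .fold((0.0, 0u64), |(s, c), &(sum, count)| (s + sum, c + count));
    histogram_mean(sum, count)
}

fn main() {
    // (series, `_sum`, `_count`) from the `tx_latency_confirmed` output above.
    let confirmed = [
        ("ibc-0/channel-0", 22466.0, 4),
        ("ibc-0/channel-1", 4256.0, 1),
        ("ibc-1/channel-0", 9408.0, 2),
        ("ibc-1/channel-1", 3173.0, 1),
    ];
    for (series, sum, count) in confirmed {
        if let Some(mean) = histogram_mean(sum, count) {
            println!("{series}: {mean:.1} ms");
        }
    }
    let pairs: Vec<(f64, u64)> = confirmed.iter().map(|&(_, s, c)| (s, c)).collect();
    if let Some(mean) = pooled_mean(&pairs) {
        println!("all channels: {mean:.1} ms");
    }
}
```

Averaging the per-channel means instead would weigh a channel with one transaction the same as a channel with many; pooling the sums and counts avoids that.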

### Visualization with Grafana

Here's what these metrics look like in [Grafana](https://prometheus.io/docs/visualization/grafana/) with a Prometheus data source:

![Hermes metrics in Grafana](./images/grafana.png)
2 changes: 2 additions & 0 deletions proto-compiler/src/cmd/compile.rs
@@ -194,6 +194,7 @@ impl CompileCmd {
format!("{}/proto/cosmos/gov", sdk_dir.display()),
format!("{}/proto/cosmos/tx", sdk_dir.display()),
format!("{}/proto/cosmos/base", sdk_dir.display()),
format!("{}/proto/cosmos/bank", sdk_dir.display()),
format!("{}/proto/cosmos/staking", sdk_dir.display()),
format!("{}/proto/cosmos/upgrade", sdk_dir.display()),
];
@@ -247,6 +248,7 @@ impl CompileCmd {
.type_attribute(".cosmos.upgrade.v1beta1", attrs_serde)
.type_attribute(".cosmos.base.v1beta1", attrs_serde)
.type_attribute(".cosmos.base.query.v1beta1", attrs_serde)
.type_attribute(".cosmos.bank.v1beta1", attrs_serde)
.compile(&protos, &includes);

match compilation {
5 changes: 5 additions & 0 deletions proto/src/lib.rs
@@ -54,6 +54,11 @@ pub mod cosmos {
include_proto!("cosmos.staking.v1beta1.rs");
}
}
pub mod bank {
pub mod v1beta1 {
include_proto!("cosmos.bank.v1beta1.rs");
}
}
pub mod base {
pub mod abci {
pub mod v1beta1 {