Resolve the bottleneck created by the chain runtime in hermes #1489

ancazamfir · 2021-10-25T13:56:24Z

Crate

relayer

Summary

I was recently debugging hermes on a testnet channel over which many packets had been relayed, and noticed the following:

- When hermes starts, the client worker issues many RPCs to retrieve headers (from the source node and from the Tx on the destination chain), which significantly delays the packet workers.
  - Partly addressed by Allow more granular configuration of Hermes modes of operation #1518
- After each relay operation, the client worker retrieves the header from the Tx on the destination chain and then the header from the source chain, again delaying the packet worker thread.
  - Partly addressed by Misbehavior check triggered by UpdateClient event may run for a long time #1417
- In general, all workers are affected by the runtime bottleneck: they all send their queries via the runtime channel and wait in line, possibly behind a series of heavy query requests.
  - Resolving this requires a long-term architectural change, which will probably be batched with other changes we make to the relayer architecture.
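The queueing effect described in the last item can be illustrated with a minimal, self-contained sketch. It assumes a single runtime thread that owns the chain handle and serves requests from an `mpsc` channel strictly one at a time; `Request`, `spawn_runtime`, `query_via_runtime` and `packet_worker_wait` are hypothetical names, not hermes APIs, and `thread::sleep` stands in for RPC latency.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical request type (not the real hermes request enum).
enum Request {
    /// A query whose service time is `cost`, e.g. fetching headers over RPC.
    Query { cost: Duration, reply: Sender<()> },
}

/// Spawns one "runtime" thread that serves requests in FIFO order.
fn spawn_runtime() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        // The loop ends once every `Sender` has been dropped.
        for req in rx {
            match req {
                Request::Query { cost, reply } => {
                    thread::sleep(cost); // stand-in for RPC latency
                    let _ = reply.send(()); // the caller may have gone away
                }
            }
        }
    });
    tx
}

/// Sends a query through the runtime and blocks until it is answered,
/// returning the total time spent waiting (queueing + service).
fn query_via_runtime(runtime: &Sender<Request>, cost: Duration) -> Duration {
    let start = Instant::now();
    let (reply_tx, reply_rx) = channel();
    runtime
        .send(Request::Query { cost, reply: reply_tx })
        .expect("runtime stopped");
    reply_rx.recv().expect("runtime dropped the reply");
    start.elapsed()
}

/// A client worker enqueues a heavy header query, then a packet worker
/// issues a cheap query; returns how long the packet worker waited.
fn packet_worker_wait(heavy: Duration, light: Duration) -> Duration {
    let runtime = spawn_runtime();
    // Client worker: fire the heavy request without waiting for its reply.
    let (heavy_reply, _heavy_rx) = channel();
    runtime
        .send(Request::Query { cost: heavy, reply: heavy_reply })
        .expect("runtime stopped");
    // Packet worker: its cheap query sits in line behind the heavy one.
    query_via_runtime(&runtime, light)
}
```

For example, `packet_worker_wait(Duration::from_millis(200), Duration::from_millis(5))` returns at least 200 ms: the packet worker's 5 ms query cannot start until the client worker's heavy query ahead of it in the channel has been served.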

Problem Definition

Proposal

In the long run, workers should be able to perform queries directly; there is no need to route them through the chain runtime. The runtime should only be required for submitting transactions (Tx).
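A minimal sketch of that split, under stated assumptions: each worker holds a shared, thread-safe query client and queries in parallel, while only transactions still go through a single runtime thread. Keeping the runtime for Tx is modeled here on the assumption that serializing submissions keeps account sequence numbers consistent. `QueryClient`, `direct_queries`, `TxRequest`, `spawn_tx_runtime` and `submit_tx` are hypothetical names, not hermes APIs, and `thread::sleep` stands in for RPC latency.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical read-only query client shared by all workers. A real one
/// would wrap an RPC/gRPC client; here a query just sleeps.
struct QueryClient {
    latency: Duration,
}

impl QueryClient {
    /// Pretends to fetch the header at `height`.
    fn query_header(&self, height: u64) -> u64 {
        thread::sleep(self.latency);
        height
    }
}

/// Runs `n` workers that each query a header directly through the shared
/// client; returns the heights fetched and the total wall-clock time.
fn direct_queries(n: u64, latency: Duration) -> (Vec<u64>, Duration) {
    let client = Arc::new(QueryClient { latency });
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|height| {
            let client = Arc::clone(&client);
            thread::spawn(move || client.query_header(height))
        })
        .collect();
    let heights: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (heights, start.elapsed())
}

/// Hypothetical transaction request; only these go through the runtime.
struct TxRequest {
    msgs: Vec<String>,
    reply: Sender<u64>,
}

/// Spawns the runtime thread that submits transactions one at a time,
/// assigning each one the next (assumed) account sequence number.
fn spawn_tx_runtime() -> Sender<TxRequest> {
    let (tx, rx) = channel::<TxRequest>();
    thread::spawn(move || {
        let mut sequence = 0u64;
        for req in rx {
            // A real runtime would sign and broadcast `req.msgs` here.
            let _msgs = req.msgs;
            let _ = req.reply.send(sequence);
            sequence += 1;
        }
    });
    tx
}

/// Submits a transaction via the runtime and returns its sequence number.
fn submit_tx(runtime: &Sender<TxRequest>, msgs: Vec<String>) -> u64 {
    let (reply, reply_rx) = channel();
    runtime.send(TxRequest { msgs, reply }).expect("runtime stopped");
    reply_rx.recv().expect("runtime dropped the reply")
}
```

With four workers and a 100 ms query latency, `direct_queries` should finish in roughly 100 ms rather than the ~400 ms a single serial runtime would need, while `submit_tx` still hands out sequence numbers 0, 1, 2, … in submission order.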

In the short term, segregate the client worker from the other workers via configuration. This would allow one hermes instance to run in the background solely to monitor for misbehaviour and keep clients fresh, while another instance relays packets; the slowdown caused by the client workers should then be significantly smaller.
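The two-instance setup can be sketched as a per-instance mode setting that decides which workers get spawned. The types below are hypothetical, not the real hermes config types; they are only loosely modeled on the per-worker switches that #1518 is about (later Hermes releases expose these under a `[mode]` section in `config.toml`, but the exact keys should be checked against the Hermes configuration reference).

```rust
/// Hypothetical client-worker settings.
#[derive(Clone, Copy, Debug)]
struct ClientMode {
    enabled: bool,
    refresh: bool,      // keep clients from expiring
    misbehaviour: bool, // watch client updates for misbehaviour
}

/// Hypothetical per-instance mode settings.
#[derive(Clone, Copy, Debug)]
struct ModeConfig {
    clients: ClientMode,
    packets: bool, // relay packets on channels
}

/// Workers an instance would spawn for a channel whose client and packets
/// both need attention.
#[derive(Debug, PartialEq, Eq)]
enum Worker {
    Client { refresh: bool, misbehaviour: bool },
    Packet,
}

fn workers_to_spawn(mode: &ModeConfig) -> Vec<Worker> {
    let mut workers = Vec::new();
    let c = mode.clients;
    // A client worker is only worth spawning if it has a job to do.
    if c.enabled && (c.refresh || c.misbehaviour) {
        workers.push(Worker::Client { refresh: c.refresh, misbehaviour: c.misbehaviour });
    }
    if mode.packets {
        workers.push(Worker::Packet);
    }
    workers
}

/// Instance A: runs in the background, only refreshing clients and
/// checking for misbehaviour.
fn monitor_instance() -> ModeConfig {
    ModeConfig {
        clients: ClientMode { enabled: true, refresh: true, misbehaviour: true },
        packets: false,
    }
}

/// Instance B: relays packets; no client worker shares its chain runtimes.
fn relay_instance() -> ModeConfig {
    ModeConfig {
        clients: ClientMode { enabled: false, refresh: false, misbehaviour: false },
        packets: true,
    }
}
```

Because the two instances are separate processes, each has its own chain runtimes, so the header queries issued by instance A's client workers never queue in front of instance B's packet queries.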

Fixing #1417 would also help.

Acceptance Criteria

For Admin Use

Not duplicate issue
Appropriate labels applied
Appropriate milestone (priority) applied
Appropriate contributors tagged
Contributor assigned/self-assigned

ancazamfir changed the title ~~Handle hermes chain runtime thread bottleneck~~ Resolve the bottleneck created by the chain runtime in hermes Oct 25, 2021

adizere assigned romac Oct 26, 2021

adizere added this to the 11.2021 milestone Oct 26, 2021

adizere added A: bug Admin: something isn't working O: new-feature Objective: cause to add a new feature or support O: performance Objective: cause to improve performance labels Oct 26, 2021

romac removed the A: bug Admin: something isn't working label Oct 26, 2021

ancazamfir mentioned this issue Oct 28, 2021: Allow more granular configuration of Hermes modes of operation #1518 (closed)

romac added the P-medium label Nov 2, 2021

adizere modified the milestones: v0.8.1, Backlog Nov 3, 2021

adizere added A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews and removed P-medium labels Nov 3, 2021

adizere modified the milestones: Backlog, v2 May 9, 2022

adizere modified the milestones: v2, v1.1 Jun 29, 2022

ancazamfir commented Oct 25, 2021 (edited by adizere)
