Add pagination support for clear packets
#2077
Labels
A: breaking
Admin: breaking change that may impact operators
O: performance
Objective: cause to improve performance
O: usability
Objective: cause to improve the user experience (UX) and ease using the product
Problem
In production, Hermes often needs to clear many (hundreds of) packets on a given path. This can take a very long time, and the problem is compounded by full nodes being slow at times. If a node is slow and Hermes takes a long time (on the order of minutes) to build a transaction for relaying some packets, that transaction might prove to be invalid or outdated by the time Hermes finally submits it, making the whole process very painstaking.
The problem is that Hermes does bulk actions while trying to clear the packets on a path. This can be seen in the following spots that are on the critical path of the clear packets CLI, for example:
Acceptance criteria
Discussion
Hermes CLI for packet clearing should provide support for splitting the workload into multiple, smaller batches, to make this command more tractable. This can be done, for instance, with any of the following suggestions:
A. By adding a flag clear packets ... --limit X to specify that Hermes should limit itself to fetching only X packets/acks (instead of all pages). The limit option should probably be mandatory.
B. Alternatively, we could handle splitting into smaller batches of X messages automatically within the implementation of clear packets and have a simpler interface.
The advantage of A. is that the command is a lot more interactive and will finish faster. The problem is that operators will not know whether there are still packets left to relay, so they will need to repeatedly invoke clear packets --limit X until it returns an empty result (signalling there was nothing to clear).
The advantage of B. is that the command is smarter and simpler. The disadvantage is that it will take longer to finish. On a path with constant high activity, depending on the implementation and how we split up the batches, I can even imagine this command running for as long as new packets are being created, which is a problem (the CLI should return, instead of grabbing & relaying every new packet).
Other solutions might exist. We should evaluate the pros and cons through the lenses of operator user experience and simplicity.
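To make the two suggestions easier to compare, here is a minimal sketch of the batching logic only. It is not Hermes code: `Sequence`, `take_limited`, and `batches` are hypothetical names, and pending packets are modelled as a plain slice of sequence numbers. For option B it assumes the command works over a fixed snapshot of pending sequences, which is one possible way to guarantee that it terminates even while new packets keep arriving.

```rust
/// Hypothetical stand-in for a packet sequence number (not a Hermes type).
type Sequence = u64;

/// Option A sketch: take at most `limit` pending sequences and report how
/// many are left, so the operator knows whether another invocation is needed.
fn take_limited(pending: &[Sequence], limit: usize) -> (&[Sequence], usize) {
    let n = limit.min(pending.len());
    (&pending[..n], pending.len() - n)
}

/// Option B sketch: split a fixed snapshot of pending sequences into batches
/// of at most `batch_size`. Packets created after the snapshot was taken are
/// not included, so the iteration always ends.
fn batches(snapshot: &[Sequence], batch_size: usize) -> std::slice::Chunks<'_, Sequence> {
    // `chunks` panics on a size of 0, so clamp to at least 1.
    snapshot.chunks(batch_size.max(1))
}
```

With seven pending sequences and a limit of 3, `take_limited` yields the first three and reports four remaining, while `batches` with the same size yields three batches, the last holding a single sequence. Reporting the remaining count is one way to address the main drawback of A, since the operator no longer has to invoke the command until it returns an empty result.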
Tasks