-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: Resend transactions using same mechanism as initial send #915
Conversation
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #915 +/- ##
=========================================
- Coverage 81.9% 81.8% -0.1%
=========================================
Files 853 853
Lines 231847 231811 -36
=========================================
- Hits 189909 189837 -72
- Misses 41938 41974 +36 |
i think the staggering thing is bandaging over TPU QUIC backoff being broken both server and client side. server side improvments should be shipping soon. not sure if there's been actual progress on client side? |
That's all correct. I haven't had time yet, but I can put some time into client-side backoff |
* client: Timeout tpu sends * Resend transactions one at a time, same as the first send * Simplify resends to happen right after initial sends * Skip already processed transaction errors (cherry picked from commit ed11b72)
* client: Timeout tpu sends * Resend transactions one at a time, same as the first send * Simplify resends to happen right after initial sends * Skip already processed transaction errors (cherry picked from commit ed11b72)
…nd (backport of #915) (#1072) client: Resend transactions using same mechanism as initial send (#915) * client: Timeout tpu sends * Resend transactions one at a time, same as the first send * Simplify resends to happen right after initial sends * Skip already processed transaction errors (cherry picked from commit ed11b72) Co-authored-by: Jon C <me@jonc.dev>
…nd (backport of anza-xyz#915) (anza-xyz#1072) client: Resend transactions using same mechanism as initial send (anza-xyz#915) * client: Timeout tpu sends * Resend transactions one at a time, same as the first send * Simplify resends to happen right after initial sends * Skip already processed transaction errors (cherry picked from commit ed11b72) Co-authored-by: Jon C <me@jonc.dev>
Problem
There are still issues with deploying programs. Even against a local validator, it takes multiple resends to get a program deployment through. For large amounts of transactions, using
send_wire_transaction_batch
seems to be flaky, especially against public networks.Summary of Changes
This contains a couple of changes, and can be viewed commit by commit. The concept is to have the TPU client resend transactions at a pace of 100 transactions per second, same as the initial send, and to avoid using
send_wire_transaction_batch
.And to make sends a bit smoother and retry sooner, all sends now timeout after 5 seconds, even the initial send. This allows the resends to start immediately after the initial send, so we can simplify the code and avoid concurrently sending and resending the same transactions. This doesn't totally resolve issues with deploys mainnet unfortunately, since just getting the transaction to the leader over TPU is nearly impossible, but it makes sends much more consistent against testnet.
I tested by alternating the old and new code to send 5,000 transactions on testnet. The transactions were unique self-transfers with 1 million micro-lamports per CU. Runs were cancelled after 5 minutes. Results:
Fixes #