
scheduler forward packets #898

Merged: 14 commits merged into anza-xyz:master on Apr 26, 2024

Conversation

@apfitzge commented Apr 19, 2024

Problem

  • swqos was built on the assumption that transactions are forwarded by validator nodes
  • without swqos in use, forwarding was in the process of being deprecated, since the behavior (absent swqos) was not useful
  • a major push in recent weeks has led operators to adopt swqos forwarding side-deals
  • for the reasons above, the new scheduler does not support forwarding
  • this lack of support would force operators to choose between swqos side-deals and the new scheduler, which is less than ideal under the current circumstances

Summary of Changes

  • Add support for forwarding transactions in the new scheduler

Fixes #

@@ -130,6 +134,10 @@ impl ForwardPacketBatchesByAccounts {
pub fn iter_batches(&self) -> impl Iterator<Item = &ForwardBatch> {
self.forward_batches.iter()
}

pub fn take_iter_batches(self) -> impl Iterator<Item = ForwardBatch> {
@apfitzge (Author) commented on the diff above:

I send the packets from these ForwardBatches, created by the scheduler, to the workers. Instead of cloning the packets, this just takes ownership.
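For context, a minimal sketch of how the owning iterator can sit next to the borrowing one. Only the two method signatures and the `forward_batches` field come from the diff; the rest is stubbed out for illustration:

```rust
// Illustrative sketch only: ForwardBatch is stubbed, and the struct layout is
// assumed from the `forward_batches` field shown in the diff context.
pub struct ForwardBatch {
    // packets, cost tracking, ...
}

pub struct ForwardPacketBatchesByAccounts {
    forward_batches: Vec<ForwardBatch>,
}

impl ForwardPacketBatchesByAccounts {
    /// Borrowing iteration, as in the existing API.
    pub fn iter_batches(&self) -> impl Iterator<Item = &ForwardBatch> {
        self.forward_batches.iter()
    }

    /// Owning iteration: consumes `self`, so batches can be handed to workers
    /// without cloning each packet.
    pub fn take_iter_batches(self) -> impl Iterator<Item = ForwardBatch> {
        self.forward_batches.into_iter()
    }
}
```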

@apfitzge force-pushed the scheduler-support-forwarding branch from cc74da9 to 512de3e on April 19, 2024 16:04
@apfitzge force-pushed the scheduler-support-forwarding branch from 512de3e to 225da20 on April 19, 2024 19:44
@codecov-commenter commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 63.63636%, with 64 lines in your changes missing coverage. Please review.

Project coverage is 81.8%. Comparing base (f121f73) to head (97a2645).
Report is 57 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #898     +/-   ##
=========================================
- Coverage    81.9%    81.8%   -0.1%     
=========================================
  Files         853      853             
  Lines      231779   231938    +159     
=========================================
+ Hits       189829   189914     +85     
- Misses      41950    42024     +74     

@apfitzge (Author) commented:
Local cluster summary looks fine:

mean_tps: 17282
max_tps: 52559
mean_confirmation_ms: 2978
max_confirmation_ms: 13360
99th_percentile_confirmation_ms: 11896
max_tower_distance: 32
last_tower_distance: 31
slots_per_second: 2.705

@apfitzge apfitzge marked this pull request as ready for review April 22, 2024 17:45
should_forward: bool,
},
/// Only used during transition.
Transitioning,


A necessary hack for take; I only wish it were entirely internal.
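A minimal sketch of the placeholder pattern the Transitioning variant enables, taking ownership out of an enum behind `&mut self`. These are not the crate's actual types; the inner payload is stubbed with a String and the variant fields other than `should_forward` are assumptions:

```rust
use std::mem;

// Illustrative only: field types and the take_transaction helper are made up.
enum TransactionState {
    Unprocessed {
        transaction: String,
        should_forward: bool,
    },
    /// Only used during transition.
    Transitioning,
}

impl TransactionState {
    /// Take ownership of the inner transaction from behind `&mut self` by
    /// temporarily swapping in the `Transitioning` placeholder.
    fn take_transaction(&mut self) -> String {
        match mem::replace(self, TransactionState::Transitioning) {
            TransactionState::Unprocessed { transaction, .. } => transaction,
            TransactionState::Transitioning => {
                panic!("take_transaction called on a transitioning state")
            }
        }
    }
}
```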

Three outdated review threads on core/src/banking_stage.rs were resolved.
// If we hit the time limit. Drop everything that was not checked/processed.
// If we cannot run these simple checks in time, then we cannot run them during
// leader slot.
if max_time_reached {


How is that 100 ms determined?

@apfitzge (Author) replied:

I just used 1/4 of a block time. The checks done here should be really quick, and if we cannot run them within 100 ms, we will not get to these packets in the scheduling loop anyway, which (currently) does more complex checks than these.
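Purely for illustration, a sketch of how such a wall-clock budget can gate the simple checks. Apart from MAX_FORWARDING_DURATION and max_time_reached, which the discussion mentions, all names and the packet type here are hypothetical:

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: 100 ms is 1/4 of a 400 ms slot. Everything not checked
// before the budget runs out is dropped rather than forwarded.
const MAX_FORWARDING_DURATION: Duration = Duration::from_millis(100);

fn split_by_time_budget(packets: Vec<u64>) -> (Vec<u64>, usize) {
    let start = Instant::now();
    let mut forwardable = Vec::new();
    let mut dropped = 0;

    for packet in packets {
        let max_time_reached = start.elapsed() >= MAX_FORWARDING_DURATION;
        // If we hit the time limit, drop everything that was not checked/processed.
        if max_time_reached {
            dropped += 1;
            continue;
        }
        // ... simple sanity checks would run here before queueing ...
        forwardable.push(packet);
    }
    (forwardable, dropped)
}
```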

@tao-stones replied:

The logic of "we will do up to 100 ms of forwarding work; if there are still packets in the container after that, they will be dropped" sounds a bit arbitrary. When I was benching it, I had to change MAX_FORWARDING_DURATION to MAX. What do you think about stopping forwarding when the CU limit is hit (it already does that somewhat, just not perfectly)?

In any case, for this PR we worry more about sending too many packets to the next leader than about not sending enough, so I think it is OK for now. Maybe keep an eye on the number of forwarded packets vs. dropped packets.
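For comparison, a sketch of what stopping on a compute-unit budget could look like. This is only an illustration of the suggestion; MAX_FORWARDING_CU, its value, and the per-packet cost inputs are made up:

```rust
// Hypothetical alternative to the wall-clock budget: stop once a CU budget is
// exhausted. The budget value and cost inputs are illustrative, not the crate's.
const MAX_FORWARDING_CU: u64 = 12_000_000;

fn split_by_cu_budget(packet_costs: &[u64]) -> (usize, usize) {
    let mut used_cu = 0u64;
    let mut forwarded = 0;

    for &cost in packet_costs {
        if used_cu + cost > MAX_FORWARDING_CU {
            break; // budget exhausted: the rest are dropped, not forwarded
        }
        used_cu += cost;
        forwarded += 1;
    }
    (forwarded, packet_costs.len() - forwarded)
}
```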

@tao-stones left a comment

LGTM - offline benchmarks of the new forwarding function are in line with the original scheduler's forwarder, and it adds the needed functionality to the new scheduler.

I made a note on a small implementation detail, which is the part I am looking to optimize for both the original scheduler and the new one. We can merge this PR and then refactor Forwarder afterwards.


@apfitzge apfitzge merged commit fb35f19 into anza-xyz:master Apr 26, 2024
38 checks passed
@apfitzge apfitzge deleted the scheduler-support-forwarding branch April 26, 2024 17:18
@apfitzge apfitzge added the v1.18 label Jun 10, 2024

mergify bot commented Jun 10, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Jun 10, 2024
(cherry picked from commit fb35f19)

# Conflicts:
#	core/src/banking_stage/transaction_scheduler/prio_graph_scheduler.rs
#	core/src/banking_stage/transaction_scheduler/scheduler_controller.rs
#	core/src/banking_stage/transaction_scheduler/scheduler_metrics.rs
#	core/src/banking_stage/transaction_scheduler/transaction_state.rs
#	core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
#	core/src/validator.rs
apfitzge added a commit that referenced this pull request Jun 11, 2024
(cherry picked from commit fb35f19)

# Conflicts:
#	core/src/banking_stage/transaction_scheduler/prio_graph_scheduler.rs
#	core/src/banking_stage/transaction_scheduler/scheduler_controller.rs
#	core/src/banking_stage/transaction_scheduler/scheduler_metrics.rs
#	core/src/banking_stage/transaction_scheduler/transaction_state.rs
#	core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
#	core/src/validator.rs