
feat: NTX Builder Account Actors #1219

Merged: sergerad merged 127 commits into ntx-refactor from sergerad-ntx-actor on Nov 3, 2025

Conversation


@sergerad sergerad commented Sep 10, 2025

Context

Closes #1192.

We are moving the NTX Builder from its current implementation, which manages all network accounts through a single map/index, to a design which maintains dedicated tasks for each network account.

Changes

  • Add rpc GetNetworkAccounts(google.protobuf.Empty) endpoint to ntx_builder.proto.
  • Add AccountActor type which receives mempool events from the builder and executes transactions against the account state.
  • Update State type to be concerned with only a single account.
  • Add Coordinator type to manage AccountActors and send mempool events from the builder.
  • Replace the GetUnconsumedNetworkNotes endpoint logic with GetUnconsumedNetworkNotesForAccount.
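
The per-account routing idea can be sketched as follows. This is a hypothetical, synchronous stand-in using std threads and channels rather than the Tokio tasks and async channels the PR actually uses; all type and field names here are illustrative, not the crate's real definitions:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type AccountId = u64;

#[derive(Clone, Debug)]
enum MempoolEvent {
    NoteReceived { account: AccountId, note: String },
}

/// Routes each event to the dedicated channel of the account it targets,
/// instead of mutating a single shared map/index of all account state.
struct Coordinator {
    actors: HashMap<AccountId, mpsc::Sender<MempoolEvent>>,
}

impl Coordinator {
    fn new() -> Self {
        Self { actors: HashMap::new() }
    }

    /// Spawn a dedicated actor for one network account.
    fn spawn_actor(&mut self, account: AccountId) -> thread::JoinHandle<usize> {
        let (tx, rx) = mpsc::channel();
        self.actors.insert(account, tx);
        // The real actor executes transactions; here we just count events
        // until the coordinator drops the sending half of the channel.
        thread::spawn(move || rx.iter().count())
    }

    /// Deliver an event only to the actor owning the targeted account.
    fn route(&self, event: MempoolEvent) {
        let MempoolEvent::NoteReceived { account, .. } = &event;
        if let Some(tx) = self.actors.get(account) {
            let _ = tx.send(event);
        }
    }
}
```

The point of the design is that events for different accounts never contend on shared state: each actor owns its account's state outright.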

@sergerad (Collaborator, Author) commented:

@Mirko-von-Leipzig this is a simple beginning. Thought it would be good to get early feedback. It only has bare-bones functionality and communication between the "coordinator" and actors right now. E.g. there is messaging from actor -> coordinator, just not the other way round.

@Mirko-von-Leipzig (Collaborator) left a comment:

I think the largest change I'd like to see is the move away from a custom handle and towards a join set -- I think that will clean up a few things.

@sergerad sergerad force-pushed the sergerad-ntx-actor branch 3 times, most recently from bbe03a5 to 690b475 on September 18, 2025 03:47
@sergerad sergerad marked this pull request as ready for review September 20, 2025 09:34
@sergerad sergerad requested a review from bobbinth September 20, 2025 20:01
@bobbinth bobbinth requested a review from igamigo September 22, 2025 17:45
@sergerad sergerad requested a review from igamigo September 24, 2025 02:31
@sergerad sergerad requested a review from bobbinth October 23, 2025 00:48
@sergerad sergerad requested a review from igamigo October 23, 2025 02:00

/// The mode of operation that the account actor is currently performing.
#[derive(Default, Debug)]
enum ActorMode {

Ideally, the states describe the current action as opposed to a property.

I'd propose Idle, ProcessingNote and AwaitingTx.
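
For illustration, the reviewer's proposal might look like the sketch below. The variant payloads are omitted and the transition set is an assumption based on the actor lifecycle described elsewhere in this PR, not the merged code:

```rust
/// The action the account actor is currently performing.
#[derive(Default, Debug, PartialEq)]
enum ActorMode {
    /// No note is being processed and no transaction is in flight.
    #[default]
    Idle,
    /// A network note is being executed against the account state.
    ProcessingNote,
    /// A transaction has been submitted and the actor awaits its inclusion.
    AwaitingTx,
}

impl ActorMode {
    /// Sketch of the legal transitions:
    /// Idle -> ProcessingNote -> AwaitingTx -> Idle.
    fn can_transition_to(&self, next: &ActorMode) -> bool {
        matches!(
            (self, next),
            (ActorMode::Idle, ActorMode::ProcessingNote)
                | (ActorMode::ProcessingNote, ActorMode::AwaitingTx)
                | (ActorMode::AwaitingTx, ActorMode::Idle)
        )
    }
}
```

Naming variants after the in-progress action also makes the transition table (requested in a later comment) read naturally as a cycle.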

sergerad (Collaborator, Author) replied:

I think that is more or less what I had before some feedback earlier in this PR. I do still prefer that approach. If @Mirko-von-Leipzig has a chance to weigh in again, I'll defer to him. Sorry if my memory has not served me well on this topic!

/// block producer.
/// 4. **State Updates**: Updates internal state based on mempool events and execution results.
/// 5. **Shutdown**: Terminates gracefully when cancelled or encounters unrecoverable errors.
///

Could you document the possible state transitions?


/// Runs the account actor, processing events and managing state until a reason to shutdown is
/// encountered.
pub async fn run(mut self, semaphore: Arc<Semaphore>) -> ActorShutdownReason {
@tindzk commented Oct 25, 2025:

Rather than spawning a Tokio task for each actor, we could avoid that overhead by changing this into an on_message() function, which would return Option<ActorShutdownReason> instead of ActorShutdownReason.

In AccountActor, event_rx: mpsc::Receiver<MempoolEvent> could be replaced by a pending: VecDeque<MempoolEvent>. If the MempoolEvent passed to on_message() cannot be processed in the current ActorMode, on_message() would add it to pending.
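
A minimal synchronous sketch of this proposal; the event and mode variants are simplified stand-ins, not the crate's real types:

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
enum MempoolEvent {
    NoteReceived,
    TxCommitted,
}

#[derive(Debug, PartialEq)]
enum ActorMode {
    Idle,
    AwaitingTx,
}

enum ActorShutdownReason {
    Cancelled,
}

/// Task-free actor: events the current mode cannot handle are parked in
/// `pending` instead of sitting in an mpsc channel owned by a spawned task.
struct AccountActor {
    mode: ActorMode,
    pending: VecDeque<MempoolEvent>,
}

impl AccountActor {
    fn new() -> Self {
        Self { mode: ActorMode::Idle, pending: VecDeque::new() }
    }

    fn on_message(&mut self, event: MempoolEvent) -> Option<ActorShutdownReason> {
        match (&self.mode, event) {
            // While a tx is in flight, newly received notes must wait.
            (ActorMode::AwaitingTx, ev @ MempoolEvent::NoteReceived) => {
                self.pending.push_back(ev);
            }
            // Execute the note, then await the resulting transaction.
            (ActorMode::Idle, MempoolEvent::NoteReceived) => {
                self.mode = ActorMode::AwaitingTx;
            }
            // The in-flight transaction completed; become idle again.
            (_, MempoolEvent::TxCommitted) => {
                self.mode = ActorMode::Idle;
            }
        }
        None // a real implementation would return Some(reason) on fatal errors
    }
}
```

The coordinator would then call on_message() directly (or on a small worker pool), so no per-account task or channel is needed.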

@sergerad (Collaborator, Author) replied Oct 26, 2025:

> Rather than spawning a Tokio task for each actor, the overhead of doing so can be avoided

What is the overhead that we would be worried about in this stack (or the wider program)?

> changing this into an on_message() function

Do you mean that the coordinator would spawn a task for each mempool event (for each network account) rather than having a single task for each network account?

@tindzk replied:

> What is the overhead that we would be worried about in this stack (or wider program)?

The memory overhead of spawning 1 million Tokio tasks is around 450 MB (source), although it will be higher here due to the channels and locks needed for coordination.

> Do you mean that the coordinator would spawn a task for each mempool event (for each network account) rather than having a single task for each network account?

Yes.

sergerad (Collaborator, Author) replied:

If I understand correctly, the main idea is to move the rate limiting into the coordinator so that we limit the number of tasks spawned, rather than limiting the number of concurrently unblocked tasks.

Sounds like it's worth exploring.


// Run the actor.
let semaphore = self.semaphore.clone();
self.actor_join_set.spawn(Box::pin(actor.run(semaphore)));
@tindzk commented Oct 25, 2025:

With the change proposed above, this task does not need to be spawned.

In broadcast_event(), on_message(...).await could be called for N actors in parallel, where N could be tuned based on benchmarks.

After each on_message().await call, we would mark the actor as available.

A background task gets notified when an actor becomes available and redelivers any pending messages to it.
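
The delivery scheme described above could be sketched, in a simplified synchronous form with illustrative names, as:

```rust
use std::collections::VecDeque;

/// Deliver to at most `max_in_flight` actors at a time; queue the rest and
/// redeliver queued messages as actors complete. The `delivered` vector
/// stands in for the actual `on_message(..).await` calls.
struct Dispatcher {
    max_in_flight: usize,
    in_flight: usize,
    queued: VecDeque<u32>, // actor ids with a pending message
    delivered: Vec<u32>,
}

impl Dispatcher {
    fn new(max_in_flight: usize) -> Self {
        Self {
            max_in_flight,
            in_flight: 0,
            queued: VecDeque::new(),
            delivered: Vec::new(),
        }
    }

    /// broadcast_event(): try to deliver to an actor now, otherwise queue it.
    fn deliver(&mut self, actor: u32) {
        if self.in_flight < self.max_in_flight {
            self.in_flight += 1;
            self.delivered.push(actor);
        } else {
            self.queued.push_back(actor);
        }
    }

    /// Called when an actor's on_message() completes: mark it available and
    /// redeliver one queued message to keep the pipeline full.
    fn on_complete(&mut self) {
        self.in_flight -= 1;
        if let Some(actor) = self.queued.pop_front() {
            self.in_flight += 1;
            self.delivered.push(actor);
        }
    }
}
```

This moves the concurrency limit into the coordinator itself, matching the earlier suggestion to limit the number of tasks spawned rather than gating already-spawned tasks with a semaphore.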

@bobbinth (Contributor) left a comment:

Looks good! Thank you. I left some more comments inline. The main one for now is about how we manage the actor "mode".

I also think this structure is probably half-way to where we'd need to get to eventually. The main things to address in follow-ups:

  1. How we segment the actor state - the current structure does not seem clean to me - but changing it now will probably result in quite a bit of refactoring.
  2. How we handle event routing in the coordinator. I think this may have significant performance implications - but also, we should confirm if that's the case first.
  3. How to handle actor failures - e.g., under what circumstances we may want to respawn failed actors.

The first point is probably also good to address in combination with introducing lazy account loading as both of these will have impact on how the actor state is structured.

///
/// ## Event Broadcasting
/// - Distributes mempool events to all relevant account actors.
/// - Handles communication failures by canceling disconnected actors.
A Contributor commented:

IIUC, when an actor fails for whatever reason, we don't re-spawn it, correct? If so, it may be good to get more granular in the future, as actors could fail for a variety of reasons, some of which could be transient.

sergerad (Collaborator, Author) replied:

Yes, that is correct. And yes, this is something to look at next, per point 3 of your comment above on follow-up work.

@tindzk tindzk mentioned this pull request Oct 26, 2025
@sergerad sergerad requested a review from bobbinth October 27, 2025 20:45
@sergerad sergerad changed the base branch from next to ntx-refactor October 27, 2025 22:43
@sergerad sergerad merged commit b0937d3 into ntx-refactor Nov 3, 2025
6 checks passed
@sergerad sergerad deleted the sergerad-ntx-actor branch November 3, 2025 00:49
@sergerad sergerad mentioned this pull request Nov 3, 2025
@sergerad sergerad mentioned this pull request Dec 17, 2025


Development

Successfully merging this pull request may close these issues.

Network Transaction Builder: Account-level Parallelization

7 participants