feat: NTX Builder Account Actors #1219
Conversation
@Mirko-von-Leipzig this is a simple beginning. I thought it would be good to get early feedback. It only has bare-bones functionality / communication between the "coordinator" and actors right now, e.g. there is messaging from actor -> coordinator, but not yet the other way round.
Mirko-von-Leipzig left a comment:
I think the largest change I'd like to see is the move away from a custom handle, and towards a join set -- I think that will clean up a few things
```rust
/// The mode of operation that the account actor is currently performing.
#[derive(Default, Debug)]
enum ActorMode {
```
Ideally, the states describe the current action as opposed to a property.
I'd propose `Idle`, `ProcessingNote`, and `AwaitingTx`.
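A minimal sketch of what the action-oriented variants could look like; the names come from the suggestion above, and the doc comments are illustrative assumptions, not the PR's code:

```rust
// Hypothetical action-oriented states, per the review suggestion.
#[derive(Default, Debug, PartialEq)]
enum ActorMode {
    /// No note in flight; waiting for the next mempool event.
    #[default]
    Idle,
    /// Executing a transaction that consumes a network note.
    ProcessingNote,
    /// A transaction was submitted; waiting for the mempool to report on it.
    AwaitingTx,
}

fn main() {
    // A freshly spawned actor starts out idle.
    assert_eq!(ActorMode::default(), ActorMode::Idle);
}
```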
I think that is more or less what I had before some feedback in this PR, and I do still prefer it that way. If @Mirko-von-Leipzig has a chance to weigh in again I'll defer to him. Sorry if my memory has not served me well on this topic!
```rust
/// block producer.
/// 4. **State Updates**: Updates internal state based on mempool events and execution results.
/// 5. **Shutdown**: Terminates gracefully when cancelled or encounters unrecoverable errors.
///
```
Could you document the possible state transitions?
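One hedged sketch of what such transition documentation could look like, assuming the lifecycle steps quoted above; the trigger names are assumptions, not taken from the PR:

```rust
/// State transitions (illustrative):
///
/// ```text
/// Idle           --(note event for this account)-->        ProcessingNote
/// ProcessingNote --(tx submitted to block producer)-->     AwaitingTx
/// AwaitingTx     --(mempool reports tx committed)-->       Idle
/// any state      --(cancellation / unrecoverable error)--> shutdown
/// ```
```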
```rust
/// Runs the account actor, processing events and managing state until a reason to shutdown is
/// encountered.
pub async fn run(mut self, semaphore: Arc<Semaphore>) -> ActorShutdownReason {
```
Rather than spawning a Tokio task for each actor, the overhead of doing so can be avoided by changing this into an `on_message()` function. Rather than `ActorShutdownReason`, the function would return `Option<ActorShutdownReason>`.
In `AccountActor`, `event_rx: mpsc::Receiver<MempoolEvent>` could be replaced by a `pending: VecDeque<MempoolEvent>`. If the `MempoolEvent` passed to `on_message()` cannot be processed in the current `ActorMode`, `on_message()` would add it to `pending`.
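A self-contained sketch of the proposed `on_message()` shape; the type names follow the discussion, but the variants and mode logic are assumptions (and it is shown as a sync function for brevity, not the real async signature):

```rust
use std::collections::VecDeque;

// Hypothetical event and state types, for illustration only.
#[derive(Debug)]
enum MempoolEvent {
    NoteReceived(u32),
    TxCommitted(u32),
}

#[derive(Default, PartialEq)]
enum ActorMode {
    #[default]
    Idle,
    AwaitingTx,
}

#[derive(Debug)]
enum ActorShutdownReason {
    Unrecoverable,
}

#[derive(Default)]
struct AccountActor {
    mode: ActorMode,
    /// Events that could not be handled in the current mode.
    pending: VecDeque<MempoolEvent>,
}

impl AccountActor {
    /// Processes one event; returns `Some(reason)` only when the actor
    /// should shut down, so no dedicated Tokio task is required.
    fn on_message(&mut self, event: MempoolEvent) -> Option<ActorShutdownReason> {
        match (&self.mode, &event) {
            // While awaiting a transaction, defer newly arriving notes.
            (ActorMode::AwaitingTx, MempoolEvent::NoteReceived(_)) => {
                self.pending.push_back(event);
            }
            (ActorMode::Idle, MempoolEvent::NoteReceived(_)) => {
                // Execute the note and submit a transaction (elided).
                self.mode = ActorMode::AwaitingTx;
            }
            (_, MempoolEvent::TxCommitted(_)) => {
                self.mode = ActorMode::Idle;
            }
        }
        None
    }
}

fn main() {
    let mut actor = AccountActor::default();
    assert!(actor.on_message(MempoolEvent::NoteReceived(1)).is_none());
    assert!(actor.mode == ActorMode::AwaitingTx);
    // A second note arrives while a tx is in flight: it is queued.
    actor.on_message(MempoolEvent::NoteReceived(2));
    assert_eq!(actor.pending.len(), 1);
}
```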
> Rather than spawning a Tokio task for each actor, the overhead of doing so can be avoided

What is the overhead that we would be worried about in this stack (or the wider program)?

> changing this into an `on_message()` function

Do you mean that the coordinator would spawn a task for each mempool event (for each network account) rather than having a single task for each network account?
> What is the overhead that we would be worried about in this stack (or the wider program)?

The memory overhead of spawning 1 million Tokio tasks is around 450 MB (source), although it will be higher due to the channels and locks needed for coordination.

> Do you mean that the coordinator would spawn a task for each mempool event (for each network account) rather than having a single task for each network account?

Yes.
If I understand correctly, the main idea is to move the rate limiting into the coordinator so that we limit the number of tasks spawned, rather than limiting the number of concurrently unblocked tasks.
Sounds like it's worth exploring.
```rust
// Run the actor.
let semaphore = self.semaphore.clone();
self.actor_join_set.spawn(Box::pin(actor.run(semaphore)));
```
With the change proposed above, this task does not need to be spawned.
In `broadcast_event()`, `on_message(...).await` could be called for N actors in parallel, which could be tuned based on benchmarks.
After each `on_message().await` call, we would mark the actor as available.
A background task gets notified when an actor becomes available and redelivers any pending messages to it.
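A non-runnable sketch of how that dispatch could look; it assumes the `futures` crate's `StreamExt::for_each_concurrent`, and the `broadcast_event`/`on_message` names from this discussion rather than the PR's actual code:

```rust
// Sketch only: deliver one event to actors with bounded concurrency,
// without spawning a task per actor. N would be tuned via benchmarks.
async fn broadcast_event(&mut self, event: MempoolEvent) {
    futures::stream::iter(self.actors.values_mut())
        .for_each_concurrent(N, |actor| async {
            // `on_message` returns Some(reason) if the actor must shut down.
            if let Some(reason) = actor.on_message(event.clone()).await {
                // Handle shutdown / remove the actor (elided).
            }
            // Mark the actor as available; a background task is notified
            // and redelivers any `pending` events to it (elided).
        })
        .await;
}
```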
bobbinth left a comment:
Looks good! Thank you. I left some more comments inline. The main one for now is about how we manage the actor "mode".
I also think this structure is probably half-way to where we'd need to get to eventually. The main things to address in follow-ups:
- How we segment the actor state - the current structure does not seem clean to me - but changing it now will probably result in quite a bit of refactoring.
- How we handle event routing in the coordinator. I think this may have significant performance implications - but also, we should confirm if that's the case first.
- How to handle actor failures - e.g., under what circumstances we may want to respawn failed actors.
The first point is probably also good to address in combination with introducing lazy account loading as both of these will have impact on how the actor state is structured.
```rust
///
/// ## Event Broadcasting
/// - Distributes mempool events to all relevant account actors.
/// - Handles communication failures by canceling disconnected actors.
```
IIUC, when an actor fails for whatever reason, we don't re-spawn it, correct? If so, it may be good to get more granular in the future, as actors could fail for a variety of reasons, some of which could be transient.
Yes, that is correct, and it is something to look at next, per your comment above about follow-up work (point 3).
Context
Closes #1192.
We are moving the NTX Builder from its current implementation, which manages all network accounts through a single map/index, to a design which maintains dedicated tasks for each network account.
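The per-account design can be sketched with a stdlib-only toy: a coordinator that keeps one dedicated worker per network account and routes mempool events to the owning account. All names here are illustrative assumptions; the real implementation uses Tokio tasks and a `JoinSet`, not OS threads.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type AccountId = u64;

// Illustrative event type; the real MempoolEvent carries more variants.
#[derive(Debug)]
enum MempoolEvent {
    NoteReceived { account: AccountId, note: u32 },
}

struct Coordinator {
    // One channel per network account, replacing a single shared map/index.
    actors: HashMap<AccountId, mpsc::Sender<MempoolEvent>>,
    handles: Vec<thread::JoinHandle<usize>>,
}

impl Coordinator {
    fn new() -> Self {
        Self { actors: HashMap::new(), handles: Vec::new() }
    }

    /// Spawns a dedicated worker for `account` (one per network account).
    fn spawn_actor(&mut self, account: AccountId) {
        let (tx, rx) = mpsc::channel();
        self.actors.insert(account, tx);
        self.handles.push(thread::spawn(move || {
            // Each actor only ever sees its own account's events;
            // here it just counts them until its sender is dropped.
            rx.iter().count()
        }));
    }

    /// Routes an event to the actor that owns the target account.
    fn dispatch(&self, event: MempoolEvent) {
        let MempoolEvent::NoteReceived { account, .. } = &event;
        if let Some(tx) = self.actors.get(account) {
            let _ = tx.send(event);
        }
    }

    /// Drops all senders so workers exit, then joins them.
    fn shutdown(mut self) -> usize {
        self.actors.clear();
        self.handles.into_iter().map(|h| h.join().unwrap()).sum()
    }
}

fn main() {
    let mut c = Coordinator::new();
    c.spawn_actor(1);
    c.spawn_actor(2);
    c.dispatch(MempoolEvent::NoteReceived { account: 1, note: 10 });
    c.dispatch(MempoolEvent::NoteReceived { account: 1, note: 11 });
    c.dispatch(MempoolEvent::NoteReceived { account: 2, note: 20 });
    assert_eq!(c.shutdown(), 3);
}
```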
Changes
- `rpc GetNetworkAccounts(google.protobuf.Empty)` endpoint added to `ntx_builder.proto`.
- New `AccountActor` type which receives mempool events from the builder and executes transactions against the account state.
- `State` type reworked to be concerned with only a single account.
- New `Coordinator` type to manage `AccountActor`s and send mempool events from the builder.
- `GetUnconsumedNetworkNotes` endpoint logic replaced with `GetUnconsumedNetworkNotesForAccount`.