Conversation
Mirko-von-Leipzig
left a comment
I've left quite a few comments on what I think next steps can/should be. @sergerad if they make sense, could you collate them into the tracking issue? If you disagree with them, we can litigate in that issue or here.
In terms of this PR, I'm happy to merge as is. There are a few potential paper-cuts but they should be rare.
Something else that bothers me is that the control flow and responsibilities of each part aren't super clear to me. I don't have an exact complaint, just that it feels a bit fuzzy and that the interface between each isn't clear.
```rust
/// Cache of events received from the mempool that predate corresponding network accounts.
/// Grouped by account prefix to allow targeted event delivery to actors upon creation.
predating_events: HashMap<NetworkAccountPrefix, IndexMap<TransactionId, Arc<MempoolEvent>>>,
```
This isn't strictly accurate. These are meant to be events that are still in flight from transactions that have not yet been committed (nor reverted).
These are replayed to an actor when it starts up so that it doesn't "miss" any events. To this end they need to be stored and replayed in chronological order, as if the actor had been present when they arrived.
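To make the intended semantics concrete, here's a minimal sketch of buffering in-flight events in arrival order and replaying them when an actor spawns. The `Prefix`/`Event` aliases are placeholders for the real `NetworkAccountPrefix`/`MempoolEvent` types, and a plain `Vec` stands in for `IndexMap`:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real NetworkAccountPrefix / MempoolEvent types.
type Prefix = u32;
type Event = String;

/// Buffers in-flight events per account prefix, preserving arrival order so
/// they can be replayed to a newly spawned actor as if it had been present.
#[derive(Default)]
struct InflightEvents {
    by_prefix: HashMap<Prefix, Vec<Event>>,
}

impl InflightEvents {
    /// Record an event for an account whose actor does not exist yet.
    fn push(&mut self, prefix: Prefix, event: Event) {
        self.by_prefix.entry(prefix).or_default().push(event);
    }

    /// Drain the buffered events for `prefix` in chronological order,
    /// ready to be replayed into the new actor's channel.
    fn replay(&mut self, prefix: Prefix) -> Vec<Event> {
        self.by_prefix.remove(&prefix).unwrap_or_default()
    }
}
```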
```rust
#[derive(Clone)]
struct ActorHandle {
    event_tx: mpsc::Sender<Arc<MempoolEvent>>,
    cancel_token: CancellationToken,
```
I think this is redundant. The actor should already cancel if the channel gets closed so that's essentially the same thing.
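As a sketch of why the token can be redundant (using std threads and channels rather than tokio, purely to keep the example self-contained): when the coordinator drops its sender, the actor's receive loop observes the closed channel and exits on its own, so channel closure already doubles as the shutdown signal.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn a minimal "actor" whose only shutdown mechanism is its channel:
/// `recv` returns Err once every sender is dropped, which ends the loop.
fn spawn_actor(rx: mpsc::Receiver<String>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut processed = 0;
        // Loop exits when the coordinator drops the sender; no token needed.
        while let Ok(_event) = rx.recv() {
            processed += 1;
        }
        processed
    })
}
```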
```rust
let mut interval = tokio::time::interval(self.ticker_interval);
interval.set_missed_tick_behavior(time::MissedTickBehavior::Skip);
```
I think this is no longer needed?
```rust
/// Address of the remote prover. If `None`, transactions will be proven locally, which is
/// undesirable due to the performance impact.
tx_prover_url: Option<Url>,
```
Unrelated to this PR, but I wonder if we shouldn't move this remote/local division by always taking a URL in the components themselves, and then having the node's CLI figure out whether to spawn a prover locally or not.
This would require quite a bit of work though because the prover worker cannot queue requests. So nevermind :)
```rust
// Handle actor result.
result = self.coordinator.next() => {
    result?;
},
```
I think the co-ordinator should probably never return an error? It seems like it's only on semaphore failure or channel failure, but I don't think those are valid reasons to crash the ntx builder.
```rust
let (chain_tip_header, chain_mmr) = store
    .get_latest_blockchain_data_with_retry()
    .await?
    .expect("store should contain a latest block");
let mut mempool_events = block_producer
    .subscribe_to_mempool_with_retry(chain_tip_header.block_num())
    .await
    .context("failed to subscribe to mempool events")?;

// Unlock the block-producer's block production. The block-producer is prevented from
// producing blocks until we have subscribed to mempool events.
//
// This is a temporary work-around until the ntx-builder can resync on the fly.
self.bp_checkpoint.wait().await;
```
A medium term goal is to decouple the block-producer and ntx builder. Currently they are coupled because the ntx builder must first sync to the latest committed state from the store and we need that to align with the block-producer. This is enforced by using the bp_checkpoint barrier.
I've been thinking about how we can decouple these so that the ntx builder can start up independently at any time as a separate component.
There are effectively two things that need to be sync'd:
- The chain state (latest header & MMR) which is used to align the mempool subscription.
- Existing network accounts loaded from store.
(1) is trivially solvable by simply repeating it in a loop until it succeeds, i.e. (a) read state from store, (b) subscribe; if desync'd then repeat.
(2) we can avoid entirely by creating a separate task whose goal is to fetch network accounts from the store in chronological order until it reaches the chain tip. The task sends these as a new kind of event via a channel. In this way the main loop is eventually informed of all pre-existing accounts and it can decide to spin them up or not, or maybe they're already running.
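For (1), the retry loop could be as simple as the following generic sketch; the closure stands in for the hypothetical "read state from store, then subscribe" attempt, and in a real implementation each failed attempt would back off before retrying:

```rust
/// Repeat `attempt` until it succeeds or `max_tries` is exhausted.
/// Generic sketch only: the real store / block-producer APIs differ.
fn sync_with_retry<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    max_tries: usize,
) -> Option<T> {
    for _ in 0..max_tries {
        if let Ok(value) = attempt() {
            return Some(value);
        }
        // A real implementation would sleep / back off here before retrying.
    }
    None
}
```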
```rust
},
// Broadcast to all actors.
MempoolEvent::TransactionsReverted(txs) => {
    self.coordinator.broadcast(event.clone()).await;
```
Ideally this wouldn't be a broadcast as this wakes up all sleeping actors.
We technically know which actors these transactions interact with. We could figure that out and be more specific here.
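One possible shape for that targeted delivery (with hypothetical `TxId`/`Prefix` stand-ins for the real types): keep an index of which account prefixes each transaction touched, and resolve a revert to just those actors instead of waking all of them.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for TransactionId / NetworkAccountPrefix.
type TxId = u64;
type Prefix = u32;

/// Tracks which account prefixes each transaction touched so that a revert
/// can be delivered only to the affected actors rather than broadcast.
#[derive(Default)]
struct TxIndex {
    touched: HashMap<TxId, HashSet<Prefix>>,
}

impl TxIndex {
    fn record(&mut self, tx: TxId, prefix: Prefix) {
        self.touched.entry(tx).or_default().insert(prefix);
    }

    /// The set of prefixes that need to hear about a revert of `txs`.
    fn targets(&self, txs: &[TxId]) -> HashSet<Prefix> {
        txs.iter()
            .filter_map(|tx| self.touched.get(tx))
            .flatten()
            .copied()
            .collect()
    }
}
```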
```rust
let pruned_block_height =
    (chain_state.chain_mmr.chain_length().as_usize().saturating_sub(MAX_BLOCK_COUNT))
        as u32;
chain_state.chain_mmr.prune_to(..pruned_block_height.into());
```
Remind me why chain state isn't part of self? Is it a borrowing issue?
```rust
None => {
    // There are no actors to wait for. Wait indefinitely until actors are spawned.
    std::future::pending().await
},
```
Nit: since this is so different from the rest, consider moving it out using

```rust
let Some(result) = self.actor_join_set.join_next().await else {
    return std::future::pending().await;
};
```
I've written about this a bit before #1314 (comment), but I'll repeat it here for visibility.
Some thoughts on how we can reduce the number of active account actors. This is largely orthogonal to whether we use an actor framework or just keep it simple with our current approach.
Some of these thoughts might be 💩
Self shut down
Let's allow accounts to shut themselves down. They can do this by simply exiting with `return Ok(())`. The coordinator then interprets this as a deliberate decision and does not restart the actor automatically.
Accounts could shut themselves down if they have no useful notes and haven't received an event in e.g. 5 minutes.
This technically introduces a race condition, where the actor shuts down and then an event arrives which should have kept it alive. In order to detect this, we could have the actor return the event receiver on exit, i.e. `return Ok(rx_events)`. The coordinator can then check if the receiver is empty and, if it isn't, restart the actor, which would mean it would reprocess things.
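A minimal sketch of that race-detection check (std channel instead of tokio for self-containment; note that in a real implementation the event consumed by the probe would need to be re-queued before the restart so it isn't lost):

```rust
use std::sync::mpsc;

// Hypothetical stand-in for the real MempoolEvent type.
type Event = String;

/// After an actor exits deliberately and hands its receiver back, decide
/// whether an event raced the shutdown and the actor must be restarted.
fn should_restart(rx: &mpsc::Receiver<Event>) -> bool {
    // A non-empty receiver means an event arrived after the shutdown
    // decision. (The probed event would be re-queued in a real version.)
    match rx.try_recv() {
        Ok(_) => true,
        Err(mpsc::TryRecvError::Empty) | Err(mpsc::TryRecvError::Disconnected) => false,
    }
}
```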
Blacklisting accounts
The co-ordinator can keep a hashmap of accounts that crash or return an error. Once the crash count reaches some limit they get added to a blacklist and are no longer considered. This should be fairly straightforward to add.
Standardizing account startup
I think at the moment we have separate flows for starting an account from an event, or starting an account from the store. I think we can connect these by always first getting the account state from the store. If there is no account, then the actor waits until it receives a new-account event. We can similarly use the timeout strategy here so that the actor shuts down if the new-account event doesn't arrive within a few minutes of start-up.
This means we have the same flow for all actors.
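A sketch of that unified flow, with hypothetical types, the store lookup reduced to an `Option`, and std's `recv_timeout` standing in for the start-up timeout:

```rust
use std::sync::mpsc;
use std::time::Duration;

// Hypothetical stand-in for the real account state type.
type AccountState = String;

/// Unified startup: prefer committed state from the store; otherwise wait
/// (bounded by `timeout`) for the new-account event. `None` means the
/// actor should shut itself down because neither source produced state.
fn startup_state(
    from_store: Option<AccountState>,
    rx: &mpsc::Receiver<AccountState>,
    timeout: Duration,
) -> Option<AccountState> {
    match from_store {
        Some(state) => Some(state),
        // No committed state yet: the account must arrive as an event.
        None => rx.recv_timeout(timeout).ok(),
    }
}
```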
7a61e3f to 9acd64e
Context
The Network Transaction Builder is being refactored so that each network account's responsibilities are executed as dedicated tasks.
Relates to #1056.
Some of the additions are due to file location changes (e.g. `account_state.rs`).
Changes
- `Actor` struct added to the NTX Builder stack for housing state and logic pertaining to individual network accounts.
- `Coordinator` added to the NTX Builder for coordinating `Actor`s and feeding through mempool events.