Skip to content

feat: make transaction execution asynchronous#1636

Closed
PhilippGackstatter wants to merge 29 commits intonextfrom
pgackst-async-experimental
Closed

feat: make transaction execution asynchronous#1636
PhilippGackstatter wants to merge 29 commits intonextfrom
pgackst-async-experimental

Conversation

@PhilippGackstatter
Copy link
Contributor

Experiment built on top of miden-vm's async-experimental branch.

required for #1162

@PhilippGackstatter
Copy link
Contributor Author

PhilippGackstatter commented Jul 28, 2025

I've updated the code to incorporate 0xMiden/miden-vm#2026, thanks @plafer!

Some transactions are executing fine now, but there are two things that I noticed:

  • The FastProcessor seems to expect the StackInputs to be the reverse of what the slow processor expects, which is confusing. I've worked around that by reversing the inputs, but this seems worth making consistent. This is not a critical issue and can be fixed later, if it is not a deliberate deviation.
  • The other issue is that all the tests that do fail now are failing for what seems like a variant of the same issue:
Error: failed to execute transaction kernel program:
  x failed to execute the program for internal reason: stack underflow when restoring context

or a slightly different message:

Error:   x failed to execute code
  `-> failed to execute transaction kernel program:
        x failed to execute the program for internal reason: stack underflow

Since these tests were executing fine on the slow processor, my current guess is that this is a bug in the fast processor, but that could be wrong.
@plafer Could you take a look at this? I haven't been able to make a minimal, reproducible example yet. For now, you can run make test on this PR. This issue is holding this PR up, so would be great to get fixed as soon as possible.

@plafer
Copy link
Collaborator

plafer commented Jul 28, 2025

The FastProcessor seems to expect the StackInputs to be the reverse of what the slow processor expects, which is confusing. I've worked around that by reversing the inputs, but this seems worth making consistent.

This is somewhat discussed in 0xMiden/miden-vm#2021. Although note that in 1-2 releases Process should go away completely, so there will be just the FastProcessor (and hence no "consistency" to have).

Could you take a look at this? I haven't been able to make a minimal, reproducible example yet. For now, you can run make test on this PR. This issue is holding this PR up, so would be great to get fixed as soon as possible.

Hmmm, this is due to whatever code you're running doing a lot of "superfluous" drops. Basically the fast processor works with a fixed buffer for all stacks across all contexts, and we fix the size of that buffer to a fixed value. This was found to be vastly more performant than using a Vec for the stack. We also haven't implemented any strategy for what to do when we run out of space - we made it large enough so that all our tests pass (including some recursive verifier ones that were pretty heavy on the stack).

However you're hitting underflow - which as mentioned above suggests that you drop values when the stack is already [ZERO; 16]. We currently set this value to 50, such that you can drop 34 "non-existent elements" without triggering underflow. Now, we totally need to fix that in the FastProcessor - but really we assumed we'd be able to run "non-degenerate programs" just fine, and only find a good solution later (which might be a bit complex if we don't want to lose performance). As indicated by this comment, we predicted programs that hit underflow to most likely contain bugs (such that they think they're computing on real values, but really they're computing on an empty stack).

Would you be able to identify spots which drop values that aren't there? Or anything equivalent, such as running add on a [ZERO; 16] stack. If this ends up being difficult, I can push a hotfix, probably that just increases the amount of underflow that we tolerate.

@bobbinth
Copy link
Contributor

Would you be able to identify spots which drop values that aren't there? Or anything equivalent, such as running add on a [ZERO; 16] stack. If this ends up being difficult, I can push a hotfix, probably that just increases the amount of underflow that we tolerate.

I would probably bump up the "undeflow space" in the async-experimental branch to see if we can get this branch to run and then we can come up with a longer term strategy. Maybe let's go with 200 instead of 50?

@plafer
Copy link
Collaborator

plafer commented Jul 28, 2025

@PhilippGackstatter with 0xMiden/miden-vm#2035 hopefully this resolves this issue - although it would be good for someone to look into why you were hitting the underflow in the first place

@PhilippGackstatter
Copy link
Contributor Author

although it would be good for someone to look into why you were hitting the underflow in the first place

Thank you! It resolved all stack underflow issues for now. Thanks for explaining the issue. I think we're not following our call convention in many tests. So often we are not padding the stack before a call but still drop 16 elements afterward, and I assume that is what leads to many of the tests underflowing the stack as you described. I agree we'll have to look into that more properly eventually.

CI is green now and so @drahnr if you want, you can continue with this one. Let me know if I can help in any way!

@bobbinth
Copy link
Contributor

CI is green now and so @drahnr if you want, you can continue with this one. Let me know if I can help in any way!

I think the next steps here are:

  1. Create a PR in miden-vm which introduces conditional Sync bounds for the feature returned from AsyncHost::on_event(). In the same PR, we can change AsyncHost::get_mast_forest() to a regular method to simplify the current task.
  2. Merge the above PR into async-experimental.
  3. Update this PR as needed.
  4. Try to propagate the changes to miden-node and miden-client to see if everything works as expected.

Right?

@bobbinth
Copy link
Contributor

Conditional compilation is now in async-experimental branch in miden-vm. @PhilippGackstatter could you try to integrate it here and see if everything works? (also, would be good to merge the latest next into this branch).

Once this is done:

  • @drahnr could you check if we can make the node work off of this branch?
  • @igamigo could you check if we can make the client work off of this branch?

@drahnr
Copy link
Contributor

drahnr commented Jul 30, 2025

I rebased my changes on Friday (or Thursday) onto next and then rebased it all on-top of @PhilippGackstatter 's branch from Wednesday last week, the state is now bernhard-async-host-trait-bounds. I don't think it's useful, just for sake of completeness.

@PhilippGackstatter
Copy link
Contributor Author

@igamigo I think this PR is in a state where you can try it in the client. I don't know of a good way to test whether the current state works for Wasm or not, other than creating an entire Wasm project for this (but maybe we should do this at least for this PR).

@drahnr
Copy link
Contributor

drahnr commented Jul 30, 2025

works (make build && make test passes), see bernhard-async-experimental / e7ed34b9d5d6f106881630c9895fe40695fad26d

@bobbinth
Copy link
Contributor

works (make build && make test passes), see bernhard-async-experimental / e7ed34b9d5d6f106881630c9895fe40695fad26d

I think one of the goals was to get rid of spinning up a separate tokio runtime for each transaction execution in the NTB (@Mirko-von-Leipzig probably remembers where this code is exactly) - are we able to address this with this branch?

@Mirko-von-Leipzig
Copy link
Contributor

works (make build && make test passes), see bernhard-async-experimental / e7ed34b9d5d6f106881630c9895fe40695fad26d

I think one of the goals was to get rid of spinning up a separate tokio runtime for each transaction execution in the NTB (@Mirko-von-Leipzig probably remembers where this code is exactly) - are we able to address this with this branch?

It's here

@igamigo
Copy link
Collaborator

igamigo commented Jul 30, 2025

Conditional compilation is now in async-experimental branch in miden-vm. @PhilippGackstatter could you try to integrate it here and see if everything works? (also, would be good to merge the latest next into this branch).

Once this is done:

  • @drahnr could you check if we can make the node work off of this branch?
  • @igamigo could you check if we can make the client work off of this branch?

Seems to work fine on the client as well! Draft here: 0xMiden/miden-client#1104. Left some notes with a few things I encountered, but nothing major.

@igamigo I think this PR is in a state where you can try it in the client. I don't know of a good way to test whether the current state works for Wasm or not, other than creating an entire Wasm project for this (but maybe we should do this at least for this PR).

My idea was to get rid of the workarounds around the web SDK authentication methods (aka, try to get data from indexedDB directly on the get_signature() call) and make sure web client integration tests pass correctly (which they do - disregard unit tests). I think this should be enough but let me know if you had anything else in mind.

@bobbinth
Copy link
Contributor

My idea was to get rid of the workarounds around the web SDK authentication methods (aka, try to get data from indexedDB directly on the get_signature() call) and make sure web client integration tests pass correctly (which they do - disregard unit tests). I think this should be enough but let me know if you had anything else in mind.

Yeah, if we can make async calls to indexdDB from within the authenticator, that should be sufficient PoC. Another potential thing to check is whether we can make requests to a service worker (i believe). I think this is how @evanmarshall and @dagarcia7 were planning to implement signature request handling.

@evanmarshall
Copy link
Contributor

I think this is how @evanmarshall and @dagarcia7 were planning to implement signature request handling.

Yes, that's exactly what we need.

@igamigo
Copy link
Collaborator

igamigo commented Jul 31, 2025

Another potential thing to check is whether we can make requests to a service worker (i believe).

Not entirely sure what this flow should look like, is the idea that we can request signatures to the worker which in turn should read from IndexedDB? In this case wouldn't this have been doable before as well?

@bobbinth
Copy link
Contributor

Not entirely sure what this flow should look like, is the idea that we can request signatures to the worker which in turn should read from IndexedDB? In this case wouldn't this have been doable before as well?

My understanding is that the service worker would keep the private keys in its internal memory and so they may not be in the indexdDB. But either way, communication with the service worker is also asynchronous (similar to indexdDB) and so we couldn't do it before. @evanmarshall can probably correct me if I'm wrong on any of the above points.

@drahnr
Copy link
Contributor

drahnr commented Jul 31, 2025

works (make build && make test passes), see bernhard-async-experimental / e7ed34b9d5d6f106881630c9895fe40695fad26d

I think one of the goals was to get rid of spinning up a separate tokio runtime for each transaction execution in the NTB (@Mirko-von-Leipzig probably remembers where this code is exactly) - are we able to address this with this branch?

This still doesn't work due to SourceManager bounds being too weak still, so this requires more work. Partially already done in https://github.com/0xMiden/miden-base/pull/1637/files#diff-8ca9dd3a358e61060f04351b3eef4911473a12dc1cc9d32075afef671de2f928R52

@evanmarshall
Copy link
Contributor

My understanding is that the service worker would keep the private keys in its internal memory and so they may not be in the indexdDB

This is correct. For support, we'll need to expose signatures in the client so the vault can use them directly: 0xMiden/miden-client#1107 and we'll be working on an issue that enables passing the callback through the web client to the transaction authenticator.

We'll also need to support both use cases where someone is using IndexedDB to store the keys because this makes sense to have in the default web client, but in the case of the wallet, we do not want the secret keys stored in IndexedDB.

@PhilippGackstatter
Copy link
Contributor Author

Just to note these down. Once we merge this PR, I think there are some follow-ups:

  • Remove TransactionContext::execute_blocking in favor of making tests async, e.g. use tokio::test macro and then use TransactionContext::execute.
  • Improve transaction event handling. E.g. refactor TransactionEvent to contain all necessary data to handle each event, see feat: make transaction execution asynchronous #1636 (comment). I think this should result in a lot cleaner implementation.
  • Refactor the source manager to remove it where we don't need it and where we do, move it into the host. @drahnr may have a clearer picture on this.

@bobbinth
Copy link
Contributor

bobbinth commented Aug 7, 2025

miden-vm crates v0.17.0 with the changes needed for this PR are now on crates.io.

Let's update this PR to use v0.17.0 VM crates directly and also bring it up to date with next.

@bobbinth
Copy link
Contributor

bobbinth commented Aug 7, 2025

Closing this in favor of #1705.

@bobbinth bobbinth closed this Aug 7, 2025
@PhilippGackstatter PhilippGackstatter deleted the pgackst-async-experimental branch August 18, 2025 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants