Make storage and mmr trait async, attempt 2. #121

Conversation

@dan-da (Collaborator) commented Mar 23, 2024

addresses #108, #75
obsoletes #120

This is a second pass at making the storage layer async in neptune-core.

note: I've added some background context here.

The first attempt in #120 is simpler and affects less code in neptune-core. Also, all of the existing tests compile and pass. But it has a serious inefficiency with ArchivalMmr, where the sync and async worlds collide. I encourage reviewer(s) to read #120, including comments, and perhaps the ArchivalMmr code for background first.

If some way could be found to decouple ArchivalMmr from the Mmr trait, that might allow us to use the simpler code in #120. However, I do not see any simple solution because a lot of code is generic over the Mmr trait. Perhaps a reviewer might?

This second attempt builds on #120 (strict superset) and makes the Mmr trait async (in neptune-core only) which resolves the ArchivalMmr issue, though at the cost of requiring much more code to become async, and requiring some tests and test-related infrastructure to be commented out. See notes below.

I believe this is the more correct, performant, and idiomatic implementation, and should be the one we move forward with, although it presents a few challenges.

This PR does still have one inefficiency in tasm::RemovalRecordsIntegrity::rust_shadow(), whereby it spawns an OS thread for each invocation. I am unsure how frequently it is called. Perhaps it can be modified not to call any async method, or refactored out. Reviewers, please have a look.

To facilitate reviewing, I have compiled a summary of changes:
async_storage_pr_notes.txt

If this PR is merged, we will want to find a solution for async proptests, which I have commented out for now. There is a new proptest_async crate; however, it presently only works with async_std, not tokio. Last night I sent the author an offer to help add a tokio feature-flag, and have not yet heard back. Anyway, that seems like a path forward.

The arbitrary crate does not provide an async trait, so I had to comment out impl Arbitrary in 4 cases. (These seem like they should be inside a test module anyway.) I did a quick look for an async fork of the arbitrary crate, but didn't find anything right away. I imagine/hope that some alternative exists in the async ecosystem.

This PR depends on specific revisions of twenty-first and tasm-lib in git. Those revisions now exist in master for each repo, so we need only make a new release for each, and also for triton-vm, which depends on twenty-first.

Testing

  1. All tests and doctests pass (for me, on rust 1.75), except of course for the disabled proptests.
  2. I need to update to the latest rust and fix some clippy errors that GitHub is reporting.
  3. I ran run-multiple-instances-from-genesis.sh and all 3 nodes start, init databases correctly, and are able to mine blocks. Some warnings and errors are encountered related to PoW difficulty. I'm unsure if these are caused by the async changes (doubtful), or would happen on master also. (todo: verify)

Runtime warnings/errors encountered:

WARN ThreadId(05) neptune_core::models::blockchain::block: Block digest exceeds target difficulty
ERROR ThreadId(05) neptune_core::mine_loop: Own mined block did not have valid PoW Discarding.
Running consensus program with input: 10373279888499205127,5472989617886819963,225249538409389332,14351683435638599318,7410669378538831243,9174421884590172073,10850780262550797792,11552744393070458998,2538672691563304706,8964411581216926690,13736493314391979227,1283618371588003095,6648510168461056029,8044674294685951522,2194367602604877769
2024-03-23T23:09:10.792606151Z  WARN ThreadId(05) neptune_core::models::blockchain::block: Block digest exceeds target difficulty
2024-03-23T23:09:10.792639065Z  WARN ThreadId(05) neptune_core::peer_loop: Received invalid proof-of-work for block of height 1 from peer with IP [::ffff:127.0.0.1]:38738
2024-03-23T23:09:10.792664099Z  WARN ThreadId(05) neptune_core::peer_loop: Difficulty is 2.
2024-03-23T23:09:10.794105804Z  WARN ThreadId(05) neptune_core::peer_loop: Proof of work should be 09223372034707292160,09223372034707292160,09223372034707292160,09223372034707292160,09223372034707292160 (or more) but was [18297266674924710731, 01714964341069217700, 04137310288005091088, 14981086055731355199, 16824586181540981647].
2024-03-23T23:09:10.795479734Z  WARN ThreadId(05) neptune_core::peer_loop: Sanctioning peer ::ffff:127.0.0.1 for InvalidBlock((BlockHeight(BFieldElement(4294967295)), Digest([BFieldElement(539001808393609628), BFieldElement(6909788068636440929), BFieldElement(6493292427912708903), BFieldElement(18261214943504998237), BFieldElement(15830763525737866356)])))
2024-03-23T23:09:10.795678886Z  WARN ThreadId(05) neptune_core::peer_loop: Failed to validate block due to insufficient PoW. Closing connection.
...
2024-03-23T23:09:10.799150404Z ERROR ThreadId(04) neptune_core::main_loop: Got error: Failed to validate block due to insufficient PoW

Stack backtrace
2024-03-23T23:07:21.786028273Z  WARN ThreadId(02) neptune_core::mine_loop: Received block is timestamped in the future; mining on future-timestamped block.

This is an unmodified copy of twenty_first::storage at twenty-first
revision 890c451e4e513018d8500bedcd5bf76dd0bafdd9 (master)

It is not yet incorporated into the build.
fixes error:
  In order to be transferred, a Block must have a non-None proof field.

ProofType enum enables specifying/transferring an unimplemented Proof.
This is only temporary.

BlockType enum enables specifying Genesis vs Standard block.

A Standard block has a ProofType; the Genesis block has none.

Under crate::locks we previously had:
 - sync
 - tokio

Each contains an impl of AtomicRw and AtomicMutex, with basically the
same API.  Yet:

 - sync refers to synchronous locks, i.e. sync vs. async.
 - tokio refers to the tokio::sync lock implementation.

So the names are referring to different things.  Instead we change it to:

 - std
 - tokio

Now each refers to a lock implementation, i.e. std::sync and tokio::sync.

Note: we could instead have changed `tokio` to `async`, but then there
might be multiple async lock impls to choose from.  So it seems cleanest
to use the name of each impl.
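
A minimal sketch of the resulting layout (hedged illustration only; the real AtomicRw/AtomicMutex types expose lock/guard APIs not shown here, and the point is just the module naming):

    // Hypothetical sketch of the renamed module layout under crate::locks.
    pub mod locks {
        pub mod std {
            /// Backed by the blocking ::std::sync::RwLock.
            pub struct AtomicRw<T>(::std::sync::Arc<::std::sync::RwLock<T>>);
        }
        pub mod tokio {
            /// Backed by the async ::tokio::sync::RwLock.
            pub struct AtomicRw<T>(::std::sync::Arc<::tokio::sync::RwLock<T>>);
        }
    }
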
Moved this benchmark over from twenty_first.

Presently unable to build the twenty_first db_* bench tests because
the storage layer is now async and the divan bench crate doesn't yet
support async.  However, it may soon, see:

nvzqz/divan#39

Before this refactor:

   The Mmr trait has only sync methods and exists in twenty_first.

   The Mmr trait is implemented by ArchivalMmr and MmrAccumulator.

   ArchivalMmr uses database::storage, which is now fully async.
   MmrAccumulator uses only sync methods.

   ArchivalMmr implements the sync Mmr trait methods by spawning a new
   OS thread and tokio runtime for each method call (see the sketch just
   below).  This works but is inefficient.
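
For illustration, the pre-refactor bridge looked roughly like this hedged sketch (AsyncStore and count_leaves_sync are made-up names, not the actual ArchivalMmr code):

    // Hypothetical sketch of the pre-refactor bridge: a sync method that
    // needs an async storage call spawns an OS thread owning a one-off
    // tokio runtime. AsyncStore stands in for the async-backed storage.
    struct AsyncStore;

    impl AsyncStore {
        async fn len(&self) -> u64 {
            // the real implementation would await a database read here
            0
        }
    }

    fn count_leaves_sync(store: AsyncStore) -> u64 {
        std::thread::spawn(move || {
            tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("failed to build one-off runtime")
                .block_on(async move { store.len().await })
        })
        .join()
        .expect("storage thread panicked")
    }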

After this refactor:

   The Mmr trait has been copied into neptune_core, and the neptune_core
   version has only async methods.

   The Mmr trait is implemented by ArchivalMmr and MmrAccumulator.

   ArchivalMmr is fully async, and does not spawn any OS threads.

   MmrAccumulator has been copied into neptune_core and is fully async.

   Everything in neptune_core that uses MmrAccumulator is now async.

   Certain tests and test-support scaffolding depend on tasm traits that
   are not async, and had to be commented out for now.

   Proptests had to be commented out because the proptest crate does not
   yet support async tests.
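
To make the trait change concrete, here is a hedged sketch of what the copied-in async trait could look like, assuming the async_trait crate; Digest is a placeholder, the real trait is generic over the hasher and carries the full twenty_first method set, and only the two methods named elsewhere in this discussion are shown:

    // Hypothetical sketch only; not the actual neptune_core trait definition.
    pub struct Digest([u64; 5]); // stand-in for twenty_first's Digest type

    #[async_trait::async_trait]
    pub trait Mmr {
        /// Combine all peaks into a single commitment.
        async fn bag_peaks(&self) -> Digest;
        /// Number of leaves appended so far.
        async fn count_leaves(&self) -> u64;
    }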

Refactor history:

wip. add MmrAsync trait

wip.  attempting to make mmr trait async.  very messy, build broken

wip. Mmr trait is async. neptune-core builds without warnings.  tests do not build

wip. move mmr files from twenty_first into util_types/mmr

wip.  cleanup commented use statements, etc

wip: cargo fmt

wip. tests compile cleanly. (with proptests commented-out)

wip: fix failing tests.  all tests now pass, including doctests

style: cargo fmt

style: fix clippy warnings

wip. cleanup commented code, add comments
@dan-da changed the title from "Make storage and mmr trait async squashed" to "Make storage and mmr trait async, attempt 2." on Mar 23, 2024
@dan-da (Collaborator, Author) commented Mar 23, 2024

I notice that checks are failing due to clippy errors. I think this is because GitHub is using the latest rust toolchain, and I am still on 1.75. I will update and fix any errors it reports.

@dan-da (Collaborator, Author) commented Mar 24, 2024

Some extra context may be helpful for reviewers, as a starting point.

With regard to the motivation for this async storage work, consider that neptune-core is architected to spawn tokio tasks for all activities, such as the peer loop, mining loop, and main loop. We do not directly spawn OS threads anywhere [1]. The tokio docs, and indeed most async documentation, make it clear that concurrency will suffer if blocking calls, especially disk I/O, are made from an async environment. I found this comment insightful, from someone who experienced exactly this type of problem with their redb storage layer:

I wanted to add some color here. ordinals.com performance has been horrible, and we finally figured out why.

The issue is that we're using an async web framework, so all of our endpoint functions are async. However, those functions then call into redb, which is not async. When those calls are slow, tokio kind of melts, since you have a bunch of async tasks which aren't yielding. If redb supported an async interface, then we could do everything in async, and it wouldn't be a problem.

However, the fix was very simple. We just used tokio::task::spawn_blocking inside of the async functions, and made calls to redb inside of the spawned threads, which are executed on a thread pool. This basically fixed everything, and even if redb had an async interface, we might not even use it, because async is relatively painful, and we would have to convert all of our synchronous index functions which access the database into async. So in our case, it still doesn't make sense, although it may make sense for other use-cases, or it might make sense for use if we run into scaling limits with threads.

I think that quote illustrates the matter better than I could.

Ok, so...

The groundwork for this PR began with NeptuneLevelDb, which was made async a while back. This was done by wrapping every LevelDB method with tokio's spawn_blocking API. Example:

    /// Set database value asynchronously
    pub async fn put(&mut self, key: Key, value: Value) {
        let mut inner = self.0.clone();
        task::spawn_blocking(move || inner.put(key, value))
            .await
            .unwrap()
    }

The LevelDB API itself remains sync, of course; however, spawn_blocking() informs tokio that a blocking sync call is being made, so tokio can run it on a separate OS thread in its thread pool while other tasks on the current thread continue to be polled.

So that's been working already for some DB accesses, but not for DbtVec, DbtSingleton, and RustyLevelDbVec, which are heavily used in neptune-core. Those types still used the direct LevelDB API from twenty-first.

So the foundational work of this PR was to modify the storage layer to use the async NeptuneLevelDb APIs. This required replacing iterators with async streams, and some other changes. Once that was done, all the rest was largely a mechanical exercise of fixing everything that broke as a result.
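
As a hedged illustration of the iterator-to-stream change (assuming the futures crate; AsyncVec and its methods are made-up stand-ins, not the actual DbtVec API):

    // Hypothetical sketch: where the sync storage vector exposed an Iterator
    // over its values, the async version exposes a Stream driven by async
    // point reads.
    use futures::stream::{self, Stream, StreamExt};

    struct AsyncVec {
        len: u64,
    }

    impl AsyncVec {
        async fn get(&self, index: u64) -> u64 {
            // the real implementation would await a NeptuneLevelDb read here
            index
        }

        fn stream_values(&self) -> impl Stream<Item = u64> + '_ {
            stream::iter(0..self.len).then(move |i| self.get(i))
        }
    }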

The primary challenges were due to trait entanglement from crates twenty-first, tasm-lib, and even arbitrary. I've documented that elsewhere, so won't go into detail here.

[1] If we had a separate OS thread for each peer, mining, triton-vm, etc., we could perhaps get away with blocking I/O, as such tasks would not block each other.

@aszepieniec (Contributor) commented:

This PR does still have one inefficiency in tasm::RemovalRecordsIntegrity::rust_shadow(), whereby it spawns an OS thread for each invocation. I am unsure how frequently it is called. Perhaps it can be modified not to call any async method, or refactored out. Reviewers, please have a look.

It looks like this function needs to be async because the trait functions Mmr::bag_peaks and Mmr::count_leaves are async, and these are called on objects swbf_inactive and aocl, which are MMR accumulators and not archival MMRs.

Was it the intention to make member functions of accumulator objects async, or was this an unintended consequence of making the trait Mmr async? If the latter, then I suggest killing the trait.

@dan-da (Collaborator, Author) commented Mar 24, 2024

Was it the intention to make member functions of accumulator objects async, or was this an unintended consequence of making the trait Mmr async?

That is the primary difference between PR #120 and this one. We have both paths available to choose from.

In #120, the Mmr trait is sync, and ArchivalMmr implements the sync methods by spawning OS threads to call async methods for storage. MmrAccumulator and a bunch of things that use it, e.g. MutatorSetKernel, remain sync.

In this one, the Mmr trait is made async, which forces MmrAccumulator and everything that uses it to also become async.

One additional consideration: I imagine that triton-vm execution of some scripts will be CPU intensive and comparatively long-running. The tokio docs make it clear that CPU-heavy (long-running) function calls harm concurrency just as blocking I/O does, and should be wrapped with spawn_blocking() if they can't be made async directly. I have not performed any analysis in this area yet, but it may turn out that there is benefit to having MmrAccumulator, MutatorSetKernel, and related areas of neptune-core be async. So anyway, that's been at the back of my mind while working on this stuff.
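
For reference, the pattern the tokio docs recommend for such CPU-heavy calls looks roughly like this hedged sketch (run_prover is a made-up stand-in for a CPU-bound sync function, not an actual triton-vm API):

    // Hypothetical sketch: push a long-running CPU-bound call onto tokio's
    // blocking pool so the async executor keeps polling other tasks meanwhile.
    fn run_prover(input: Vec<u64>) -> Vec<u64> {
        // placeholder for expensive, purely CPU-bound work
        input.iter().map(|x| x.wrapping_mul(31)).collect()
    }

    async fn prove_without_starving_the_executor(input: Vec<u64>) -> Vec<u64> {
        tokio::task::spawn_blocking(move || run_prover(input))
            .await
            .expect("prover task panicked")
    }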

If the latter, then I suggest killing the trait.

Yeah, so that would be nice if we can make it work without duplicating a mountain of code. In fact I suggested the possibility above:

If some way could be found to decouple ArchivalMmr from the Mmr trait, that might allow us to use the simpler code in #120. However, I do not see any simple solution because a lot of code is generic over the Mmr trait. Perhaps a reviewer might?

Indeed I already attempted this briefly, and quickly found it was complex. I discussed it here.

Expanding on that explanation, I believe this is a complete list of places that require the Mmr trait:

$ find src -type f | xargs grep " Mmr<" | grep -v _mmr.rs | grep -v tor.rs | grep -v traits.rs
src/util_types/test_shared/mutator_set.rs:pub async fn insert_mock_item<M: Mmr<Hash>>(
src/util_types/test_shared/mutator_set.rs:pub async fn remove_mock_item<M: Mmr<Hash>>(
src/util_types/mutator_set/ms_membership_proof.rs:    pub async fn batch_update_from_addition<MMR: Mmr<Hash>>(
src/util_types/mutator_set/removal_record.rs:    pub async fn batch_update_from_addition<MMR: Mmr<Hash>>(
src/util_types/mutator_set/removal_record.rs:        M: Mmr<Hash>,
src/util_types/mutator_set/mutator_set_kernel.rs:pub struct MutatorSetKernel<MMR: Mmr<Hash>> {
src/util_types/mutator_set/mutator_set_kernel.rs:impl<M: Mmr<Hash>> MutatorSetKernel<M> {
src/util_types/mutator_set/mutator_set_kernel.rs:impl<MMR: Mmr<Hash> + BFieldCodec> BFieldCodec for MutatorSetKernel<MMR> {

Each of those would need to be re-written to work for both sync and async scenarios. But then anything that calls them with an ArchivalMmr would need to be async anyway, which appears to be a lot of code. So in the end, much of it needs to become async anyway, and we might end up with something not too different from this PR.

At least, that's how it appears to me, but...

@aszepieniec you understand this area of the code much better than I do. So if you can see a clear path to getting rid of the Mmr trait, then perhaps you could start a branch based off make_storage_async (from #120) for us to collaborate on. If you could set the direction and perhaps get the thing to build without tests, I'd be happy to finish up whatever needs doing and get tests building, etc. So that might be an option.

or anyway, please let me know your thoughts.

@aszepieniec (Contributor) commented:

Thanks for clarifying. There's a lot to think about, and most of the time when I'm silent it's because I don't have a qualified opinion.

So if you can see a clear path to getting rid of the Mmr trait, then perhaps you could start a branch based off make_storage_async (from 120) for us to collaborate on.

I will consider that, but you should know that there are also other things vying for my attention.

In regards to the Mmr trait, I think it came about by accident. It may have benefited the process of writing the code in the first place, but I doubt that there is a code-architecture reason to keep it. That said, maybe getting rid of it is more hassle than it's worth.

As for the complexity of operations: I did not run any benchmarks, but the raison d'être for the accumulator variants is their light footprint. Phrased differently, if making them async can be justified on the basis of their long-running CPU profile, then we are doing something very wrong and should reconsider the architecture.

@dan-da (Collaborator, Author) commented Apr 2, 2024

Just a note that the author of the proptest_async crate notified me that he has added tokio support. We don't seem to need async proptests right now, as we are moving forward with #124 instead of this PR; however, it's good to know it exists for possible future tests.

If this PR is merged, we will want to find a solution for async proptests, which I have commented out for now. There is a new proptest_async crate, however it presently only works with async_std, not tokio. Last night I sent the author an offer to help add a tokio feature-flag, and have not yet heard back. Anyway, that seems a path forward.

@dan-da (Collaborator, Author) commented Apr 2, 2024

Closing in favor of #124.

@dan-da closed this on Apr 2, 2024