Conversation

@heifner
Contributor

@heifner heifner commented Feb 28, 2025

Optimize processing of incoming blocks by removing an unneeded thread hop on nominal case.

@heifner heifner added this to the Spring v1.2.0-rc1 milestone Feb 28, 2025
@heifner heifner added the OCI Work exclusive to OCI team label Feb 28, 2025
// This mutex is not about thread-safety; the call to accept_block is thread-safe.
// This mutex is to avoid extra work of validating the same block header on multiple threads at the same time.
Contributor

Feels like there ought to be something even more lightweight than a mutex given how this comment describes the problem being solved. But I'm not immediately thinking of anything with certainty given unfamiliarity here. Maybe an atomic that simply indicates the current block id being processed: if the atomic matches, then you know this block is already being processed on another thread and can bail on any effort to process it on this thread.

Of course the problem is that we only have 16-byte lock-free atomics and a block id is 32 bytes. But matching against the first or last 16 of the 32 bytes seems fine enough for this use case.

(Not a suggestion for a change, just thinking about it.)
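For concreteness, a rough sketch of what that single-atomic approach might look like. Purely illustrative, not a proposal for this PR; block_id, id_prefix, and try_begin_processing are made-up names, and whether the 16-byte atomic is actually lock-free depends on the platform:

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <cstring>

using block_id = std::array<uint8_t, 32>;   // stand-in for the real 32-byte block id type

struct id_prefix {
   uint64_t hi = 0, lo = 0;                 // first 16 of the 32 bytes
   bool operator==(const id_prefix&) const = default;
};

std::atomic<id_prefix> currently_processing{};   // lock-free only where 16-byte CAS exists

// Returns false when the same prefix is already recorded, i.e. another thread appears
// to be processing this block and we can bail instead of duplicating the work.
bool try_begin_processing(const block_id& id) {
   id_prefix p;
   std::memcpy(&p, id.data(), sizeof(p));
   id_prefix prev = currently_processing.exchange(p, std::memory_order_acq_rel);
   return prev != p;
}
```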

Contributor

I suppose a single atomic breaks down once you have 3 threads in here at once, since if 2 of those threads carry the same id but another thread has since overwritten the atomic with a different id, you would still allow the unneeded accept_block. Doesn't break anything, just nerfs the purpose. Of course, if 3+ threads are often going to be contesting this mutex, it's effectively a bunch of thread starts/stops anyway, which might nullify any gains you get from trying to avoid the duplicate work, depending on how long the work takes.

Almost like you need a lock-free unordered hash here.

Or you could even use a small vector of atomics; e.g. 3 of them would ensure that if 4 threads are here simultaneously, everything is still fully guarded.
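A sketch of that vector-of-atomics variant, again with made-up names and only for illustration:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

struct id_prefix {
   uint64_t hi = 0, lo = 0;
   bool operator==(const id_prefix&) const = default;
};

template<std::size_t N>
class in_flight_ids {
   std::array<std::atomic<id_prefix>, N> slots{};
   std::atomic<std::size_t>              next{0};
public:
   // Returns false if the prefix is already recorded in some slot (likely duplicate work);
   // otherwise records it, overwriting the oldest entry. Missing a duplicate only means
   // redundant work, never incorrect behavior, so the lossy overwrite policy is acceptable.
   // With N slots (e.g. 3), the intent from the discussion above is that even 4 threads
   // in here at once stay guarded against duplicating each other's accept_block work.
   bool try_claim(const id_prefix& p) {
      for (auto& s : slots)
         if (s.load(std::memory_order_acquire) == p)
            return false;
      slots[next.fetch_add(1, std::memory_order_relaxed) % N].store(p, std::memory_order_release);
      return true;
   }
};
```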

// This mutex is not about thread-safety; the call to accept_block is thread-safe.
// This mutex is to avoid extra work of validating the same block header on multiple threads at the same time.
// The mutex is used instead of posting to the dispatcher strand as the post to the dispatcher strand adds
// a thread switch which adds a delay in processing.
Contributor

That's going to be true for post() but quite possibly dispatch() solves that problem? Both the dispatcher strand and connection strand run on the same thread pool (same io_context). So I would have expected "dispatcher_strand.dispatch()" to immediately call the callback without a thread hop (unless the dispatcher strand is already executing something else at the moment).
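For reference, a minimal standalone illustration of the post() vs dispatch() difference on a strand. This is plain boost::asio and not the actual net_plugin code; dispatcher_strand is just a local variable named after the strand discussed above:

```cpp
#include <boost/asio.hpp>
#include <iostream>

int main() {
   boost::asio::io_context ioc;
   auto dispatcher_strand = boost::asio::make_strand(ioc.get_executor());

   boost::asio::post(ioc, [&] {
      // post() always queues the handler for later execution, even when it could run
      // right now -- that queueing is the extra thread hop on the nominal path.
      boost::asio::post(dispatcher_strand, [] { std::cout << "via post\n"; });

      // dispatch() may invoke the handler immediately on the calling thread when the
      // strand is not busy elsewhere, since both strands share the same io_context.
      boost::asio::dispatch(dispatcher_strand, [] { std::cout << "via dispatch\n"; });
   });

   ioc.run();
}
```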

Contributor Author

Thanks. It looks like dispatch() does solve the problem.

Contributor
@spoonincode spoonincode Mar 3, 2025

I think in many ways what you've done with the mutex is kinda sorta what the dispatch ultimately does for you. LGTM as is if you like it this way. There is a small possibility the dispatch will be more efficient than the mutex when the mutex is contested: a contested mutex generally implies a thread stop/start, whereas when dispatch falls back to posting, whatever thread the strand is currently running on could pick up the handler without stopping or starting a thread, since that thread is already running. (All this depends on asio's implementation, which I'm not sure about, so this could be an optimistic take.)

Contributor Author
@heifner heifner Mar 3, 2025

In my bit of testing it does seem like dispatch() is better (faster) than the mutex for the nominal case. Plus it is simpler than adding a mutex.

Contributor

Am I right that this PR has two unrelated changes:

  1. The switch from post() to dispatch(), which avoids a thread hop
  2. All the rest of the PR, which implements a better way to avoid validating the same block twice (although I think it could still happen, since two threads can check in create_block_handle that id is not in fork_db, then proceed to validate id before adding it to fork_db)

Contributor Author

The original dispatcher post() was to avoid two block header validations at the same time. This PR is an alternative, more efficient mechanism to achieve the same thing.

It is possible via create_block_handle to have more than one thread validating the same block header. That is why the net_plugin serializes its calls to create_block_handle. In practice, create_block_handle is only called from net_plugin.

Contributor

Got it, thanks. Maybe this serialization could be avoided by having the fork_db keep an internal unordered_map of blocks that are currently being validated. So the call:

if (auto bsp = fork_db.get_block(id, include_root_t::yes))

could be replaced by something like:

if (auto bsp = fork_db.start_validating_block(id, include_root_t::yes))

which would return a null pointer only if validation of this block has not already been started.
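A hypothetical sketch of that shape. This is not the real fork_db interface, and it adapts the suggested return value into a pair so the caller can tell "already in fork_db" apart from "another thread is validating this id right now":

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

struct block_state {};                           // stand-in for the real block state type
using block_state_ptr = std::shared_ptr<block_state>;
using block_id_type   = std::string;             // stand-in for the 32-byte block id

class fork_db_sketch {
   std::mutex                                          mtx;
   std::unordered_map<block_id_type, block_state_ptr> blocks;     // fully validated blocks
   std::unordered_set<block_id_type>                   in_flight;  // ids currently being validated
public:
   // {existing state or nullptr, true if the caller should perform the validation}
   std::pair<block_state_ptr, bool> start_validating_block(const block_id_type& id) {
      std::lock_guard g(mtx);
      if (auto it = blocks.find(id); it != blocks.end())
         return {it->second, false};              // already accepted; nothing to do
      bool first = in_flight.insert(id).second;   // false => another thread beat us to it
      return {nullptr, first};
   }

   // Called once validation finishes (successfully or not) to clear the in-flight mark.
   void finish_validating_block(const block_id_type& id, block_state_ptr bsp) {
      std::lock_guard g(mtx);
      in_flight.erase(id);
      if (bsp) blocks.emplace(id, std::move(bsp));
   }
};
```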

Contributor Author

That would work but seems rather complicated compared to the current implementation.

Contributor

It would also allow validating blocks (including signatures) in parallel, although we'd still have to add them to fork_db in order. Not sure whether that would improve syncing performance though.

@ericpassmore
Contributor

Note:start
category: Other
component: P2P
summary: Optimize processing of incoming blocks by removing an unneeded thread hop on nominal case.
Note:end

@heifner heifner merged commit 1bf5078 into main Mar 4, 2025
36 checks passed
@heifner heifner deleted the p2p-opt-block-processing branch March 4, 2025 14:28