P2P: Avoid thread hop on incoming blocks #1223
plugins/net_plugin/net_plugin.cpp (outdated)

```cpp
   });
   return;
// This mutex is not about thread-safety; the call to accept_block is thread-safe.
// This mutex is to avoid extra work of validating the same block header on multiple threads at the same time.
```
Feels like there ought to be something even more lightweight than a mutex, given how this comment describes the problem being solved. Nothing comes to mind with certainty given my unfamiliarity here, but maybe an atomic that simply holds the block id currently being processed: if the atomic matches, you know this block is already being processed on another thread and can bail on any effort to process it on this thread.

Of course, the problem is that we only have 16-byte lock-free atomics and a block id is 32 bytes. But matching against the first or last 16 of the 32 bytes seems fine enough in this use case.

(Not a suggestion for a change, just thinking about it.)
I suppose a single atomic breaks down once you have 3 threads in here at once: if 2 of those threads carry the same id, you would still allow the unneeded accept_block. That doesn't break anything, it just nerfs the purpose. Of course, if 3+ threads are often going to be contesting this mutex, it's effectively a bunch of thread starts/stops anyway, which might nullify any gains from avoiding the duplicate work, depending on how long the work takes.

It's almost like you need a lock-free unordered hash here.

Or you could even use a small vector of atomics; e.g. 3 of them will ensure that if 4 threads are here simultaneously, everything is still fully guarded.
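The small-vector-of-atomics variant could look something like the following sketch (hypothetical names, and the 64-bit `id_tail` is an even coarser truncation chosen just to keep the example short). With 3 slots, 3 distinct in-flight ids can be tracked, so a 4th thread arriving with a duplicate id still gets caught:

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Hypothetical: track up to 3 in-flight ids with a fixed array of atomics,
// using a 64-bit truncation of the block id for lock-free comparison.
struct inflight_slots {
   std::array<std::atomic<uint64_t>, 3> slots{};  // 0 == empty slot

   // Returns true if the caller should process the block; false if the same
   // truncated id is already claimed by another thread.
   bool try_claim(uint64_t id_tail) {
      for (auto& s : slots)
         if (s.load(std::memory_order_acquire) == id_tail)
            return false;  // duplicate already in flight
      for (auto& s : slots) {
         uint64_t empty = 0;
         if (s.compare_exchange_strong(empty, id_tail))
            return true;   // claimed a free slot
      }
      return true;  // all slots busy with other ids: best effort, just proceed
   }

   void release(uint64_t id_tail) {
      for (auto& s : slots) {
         uint64_t expected = id_tail;
         if (s.compare_exchange_strong(expected, 0))
            return;
      }
   }
};
```

As with the single atomic, this is only best-effort: when all slots are occupied by other ids, a duplicate could slip through, which matches the "nerfs the purpose, doesn't break anything" framing above.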
plugins/net_plugin/net_plugin.cpp
Outdated
| // This mutex is not about thread-safety; the call to accept_block is thread-safe. | ||
| // This mutex is to avoid extra work of validating the same block header on multiple threads at the same time. | ||
| // The mutex is used instead of posting to the dispatcher strand as the post to the dispatcher strand adds | ||
| // a thread switch which adds a delay in processing. |
That's going to be true for `post()`, but quite possibly `dispatch()` solves that problem? Both the dispatcher strand and connection strand run on the same thread pool (same `io_context`). So I would have expected `dispatcher_strand.dispatch()` to immediately call the callback without a thread hop (unless the dispatcher strand is already executing something else at the moment).
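The distinction can be illustrated with a toy model (this is not Asio's implementation, just a sketch of the semantics being discussed): `post()` always queues the handler for later, while `dispatch()` runs it inline when the strand is free, avoiding the hand-off to another thread. In real Asio, inline execution additionally requires that the calling thread is allowed to run handlers for that `io_context`:

```cpp
#include <deque>
#include <functional>
#include <mutex>

// Toy model of a strand illustrating the post()/dispatch() distinction.
class toy_strand {
   std::mutex m_;
   bool running_ = false;
   std::deque<std::function<void()>> queue_;

   void drain() {
      for (;;) {
         std::function<void()> f;
         {
            std::lock_guard g(m_);
            if (queue_.empty()) { running_ = false; return; }
            f = std::move(queue_.front());
            queue_.pop_front();
         }
         f();
      }
   }

public:
   void post(std::function<void()> f) {      // always deferred, never inline
      std::lock_guard g(m_);
      queue_.push_back(std::move(f));
   }

   bool dispatch(std::function<void()> f) {  // inline when the strand is idle
      {
         std::lock_guard g(m_);
         if (running_) { queue_.push_back(std::move(f)); return false; }
         running_ = true;
      }
      f();      // no thread hop: run directly on the calling thread
      drain();  // then run anything that was queued meanwhile
      return true;
   }
};
```

This also shows the one case where `dispatch()` still defers: if the strand is already busy executing something else, the handler is queued just like with `post()`.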
Thanks. It looks like `dispatch()` does solve the problem.
I think in many ways what you've done with the mutex is kinda sorta what the dispatch ultimately does for you. It LGTM as is if you like it this way. There is a small possibility that dispatch will be more efficient than the mutex when the mutex is contested: a contested mutex generally implies a thread stop/start, but that stop/start may not happen when dispatch queues the handler, because whatever thread the strand is currently running on could pick up the work without starting or stopping a thread, since that thread is already running. (All this depends on asio's implementation, which I'm not sure about, so this could be an optimistic take.)
In my bit of testing it does seem like `dispatch()` is better (faster) than the mutex for the nominal case. Plus it is simpler than adding a mutex.
Am I right that this PR has two unrelated changes:
- The switch from `post()` to `dispatch()`, which avoids a thread hop
- All the rest of the PR, which implements a better way to avoid validating the same block twice (although I think it could still happen, since two threads can check in `create_block_handle` that `id` is not in `fork_db`, then proceed to validate `id` before adding it to `fork_db`)
The original dispatcher `post()` was to avoid two block header validations at the same time. This PR is an alternative, more efficient mechanism to achieve the same thing.

It is possible via `create_block_handle` to have more than one thread validating the same block header. That is why the net_plugin serializes the calls to `create_block_handle` to avoid that. In practice, it is only called from net_plugin.
Got it, thanks. Maybe this serialization could be avoided by having the fork_db keep an internal unordered_map of blocks that are currently being validated. So the call:

```cpp
if (auto bsp = fork_db.get_block(id, include_root_t::yes))
```

could be replaced by something like:

```cpp
if (auto bsp = fork_db.start_validating_block(id, include_root_t::yes))
```

which would return a null pointer only if this block has not been started validating before.
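A rough sketch of that proposal, using illustrative names only (`toy_fork_db`, `start_validating_block`, `finish_validating_block` are not the real fork_db API, and a `claimed` out-flag stands in for however the real interface would distinguish "not found, you validate it" from "not found, someone else is already validating it"):

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical block state; the real type carries the validated header etc.
struct block_state { std::string id; };
using block_state_ptr = std::shared_ptr<block_state>;

class toy_fork_db {
   std::mutex m_;
   std::unordered_map<std::string, block_state_ptr> blocks_;  // validated
   std::unordered_set<std::string> validating_;               // in flight

public:
   // Returns the existing block if already in fork_db. Otherwise registers
   // the id as "being validated"; `claimed` is true only for the first
   // caller, so a second concurrent caller for the same unknown id can bail.
   block_state_ptr start_validating_block(const std::string& id, bool& claimed) {
      std::lock_guard g(m_);
      if (auto it = blocks_.find(id); it != blocks_.end()) {
         claimed = false;
         return it->second;  // already validated
      }
      claimed = validating_.insert(id).second;
      return nullptr;
   }

   // After validation succeeds, move the id from in-flight to validated.
   void finish_validating_block(block_state_ptr bsp) {
      std::lock_guard g(m_);
      validating_.erase(bsp->id);
      blocks_.emplace(bsp->id, std::move(bsp));
   }
};
```

This closes the check-then-act window described earlier, since the lookup and the "mark as validating" step happen under one lock inside fork_db rather than being separate calls from net_plugin.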
That would work but seems rather complicated compared to current implementation.
It would also allow validating blocks (including signatures) in parallel, although we'd still have to add them to `fork_db` in order. Not sure whether that would increase syncing performance though.
Optimize processing of incoming blocks by removing an unneeded thread hop in the nominal case.