[1.1.5] fix state history race on disconnect: drain session strand before destroying session #1446
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A state_history session runs two corountines: one to read from the websocket and one to write to the websocket. A failure on either corountine will cause it to complete as well as indicate to the other countinue that it should complete as well. Completion of either corountine ends up here,
spring/plugins/state_history_plugin/include/eosio/state_history_plugin/session.hpp
Lines 73 to 81 in f7657a2
Since the timing of those completions can be indeterminate, a simple counter is used to detect that both have completed. When that occurs
sessionindicates externally that it is done and it's ready to be destroyed. "done" means thesessionis no longer using any of its members such as thestream, etc -- sounds safe to destroy right? This done indication winds up here,spring/plugins/state_history_plugin/state_history_plugin.cpp
Lines 118 to 121 in f7657a2
Where a small piece of work is submitted to the main thread to remove the session from
connections.While this erasure is queued up on the main thread, the main thread may be doing other work that results in the
on_accepted_block()signal firing. This will cause state_history to,spring/plugins/state_history_plugin/state_history_plugin.cpp
Lines 161 to 163 in f7657a2
Remember the erasure is still queued up on the main thread, so the "done" session is still in this list.
block_applied()is very simple and callsawake_if_idle()spring/plugins/state_history_plugin/include/eosio/state_history_plugin/session.hpp
Lines 49 to 54 in f7657a2
spring/plugins/state_history_plugin/include/eosio/state_history_plugin/session.hpp
Lines 67 to 71 in f7657a2
Critically,
awake_if_idle()post()s to the "done" session's strand.Now 1 of 3 things can happen based on timing:
wake_timer.cancel_one()posted to the "done" session's strand completes prior to the erasure on main thread. This is fine.wake_timer.cancel_one(). This is also fine -- the strand is destroyed and any pending work is dropped.wake_timer.cancel_one()(or also during any phase where the strand's internal code is otherwise running). This will result in undefined behavior as thesessionand its members (including thestrand) will be in various stages of destruction.To resolve this problem, can drain any potential pending queued
wake_timer.cancel_one()prior to destruction. Since we're blocking the main thread during draining, we can be sure no new calls toblock_applied()occur.The premise above points to the same line that #1419 fails on, so will mark this resolving it.