Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Make block proposing remaining duration configurable #4215

Merged
merged 3 commits into from
Nov 27, 2019
Merged

Conversation

sorpaas
Copy link
Member

@sorpaas sorpaas commented Nov 26, 2019

This tries to address the issue that proposers giving up proposing blocks due to long proposing time, causing the network not producing blocks for a long time.

remaining_duration, which is the parameter used to determine how long the proposing logic tolerant execution, is now splited into two -- slot_remaining_duration, for inner proposing logic to determine pushing extrinsic, and proposing_remaining_duration, for outer on_slot to determine the actual timeout. The timeout is now configurable by each consensus engine. For example, an engine can change this to be determined by last proposing slot actual_slot - expected_slot > expected_slot - parent_slot.

When review, please note that the behavior is changed that on_slot will not time out on proposing by default. I talked with @rphmeier and we think that's fine, but please let me know if you think this will create issues!

@sorpaas sorpaas added the A0-please_review Pull request needs code review. label Nov 26, 2019
@sorpaas sorpaas requested a review from rphmeier November 26, 2019 19:09

let block_import_params_maker = self.block_import_params();
let block_import = self.block_import();
let logging_target = self.logging_target();

Box::pin(proposal_work.map_ok(move |(block, claim)| {
// minor hack since we don't have access to the timestamp
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the proposing is now offloaded to a dedicated threadpool, the timeout above by future::select(Delay) is now reliable, and this "minor hack" can be removed.

@@ -71,6 +71,8 @@ pub fn time_until_next(now: Duration, slot_duration: u64) -> Duration {
pub struct SlotInfo {
/// The slot number.
pub number: u64,
/// The last slot number produced.
pub last_number: u64,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter not currently used, but is added so that engines can implement logic on proposing_remaining_duration based on parent/last proposing slot.

Copy link
Contributor

@rphmeier rphmeier Nov 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does produced mean here?

What we would probably want to compare against is the slot of the parent header, not the last slot (which would be number - 1) or the last slot we were assigned to (would not vary as a function of slots-since-parent)

Copy link
Member Author

@sorpaas sorpaas Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rphmeier It should be the slot when we last tried to propose blocks, so it's not necessarily number - 1 if previous proposing took too long. I think this parameter would be useful if we just want to calculate the remaining_duration based on past rough average proposing time?

I added reference of chain head header to proposing_remaining_duration. In this way, if the engine wants to calculate the duration based on parent slot, they can do it by parsing the header digest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the best we could hope for. Thanks

Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor comment


/// Remaining duration for proposing. None means unlimited.
fn proposing_remaining_duration(&self, _slot_info: &SlotInfo) -> Option<Duration> {
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not default to Some(slot_remaining_duration)? I think it's safer to make sure inner proposal logic is always timed out externally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I changed it to that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, I want to note that if we really get a faulty proposer that stuck in an infinite loop, this won't really save us. We currently don't kill offloaded proposing logic. The threadpool has a few spots, if that fills up, no new proposing thread will be created, meaning the offloading future will just always return pending.

I don't think we need to deal with that extreme situation. If we do, the only (?) solution would probably be to kill those offloaded thread. I currently think that will be a bad idea, as it may poison locks and damage internal state.

@rphmeier rphmeier added A8-looksgood and removed A0-please_review Pull request needs code review. labels Nov 27, 2019
@sorpaas sorpaas merged commit ed61aa6 into master Nov 27, 2019
@sorpaas sorpaas deleted the sp-remain branch November 27, 2019 16:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants