Increase backing and approval timeouts for async backing #1366

Sophia-Gold · 2023-09-02T15:33:25Z

Since async backing increases execution time for collators from 500ms to 2s, we should also increase the execution time for backers and approval checkers beyond 2s to avoid larger parachain blocks not being backed due to small resource differences, nondeterminism, or cache misses.

The question is by how much. It's unclear whether we need to retain the 4x difference as 8s is quite long. This warrants testing.

burdges · 2023-09-02T21:15:05Z

We'd ideally pseudo-remove the gap between backer timeouts and approval checker timesout once we charge backers fr time overruns: #742

In practice there is inherently some gap where the backer earns something for oversized blocks, but in expectation they risk only a minor fee for the time overrun. We discussed the fee curve shape at the team retreat. I'd suggested time overrun fees should be curve fit to the costs of no shows, but you maybe had something more nuanced, or just different. I've forgotten now, maybe in paritytech/polkadot#7239

At present I've no thoughts on doing something different short-term, but yeah maybe a hacky setting makes sense if we want to deploy something faster.

rphmeier · 2023-09-02T22:02:34Z

In my opinion we should start from the amount of work backers should do and work backwards to what collators should use as a timeout.

There's no reason to maintain the 4x difference in any case, as the 0.5s timeout on collators pre-async-backing is primarily determined by the relay chain block time and p2p network latency, not execution time.

burdges · 2023-09-03T07:27:44Z

I doubt "timeout" is even well defined for collators. An optimized collator would often run quite different code from validiator, but optimized collators become a parachain logic specific problem.

We're worried about the gap between backers vs approval checkers here. Alfonso proposed a 6x gap in paritytech/polkadot#1656 We never used his 6x but we ended up with at least the 4x gap @Sophia-Gold mentions. In fact I thought we're currently running some larger gap, like 6x or 8x.

We'll pseduo-remove the gap in #742 like I say, but tight now the question is: Can we deploy async backing without #742 but with a smaller gap?

I've long forgotten how the developers selected such a large gap, so we'll just need to ask whoever put the gap into the code (maybe you) where it came from, and try to reason from that point.

There's no reason to maintain the 4x difference in any case, as the 0.5s timeout on collators pre-async-backing is primarily determined by the relay chain block time and p2p network latency, not execution time.

Is this really the only reason we have the 4x gap? If we only have a 1x gap then we should often trigger validity disputes due to timeouts, under the current scheme. If we've high variance in validator hardware then even a 2x gap could be insufficient, even assuming 100% honesty, although maybe single core performance in not such high variance anymore. Did we not just have testnet failures which required a larger gap?

rphmeier · 2023-09-03T23:14:08Z

We should really separate these concerns and be clear about what we're discussing

how much different does the backing timeout vs. the approval/disputes timeout need to be in execution? currently, backing has a 2s timeout and disputes/approval have a 12s timeout. This is in line with Alfonso's recommendation.
how much time collators allocate to block building in order to stay within the 2s timeout on backers.

This issue was mostly made in the context of (2). We previously had limitations on block building in Cumulus by default which were 0.5s of authoring. However, this timeout was not set with the goal of fulfilling (2) and was majority constrained by other factors. With async backing (2) becomes a direct concern. That's all. (1) is orthogonal - except if we want to make sure that 2s of authoring on collators always falls within the backing limit, in which case the timeouts may need to be adjusted upwards. This is probably the wrong order of operations, IMO.

Is this really the only reason we have the 4x gap?

It's the only reason we have 4x gap between the 0.5s allocated to collators block building and the 2s backing timeout, yes. Building, backing, and propagating within a 6s window required a very short block-building timeout mostly due to network and relay chain block propagation overhead.

This is not related to the 6x gap between the 2s allocated to backing and the 12s allocated to approval, as mentioned.

I suspect that honest collators building a block out of an on-disk DB for 2s would likely always stay within the 2s execution limit for backers executing out of memory. Collators building out of an in-memory DB or a RAMdisk would need to be more conservative as a larger proportion of their authoring time is spent in RAM or the CPU.

eskimor · 2023-10-08T10:00:05Z

So, I guess we can close this then. We might end up wanting to increase backing timeout slightly to something like 2.5s at some point, but for now we should be good.

Sophia-Gold added T9-parachains_protocol T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Sep 2, 2023

Sophia-Gold added this to Parachains-core Sep 2, 2023

github-project-automation bot moved this to To do in Parachains-core Sep 2, 2023

github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Sep 2, 2023

Sophia-Gold assigned Overkillus Sep 2, 2023

the-right-joyce added this to parachains team board Sep 29, 2023

eskimor closed this as completed Oct 8, 2023

github-project-automation bot moved this from To do to Done in Parachains-core Oct 8, 2023

github-project-automation bot moved this to Done in parachains team board Oct 8, 2023

eskimor removed this from parachains team board Oct 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase backing and approval timeouts for async backing #1366

Increase backing and approval timeouts for async backing #1366

Sophia-Gold commented Sep 2, 2023

burdges commented Sep 2, 2023 •

edited

Loading

rphmeier commented Sep 2, 2023 •

edited

Loading

burdges commented Sep 3, 2023 •

edited

Loading

rphmeier commented Sep 3, 2023 •

edited

Loading

eskimor commented Oct 8, 2023

Increase backing and approval timeouts for async backing #1366

Increase backing and approval timeouts for async backing #1366

Comments

Sophia-Gold commented Sep 2, 2023

burdges commented Sep 2, 2023 • edited Loading

rphmeier commented Sep 2, 2023 • edited Loading

burdges commented Sep 3, 2023 • edited Loading

rphmeier commented Sep 3, 2023 • edited Loading

eskimor commented Oct 8, 2023

burdges commented Sep 2, 2023 •

edited

Loading

rphmeier commented Sep 2, 2023 •

edited

Loading

burdges commented Sep 3, 2023 •

edited

Loading

rphmeier commented Sep 3, 2023 •

edited

Loading