Skip to content

Conversation

tersec
Copy link
Contributor

@tersec tersec commented Apr 17, 2023

engine_exchangeTransitionConfigurationV1 isn't useful anymore. It was, arguably, maybe, potentially, temporarily useful (but probably not) during the merge. The merge is over. Even by the extended definition used by some, now that Capella is on mainnet, the merge is over.

So now, there's this code that exists for little other reason than to occasionally cause spurious CL/EL communications outages such as seen in https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Flinux%2Fx86_64/detail/unstable/649/pipeline where

{"lvl":"INF","ts":"2023-04-17 18:02:54.152+00:00","msg":"Successfully exchanged engine configuration","topics":"elmon","url":"http://127.0.0.1:6809"}
{"lvl":"WRN","ts":"2023-04-17 18:03:55.356+00:00","msg":"Failed to exchange transition configuration","topics":"elmon","url":"http://127.0.0.1:6809","err":"Timeout"}
{"lvl":"DBG","ts":"2023-04-17 18:04:00.496+00:00","msg":"No EL connection for newPayload"}
{"lvl":"DBG","ts":"2023-04-17 18:04:18.596+00:00","msg":"No EL connection for newPayload"}
{"lvl":"DBG","ts":"2023-04-17 18:04:24.688+00:00","msg":"No EL connection for newPayload"}
{"lvl":"DBG","ts":"2023-04-17 18:04:36.788+00:00","msg":"No EL connection for newPayload"}
{"lvl":"DBG","ts":"2023-04-17 18:04:42.881+00:00","msg":"No EL connection for newPayload"}
{"lvl":"DBG","ts":"2023-04-17 18:04:47.624+00:00","msg":"No EL connection for newPayload"}

That this wasn't more obvious in logs, or in the wild, is probably due to #4830 where the relevant newPayload failing downstream of this timeout in exchangeTransitionConfiguration causing

func hasProperlyConfiguredConnection*(m: ELManager): bool =
for connection in m.elConnections:
if connection.etcStatus == EtcStatus.match:
return true
return false

and
proc newExecutionPayload*(
elManager: ELManager,
executionPayload: ForkyExecutionPayload):
Future[Opt[PayloadExecutionStatus]] {.async.} =
if not elManager.hasProperlyConfiguredConnection:
debug "No EL connection for newPayload"
return Opt.none PayloadExecutionStatus

causing that single timeout to cascade into a minute's worth of lack of communication with the EL. This PR fixes that.

@github-actions
Copy link

Unit Test Results

         9 files  ±0    1 074 suites  ±0   32m 21s ⏱️ - 1m 47s
  3 675 tests ±0    3 396 ✔️ ±0  279 💤 ±0  0 ±0 
15 668 runs  ±0  15 363 ✔️ ±0  305 💤 ±0  0 ±0 

Results for commit bf7fcaf. ± Comparison against base commit 228e10f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants