[Merged by Bors] - Reduce false positive logging for late builder blocks #4073

paulhauner · 2023-03-12T23:14:59Z

Issue Addressed

NA

Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always after the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network itself before returning it to us, and only then do we perform (2). For clarity, the logs were false-positives because we claimed that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.
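The idea of tracking block provenance and timing builder blocks at (1) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the name `ProvenancedBlock` appears in the diff, but its shape here, the `BroadcastTimes` struct, the `metric_label` helper, and the lateness threshold are all assumptions made for the example.

```rust
use std::time::Duration;

/// Where a block came from. The name mirrors `ProvenancedBlock` from the
/// diff, but this definition is an assumption for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum ProvenancedBlock<B> {
    /// Built locally by our own execution engine.
    Local(B),
    /// Built by an external builder and un-blinded afterwards.
    Builder(B),
}

impl<B> ProvenancedBlock<B> {
    /// Hypothetical label for splitting a broadcast-delay metric by source.
    pub fn metric_label(&self) -> &'static str {
        match self {
            ProvenancedBlock::Local(_) => "local",
            ProvenancedBlock::Builder(_) => "builder",
        }
    }
}

/// Hypothetical timestamps, expressed as offsets from the start of the slot.
pub struct BroadcastTimes {
    /// Point (1): the blinded block was sent to the builder (None for local blocks).
    pub sent_to_builder: Option<Duration>,
    /// Point (2): the full block was published to the P2P network by us.
    pub published_to_p2p: Duration,
}

/// Picks the delay used for late-block logging: point (1) for builder
/// blocks, point (2) for local blocks.
pub fn broadcast_delay<B>(block: &ProvenancedBlock<B>, times: &BroadcastTimes) -> Duration {
    match block {
        ProvenancedBlock::Local(_) => times.published_to_p2p,
        // Fall back to point (2) if point (1) was never recorded.
        ProvenancedBlock::Builder(_) => times.sent_to_builder.unwrap_or(times.published_to_p2p),
    }
}

/// Returns true if the block should be logged as late for the given threshold.
pub fn is_late<B>(block: &ProvenancedBlock<B>, times: &BroadcastTimes, threshold: Duration) -> bool {
    broadcast_delay(block, times) > threshold
}
```

With these definitions, a builder block that was sent to the builder 1s into the slot but only published by us at 5s would not be flagged against a 4s threshold, while a local block published at 5s would be.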

Additional Info

One could argue that the builder should return the block to us faster; however, that's not what happens in practice. I think it's more important that we don't desensitize users with false-positives.

michaelsproul

Looks good, nice improvement.

Will need a few sprinklings of into_response() to resolve the conflict with my http_api refactor PR.

beacon_node/http_api/src/publish_blocks.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

paulhauner · 2023-03-13T04:29:53Z

Oops, I messed up an attempted sneaky-rebase and had to do a public rebase!

paulhauner · 2023-03-14T01:39:44Z

bors r+


bors · 2023-03-14T04:59:49Z

Build failed (retrying...):

release-tests-ubuntu


bors · 2023-03-14T06:26:25Z

Build failed (retrying...):

release-tests-ubuntu

michaelsproul · 2023-03-14T08:10:34Z

I think this might have broken block unblinding (based on the CI failures). Haven't looked in detail yet


bors · 2023-03-14T09:51:44Z

Build failed:

release-tests-ubuntu

realbigsean · 2023-03-14T13:51:58Z

beacon_node/http_api/src/publish_blocks.rs

+            .try_into_full_block(Some(full_payload))
+            .map(Arc::new)
+            .map(ProvenancedBlock::Builder),
+        None => None,


This might be what's causing the issues in CI. With pre-merge blocks, this will return None here, causing the "Unable to add payload to block" error, whereas previously calling block.try_into_full_block(None) on a pre-merge block would still return the block.

Oooh, super shoddy oversight on my behalf. Thanks for pointing this out and thanks tests!

Fixed in 408af7f.
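The pre-merge problem described in this thread can be illustrated with a small sketch. The types below are simplified stand-ins rather than the real Lighthouse types, and `unblind_fixed` shows only one possible shape of a fix; it is not necessarily what 408af7f does.

```rust
/// Simplified stand-in for an execution payload (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct Payload(pub u64);

/// Simplified stand-in for a blinded block (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum BlindedBlock {
    /// Pre-merge block: there is no execution payload to add.
    PreMerge { slot: u64 },
    /// Post-merge block: a payload must be supplied to un-blind it.
    PostMerge { slot: u64 },
}

/// Simplified stand-in for a full (un-blinded) block.
#[derive(Debug, Clone, PartialEq)]
pub struct FullBlock {
    pub slot: u64,
    pub payload: Option<Payload>,
}

impl BlindedBlock {
    /// Mirrors the behaviour described in the review comment: a pre-merge
    /// block converts successfully even when no payload is given.
    pub fn try_into_full_block(self, payload: Option<Payload>) -> Option<FullBlock> {
        match (self, payload) {
            (BlindedBlock::PreMerge { slot }, _) => Some(FullBlock { slot, payload: None }),
            (BlindedBlock::PostMerge { slot }, Some(p)) => Some(FullBlock { slot, payload: Some(p) }),
            (BlindedBlock::PostMerge { .. }, None) => None,
        }
    }
}

/// Buggy shape: a missing payload short-circuits to None, so pre-merge
/// blocks can never be un-blinded.
pub fn unblind_buggy(block: BlindedBlock, payload: Option<Payload>) -> Option<FullBlock> {
    match payload {
        Some(p) => block.try_into_full_block(Some(p)),
        None => None,
    }
}

/// One possible fix: always delegate, letting the conversion decide
/// whether a payload is actually required.
pub fn unblind_fixed(block: BlindedBlock, payload: Option<Payload>) -> Option<FullBlock> {
    block.try_into_full_block(payload)
}
```

In this sketch, `unblind_buggy` rejects a pre-merge block when no payload is available, while `unblind_fixed` still returns the block and only fails for post-merge blocks that genuinely need a payload.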

michaelsproul

Lookin' good!

michaelsproul · 2023-03-17T00:43:44Z

bors r+


bors · 2023-03-17T03:03:01Z

Pull request successfully merged into unstable.

Build succeeded:

Issue Addressed

NA

Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block. It's quite common for builders to successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

Additional Info

This PR should not change any logic beyond logging.


paulhauner added 5 commits March 13, 2023 09:36

Trace provenance of blocks

5b2a54e

Unify late block logging

666e539

Add comment

f8472eb

Split block delay metrics

2fe05e8

Tidy

7ae7412

paulhauner added ready-for-review The code is ready for review v4.0.0 Mainnet Capella release expected late March 2023 labels Mar 12, 2023

michaelsproul approved these changes Mar 13, 2023

beacon_node/http_api/src/publish_blocks.rs (outdated)

michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Mar 13, 2023

paulhauner and others added 2 commits March 13, 2023 15:10

Update beacon_node/http_api/src/publish_blocks.rs

00bcb61

Co-authored-by: Michael Sproul <micsproul@gmail.com>

Merge branch 'unstable' into builder-late-block-logging

bcecf95

paulhauner force-pushed the builder-late-block-logging branch from 3826037 to bcecf95 Compare March 13, 2023 04:29

paulhauner added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Mar 13, 2023

michaelsproul approved these changes Mar 13, 2023


michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-merge This PR is ready to merge. labels Mar 14, 2023

realbigsean reviewed Mar 14, 2023


Fix pre-merge block production

408af7f

paulhauner removed the waiting-on-author The reviewer has suggested changes and awaits their implementation. label Mar 16, 2023

paulhauner added the ready-for-review The code is ready for review label Mar 16, 2023

michaelsproul approved these changes Mar 17, 2023


michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 17, 2023

bors bot changed the title ~~Reduce false positive logging for late builder blocks~~ [Merged by Bors] - Reduce false positive logging for late builder blocks Mar 17, 2023

bors bot closed this Mar 17, 2023

paulhauner mentioned this pull request Jun 2, 2023

[Merged by Bors] - Downgrade a CRIT in the VC for builder timeouts #4366

Closed
