Skip to content

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Mar 12, 2023

Issue Addressed

NA

Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

  1. When the blinded block is published to the builder.
  2. When the un-blinded block is published to the P2P network (this is always after the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

Additional Info

One could argue that the builder should return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.

@paulhauner paulhauner added ready-for-review The code is ready for review v4.0.0 Mainnet Capella release expected late March 2023 labels Mar 12, 2023
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice improvement.

Will need a few sprinklings of into_response() to resolve the conflict with my http_api refactor PR.

@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Mar 13, 2023
@paulhauner paulhauner force-pushed the builder-late-block-logging branch from 3826037 to bcecf95 Compare March 13, 2023 04:29
@paulhauner
Copy link
Member Author

Oops, I messed up an attempted sneaky-rebase and had to do a public rebase!

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Mar 13, 2023
@paulhauner
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Mar 14, 2023
## Issue Addressed

NA

## Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

## Additional Info

One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
@bors
Copy link

bors bot commented Mar 14, 2023

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Mar 14, 2023
## Issue Addressed

NA

## Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

## Additional Info

One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
@bors
Copy link

bors bot commented Mar 14, 2023

Build failed (retrying...):

@michaelsproul
Copy link
Member

I think this might have broken block unblinding (based on the CI failures). Haven't looked in detail yet

bors bot pushed a commit that referenced this pull request Mar 14, 2023
## Issue Addressed

NA

## Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

## Additional Info

One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
@bors
Copy link

bors bot commented Mar 14, 2023

Build failed:

@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-merge This PR is ready to merge. labels Mar 14, 2023
.try_into_full_block(Some(full_payload))
.map(Arc::new)
.map(ProvenancedBlock::Builder),
None => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be what's causing issues in CI, with pre-merge blocks this will return None here causing the Unable to add payload to block error where before calling block.try_into_full_block(None) on a pre-merge block would still return the block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, super shoddy oversight on my behalf. Thanks for point this out and thanks tests!

Fixed in 408af7f.

@paulhauner paulhauner removed the waiting-on-author The reviewer has suggested changes and awaits thier implementation. label Mar 16, 2023
@paulhauner paulhauner added the ready-for-review The code is ready for review label Mar 16, 2023
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookin' good!

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 17, 2023
@michaelsproul
Copy link
Member

bors r+

bors bot pushed a commit that referenced this pull request Mar 17, 2023
## Issue Addressed

NA

## Proposed Changes

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

## Additional Info

One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
@bors bors bot changed the title Reduce false positive logging for late builder blocks [Merged by Bors] - Reduce false positive logging for late builder blocks Mar 17, 2023
@bors bors bot closed this Mar 17, 2023
bors bot pushed a commit that referenced this pull request Jun 7, 2023
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
bors bot pushed a commit that referenced this pull request Jun 7, 2023
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
bors bot pushed a commit that referenced this pull request Jun 7, 2023
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
ghost pushed a commit to oone-world/lighthouse that referenced this pull request Jul 13, 2023
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  sigp#4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  sigp#4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
NA

When producing a block from a builder, there are two points where we could consider the block "broadcast":

1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).

Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.

This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.

One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Issue Addressed

NA

## Proposed Changes

Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.

It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.

These changes have the same intent as  sigp#4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.

I've also tidied the log messages to:

- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.

## Additional Info

This PR should not change any logic beyond logging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v4.0.0 Mainnet Capella release expected late March 2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants