Skip to content

Comments

Include timely attestations and aggregates#3472

Closed
haxxpop wants to merge 1 commit intoethereum:devfrom
haxxpop:timely-attestations-aggregates
Closed

Include timely attestations and aggregates#3472
haxxpop wants to merge 1 commit intoethereum:devfrom
haxxpop:timely-attestations-aggregates

Conversation

@haxxpop
Copy link
Member

@haxxpop haxxpop commented Aug 4, 2023

From the discussion at #3433 (comment) and in consensus-dev channel in Eth R&D Discord on 06/15/2023.

We found that some clients don't aggregate some timely attestations because they want to have spare time for computation.

For example, Prysm aggregators currently aggregate only the attestations that arrive before 6.5 seconds in the slot rather than 8 seconds as specified in the spec, because it wants to allocate the last 1.5 seconds to do aggregation.

Discrepancy between clients makes network measurement and analysis harder and it's quite weird to let the clients define new constants. I think it's better to encourage clients to behave the same way for this issue.

There are two encouragements in this PR.

  1. We specify in the spec that the aggregator should aggregate all the attestations that arrive before 8s in the slot.
  2. We specify in the spec that the block proposer should add all the aggregates that arrive before the slot time to the block. (This is not part of the discussion, but I think it's good to have)

We don't have to force all the clients to behave the same way currently, because we understand that it requires a lot of work, but it's good to at least encourage them in the spec.

We found that some clients don't aggregate some timely attestations
because they want to have spare time for computation.

For example, Prysm currently aggregate only the attestations that arrvie
before 6.5 seconds in the slot rather than 8 seconds as specified in the
spec, because it wants to allocate the last 1.5 seconds to do
aggregation.

Discrepancy between clients makes network measurement and analysis
harder. It's better to encourage clients to behave the same way for this
issue.

We don't have to force all the clients to behave the same currently,
because we understand that it requires a lot of work, but it's good to
at least encourage them in the spec.
@haxxpop haxxpop closed this Aug 4, 2023
@haxxpop haxxpop reopened this Aug 4, 2023
@potuz
Copy link
Contributor

potuz commented Aug 5, 2023

There's a conundrum here, if a task takes any time > 0 then it's impossible to do both:

  • perform this tasks right up to a timestamp x
  • broadcast the result of this task exactly at x

I think in the case of aggregation, it's more important to broadcast early than to gather attestations longer so I would prefer the spec to be rigorous in requiring the broadcast to happen at an exact time and therefore because of the impossibility above, to just encourage clients to perform the task for as long as they can before the given timestamp

@arnetheduck
Copy link
Contributor

arnetheduck commented Aug 5, 2023

There's a conundrum here, if a task takes any time > 0 then it's impossible to do both:

I think this overstates the problem: aggregating 512 attestations into a single aggregate takes <1ms (assuming the bulk of time is spend performing BLS aggregation) - it is entirely negligible: in fact, aggregating takes less time than validating the aggregate on arrival (validation takes ~1ms), something each client must do before re-broadcasting it.

I'd also point to the following two graphs:

image

These show attestation and aggregate arrival times - as we can see, attestation times are spread out and reach all the way to the 8s mark and beyond meaning that every bit of time up to the 8s cutoff is valuable while aggregates are incredibly tight in the 8-10s bracket after slot start - this points to the fact that shortening the attestation time in order to have more time for aggregates is a bad idea in general. This is further supported by the fact that we produce 20:ish mostly-the-same aggregates per committee and only one of them is actually needed to construct a block, iff it's good.

Also, when aggregates are poor (ie are not full), they are more likely to require separate broadcasting: a fully covering aggregate is more efficient to broadcast than several partially overlapping aggregates - ditto for block construction: it is better for the block construction aggregate selection that aggregates are full rather than partially overlapping because then they take up less space in the block.

Ergo, I believe the PR suggestion represents a better reality towards which to strive, both for network efficient and the subsequent block packing.

Copy link
Member

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving tentatively in favor, as the original spec should be enough to derive such conclusions. However given existing network conditions there's value in explicitly pointing this out. Other clients like Lodestar produce aggregates progressively as they are received, such that an aggregator can get the best available result at any point in time without added latency.

@haxxpop
Copy link
Member Author

haxxpop commented Aug 8, 2023

There's a conundrum here, if a task takes any time > 0 then it's impossible to do both:

That's true. However, it's not a problem if the time is negligible as @arnetheduck said.

In case of aggregation, if you just aggregate everything at 8s, of course, it can be slower. However, you can just start aggregating before 8s and keep aggregating new individual attestations until 8s which will be done very fast comparing to the deadline.

In case of block proposal, it should be fast as well, because you already constructed the EL block before the start of the slot, so, at the start of the slot, you can construct the CL block very fast (including that some fields in the CL block can be constructed in advance).

Also, when aggregates are poor (ie are not full), they are more likely to require separate broadcasting: a fully covering aggregate is more efficient to broadcast than several partially overlapping aggregates - ditto for block construction: it is better for the block construction aggregate selection that aggregates are full rather than partially overlapping because then they take up less space in the block.

That's a good point. I didn't think of it previously.

@potuz
Copy link
Contributor

potuz commented Aug 9, 2023

That's true. However, it's not a problem if the time is negligible as @arnetheduck said.

Time being negligible or not may be an implementation detail. Perhaps Prysm's implementation is just incredibly bad to the point that we have to cut before the time. But I do trust my benchmarks and they showed that on stressed servers we need to cut earlier in order to have a good aggregate ready by 8 seconds. Given that the actual fact is that it's impossible to do a task that requires any non-zero time to perform, I would oppose any wording that does not reflect this. And as stated above, I would mostly favor being strict on the broadcasting time and not on the time that the task has to stop being performed.

To phrase it differently: if the spec requires clients to do something that it's practically impossible, then clients get to decide what they do. I think it's better to specify something that it is actually possible and require clients to fulfill that instead of implicitly accepting that they will have to make a choice.

@haxxpop
Copy link
Member Author

haxxpop commented Aug 10, 2023

I would oppose any wording that does not reflect this. And as stated above, I would mostly favor being strict on the broadcasting time and not on the time that the task has to stop being performed.

I think there is no reason to be strict on the broadcasting time at all. I quite agree with what @arnetheduck said in Discord.

"regarding aggregates in particular, I also think that the spec de facto doesn't require sending on the 8s mark: the time between 8 and 12 seconds is all dedicated to aggregation work - it's a really long and useless window in general"

Please also note that it's a SHOULD, not a MUST, so clients are still free to not follow the added statements.

I think it's better to specify something that it is actually possible and require clients to fulfill that instead of implicitly accepting that they will have to make a choice.

It's not uncommon to specify something that is not required. It's why the Internet standards all use SHOULD and MUST. See https://datatracker.ietf.org/doc/html/rfc2119

Given that the actual fact is that it's impossible to do a task that requires any non-zero time to perform

We can make it possible by just changing the wording to "roughly send the aggregate at 8s" rather than "send the aggregate at 8s". Then it's possible for some non-zero time tasks. However, I think that's not significant.

@potuz
Copy link
Contributor

potuz commented Aug 10, 2023

It's not uncommon to specify something that is not required. It's why the Internet standards all use SHOULD and MUST. See https://datatracker.ietf.org/doc/html/rfc2119

Let me be more precise in my complain: there are two taks: 1) aggregate everything up to 8 seconds. 2) broadcast at 8 seconds.

  • SHOULD, SHOULD does not change much from the current spec
  • MUST, MUST is impossible

So to be stricter than the current spec we should move to (SHOULD, MUST), or (MUST, SHOULD). Either would be fine and I would rather the former because of simplicity and separation of the validator client: if a client is running a Lighthouse VC and request an aggregate at 8 seconds from a Prysm node, with the former Prysm will reply with a full aggregate. With the latter we will reply with an empty aggregate cause we would not have finished.

@haxxpop
Copy link
Member Author

haxxpop commented Aug 10, 2023

So to be stricter than the current spec we should move to (SHOULD, MUST), or (MUST, SHOULD)

I think the third option is (SHOULD, SHOULD), but any of these is fine to me as well.

Either would be fine and I would rather the former

This sounds like you agree with this PR, right? because this PR just specifies that "aggregate everything up to 8 seconds" is a SHOULD.

@leolara
Copy link
Member

leolara commented Jun 4, 2025

I am closing this issue because it seems stale. Please, do not hesitate to reopen it if this is a mistake

@leolara leolara closed this Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants