feat: build AggregatorCommitteeConsensusData and post consensus handling #786
Force-pushed bc215d5 → 30a6e87
Force-pushed 0aa42c7 → 9c5c507
dknopik left a comment:
Nice! Have only skimmed the metadata_service, will continue tomorrow. But looking good!
```rust
/// Returns an error if the message type exceeds its respective count limit.
///
/// Note: `debug_context` is an optional tuple of (slot, operator_id, num_signatures_in_message)
/// used for debugging Issue 4 (pre-consensus InvalidPartialSignatureTypeCount).
```
I think this comment is outdated?
yep, outdated, removed!
```rust
let validator_indices: Vec<_> = messages
    .messages
    .iter()
    .take(5)
    .map(|m| m.validator_index)
    .collect();
warn!(
    slot = %messages.slot,
    kind = ?messages.kind,
    num_signatures = messages.messages.len(),
    ?validator_indices,
    "Pre-consensus message rejected (limit exceeded)"
);
return Err(ValidationFailure::InvalidPartialSignatureTypeCount {
```
is this a relic of debugging and to be removed or is there a specific reason why this is added?
yep, this helped debugging during ssv-mini testing; the related error has since been resolved in the ssv repo.
removed it since it should be good now
```rust
// ═══════════════════════════════════════════════════════════════════════════════
// AggregatorCommittee metrics
// ═══════════════════════════════════════════════════════════════════════════════
```

```rust
    Attestation::Electra(_) => ForkName::Electra.into(),
};
// Fork-gated logic for Boole+ committee-based consensus
if self.fork_schedule.active_fork(signing_epoch) >= Fork::Boole {
```
This is a looooong if-else.
What do you think about moving the bodies into two functions, e.g. `produce_signed_aggregate_and_proof_boole` and `produce_signed_aggregate_and_proof_alan`?
I was thinking it might be easier to rip out the Alan logic at some point post-fork with just the if/else, but it's pretty much just as easy to rip out the helper functions later and inline them. agreed, refactored!
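For reference, the refactor discussed here can be sketched as a thin fork-gated dispatcher. A minimal, self-contained sketch follows; `Fork` and the helper bodies are stand-ins, while the two helper names mirror the suggestion above:

```rust
// Sketch of the fork-gated dispatch discussed above. `Fork` is a stand-in
// enum; the two helpers mirror the names suggested in the review so the
// pre-fork (Alan) path can be deleted wholesale after the fork activates.
#[derive(PartialEq, PartialOrd)]
enum Fork {
    Alan,
    Boole,
}

fn produce_signed_aggregate_and_proof_alan() -> &'static str {
    "alan path"
}

fn produce_signed_aggregate_and_proof_boole() -> &'static str {
    "boole path"
}

fn produce_signed_aggregate_and_proof(active_fork: Fork) -> &'static str {
    // The long if/else bodies collapse to two calls.
    if active_fork >= Fork::Boole {
        produce_signed_aggregate_and_proof_boole()
    } else {
        produce_signed_aggregate_and_proof_alan()
    }
}

fn main() {
    // Prints "boole path" for a post-fork epoch.
    println!("{}", produce_signed_aggregate_and_proof(Fork::Boole));
}
```

Deleting the Alan path later then means removing one helper and one branch, rather than untangling an inlined block.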
anchor/validator_store/src/lib.rs (outdated)

```diff
 // Defensive check: validator should be in VotingAssignments since both this
 // function call and VotingAssignments are derived from DutiesService. If not,
 // there's an inconsistency (e.g., stale cache after poll timeout) and we
-// should not participate with a wrong `num_signatures_to_collect`.
+// should not participate with a wrong num_signatures_to_collect.
```
Everywhere else in this file, you add the backticks, here you remove them. I am in favor of adding/keeping them :)
ha yah, let's keep them. they've been resurrected
anchor/validator_store/src/lib.rs (outdated)
```rust
    contribution,
    selection_proof,
};
if self.fork_schedule.active_fork(epoch) >= Fork::Boole {
```
Again, wdyt about moving the bodies into independent functions?
Force-pushed f54bcf1 → c91352f
dknopik left a comment:
Another small note, otherwise LGTM.
```rust
};

/// Tuple of (`validator_index`, `pubkey`, `selection_proof`) for sync committee aggregators.
type SyncAggregatorData = (u64, PublicKeyBytes, SyncSelectionProof);
```
At this point, a struct might be easier to understand. Matter of taste though.
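As a point of comparison, the struct version might look like the sketch below. The field types are stand-ins so the example is self-contained; the real `PublicKeyBytes` and `SyncSelectionProof` types live in the project's crates:

```rust
// Sketch of the struct alternative to the (u64, PublicKeyBytes,
// SyncSelectionProof) tuple. Field types are simplified stand-ins.
#[derive(Debug, Clone)]
struct SyncAggregatorData {
    validator_index: u64,
    pubkey: [u8; 48],         // stand-in for PublicKeyBytes
    selection_proof: Vec<u8>, // stand-in for SyncSelectionProof
}

fn main() {
    let data = SyncAggregatorData {
        validator_index: 7,
        pubkey: [0u8; 48],
        selection_proof: vec![0xAA],
    };
    // `data.validator_index` is self-documenting, unlike the tuple's `data.0`.
    println!("{}", data.validator_index);
}
```

The trade-off is a few extra lines of definition in exchange for named field access at every use site.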
Nice, LGTM! Agree with @dknopik's latest comment.
Force-pushed 81a67cc → d1d94f8
@claude can you do another check of this PR
Claude finished @shane-moore's task in 2m 25s — View job

### PR Review Complete ✓

I've conducted a comprehensive code review of the Boole fork aggregation implementation. The code is generally high-quality with good Rust practices, but I've identified several issues requiring attention.

#### Critical Issues (Must Fix Before Merge)

**1. Missing Error Context - Production Debugging Risk**

Location:

```rust
let validator_index = validator.index.ok_or(SpecificError::MissingIndex)?;
```

Recommendation: Enhance the error type to include context:

```rust
let validator_index = validator.index.ok_or_else(|| {
    SpecificError::MissingValidatorIndex {
        validator_pubkey,
        slot,
    }
})?;
```

**2. Potential Index Mismatch - Data Safety**

Location:

```rust
let decided_aggregate_bytes = decided_data
    .aggregated_attestations
    .get(decided_aggregate_idx)
```

Issue: Logic assumes `aggregator_committee_indexes` and `aggregated_attestations` stay index-aligned.

Recommendation: Add a defensive assertion:

```rust
debug_assert_eq!(
    decided_data.aggregator_committee_indexes.len(),
    decided_data.aggregated_attestations.len(),
    "Decided data validation failed - index/attestation count mismatch"
);
```

**3. Hard-coded Timeout Without Validation**

Location:

```rust
const BEACON_API_FETCH_TIMEOUT: Duration = Duration::from_secs(2);
```

Issue: A 2-second timeout may be too aggressive for some networks. There is no validation that it leaves sufficient time for QBFT consensus (which needs to complete before the end of the 2/3 slot window).

Recommendation: Make it configurable via CLI with validation, or add a compile-time check ensuring timeout < remaining slot time.

#### Warnings (Should Fix)

**4. Non-Deterministic Iteration Order - Consensus Risk**

Location:

```rust
let ssv_committees: HashSet<CommitteeId> = attesters_by_ssv_committee
    .keys()
    .chain(sync_by_ssv_committee.keys())
    .copied()
    .collect();
```

Issue: `HashSet` iteration order is non-deterministic.

Recommendation: Use a collection with deterministic iteration order.

**5. Potential Slot Race in Cache Access**

Location:

```rust
let voting_assignments = self
    .validator_store
    .get_voting_assignments(slot)
    .await
```

Issue: Phase 3 (2/3 slot) reads a cache that may have been refreshed for a different slot.

Recommendation: Add slot validation after the cache read:

```rust
if voting_assignments.slot != slot {
    return Err(format!("Slot mismatch: expected {}, got {}", slot, voting_assignments.slot));
}
```

**6. Missing Committee Index Bounds Check**

Location:

```rust
index: if fork_name < ForkName::Electra { committee_index } else { 0 },
```

Recommendation: Add validation before use.

**7. Expensive Clone on Critical Path**

Location:

```rust
(*our_consensus_data).clone(),
```

Recommendation: Profile the clone cost and consider avoiding the deep clone on this path.

#### Suggestions (Nice to Have)

**8. Inefficient Metrics Logging**

Location:

```rust
for _ in 0..successful {
    metrics::inc_counter_vec(...)
}
```

Issue: Incrementing a counter in a loop is inefficient.

Recommendation: Use a bulk increment:

```rust
metrics::inc_counter_vec_by(
    &metrics::AGGREGATOR_COMMITTEE_FETCH_SUCCESS,
    &["aggregated_attestations", "success"],
    successful as u64,
);
```

**9. Magic Numbers Without Constants**

Location:

```rust
Duration::from_secs(self.spec.seconds_per_slot) * 2 / 3,
```

Issue: The `2 / 3` fraction is unexplained at the use site.

Recommendation: Define a named constant explaining its significance.

**10. Missing Early Return Optimization**

Location:

```rust
if aggregators.is_empty() && contributors.is_empty() {
    return Ok(None);
}
```

Issue: The check happens after expensive beacon API fetches. It should happen before fetching.

**11. Error Messages Lack Context**

Error messages in beacon API calls lack slot, committee_index, and endpoint context for debugging.

#### Summary

Overall Assessment: The code is high-quality with strong Rust practices. Critical issues primarily concern error context and defensive programming rather than fundamental safety bugs.

Impact: Addressing the critical issues will significantly improve production debugging and operational safety. The implementation follows Anchor's CLAUDE.md guidelines and maintains high code quality standards.
@claude, I have reviewed your thoughts 1-11 and have found that none are actionable at this time.
|
Claude finished @shane-moore's task in 19s —— View job PR Review AcknowledgmentThank you for reviewing my feedback. I understand that none of the 11 items from my previous review are actionable at this time. SummaryItems Reviewed: 11 total
Outcome: No changes will be implemented based on this review feedback. NotesThe items I identified remain documented in the previous comment for future reference if circumstances change. The PR has been approved by other reviewers (dknopik, petarjuki7) and appears ready to merge from the team's perspective. |
Force-pushed f9d7148 → e47fd0b
@shane-moore Claude's point 8 seems valid and easy
@dknopik, agreed! Killed a couple of loops actually, which is nice: 81062ba
```rust
    executor.clone(),
);

let metadata_service = MetadataService::new(
```
I believe we could use a more descriptive name for `MetadataService`.
MetadataService has grown into a “god service” (assignment caching + beacon fetches + consensus data construction + committee grouping). That’s too many responsibilities for one module. A cleaner split would keep this service focused on publishing duty inputs (VotingAssignments, VotingContext, AggregationAssignments) and move consensus-data fetching/building into a dedicated AggregatorConsensusDataBuilder (or similar) module.
Given that scope, MetadataService is misleading. After the split, I’d rename it to DutyInputPublisher to reflect its actual role.
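A hypothetical skeleton of that split, using the names floated above (`DutyInputPublisher`, `AggregatorConsensusDataBuilder`, `VotingAssignments`, `AggregationAssignments`); everything else here is a self-contained stand-in, not the real Anchor API:

```rust
// Skeleton of the proposed split: a publisher limited to duty inputs, and a
// separate builder owning beacon fetches + consensus-data construction.
struct VotingAssignments;
struct AggregationAssignments;

/// Publishes duty-derived inputs only (the renamed MetadataService).
struct DutyInputPublisher;

impl DutyInputPublisher {
    fn publish(&self) -> (VotingAssignments, AggregationAssignments) {
        (VotingAssignments, AggregationAssignments)
    }
}

/// Owns consensus-data construction, split out of the publisher so neither
/// service becomes a "god service".
struct AggregatorConsensusDataBuilder;

impl AggregatorConsensusDataBuilder {
    fn build(&self, _agg: &AggregationAssignments) -> String {
        "AggregatorCommitteeConsensusData".to_string()
    }
}

fn main() {
    let (_voting, agg) = DutyInputPublisher.publish();
    println!("{}", AggregatorConsensusDataBuilder.build(&agg));
}
```

With this shape, each service's name matches its single responsibility, and the builder can be tested without the publishing machinery.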
```rust
/// being split across multiple messages.
///
/// Returns: count(aggregators) + count(contributors) filtered by committee membership
pub fn post_consensus_signature_count<F>(&self, is_in_committee: F) -> usize
```
Is that the right place for this function?
It might be better as validator-store logic (or a helper) rather than in the shared ssv_types crate.
```rust
/// Boole+ path for `produce_signed_aggregate_and_proof`: committee-based consensus using
/// `AggregatorCommitteeConsensusData`.
#[allow(clippy::too_many_arguments)]
```
Ideally, we should avoid using this
dope suggestions! assigned myself a ticket to look into all this so that your suggestions don't fall through the cracks #807
```rust
let completed = self
    .qbft_manager
    .decide_instance(
        AggregatorCommitteeInstanceId {
```
Question: Is the shared QBFT instance between attestation aggregation and sync contribution intentional?

Both `produce_signed_aggregate_and_proof_boole` (line 734) and `produce_signed_contribution_and_proof_boole` (here) call `decide_instance` with the same `AggregatorCommitteeInstanceId { committee, instance_height: slot }`.

Since `QbftManager::get_or_spawn_instance` deduplicates by instance ID, only the first caller's `Config`/validator/`start_data` actually create the QBFT instance — the second caller silently joins the existing instance via the `QbftInstance::Initialized` branch and receives the same decided result.

This means:

- Whichever of the two methods runs first (attestation agg or sync contribution) creates the instance; the other joins.
- The second caller's `timeout_mode`, `AggregatorCommitteeDataValidator`, and `our_consensus_data` are silently discarded (only an `on_completed` callback is registered).
- If these two paths ever need different configurations (e.g., different timeout behavior), it would break subtly.

Could you confirm this is the intended design? And if so, would it be worth adding a comment at both call sites (and updating `decide_instance`'s doc, which says "brand new qbft instance") to make this shared-instance pattern explicit?
hey, yep, all that's correct. works pretty much like how attestation and sync vote duties are rolled up into a single `CommitteeInstanceId`. In this case, it's aggregated attestations and sync committee contributions that will share a single `AggregatorCommitteeInstanceId`. and agree that adding the documentation is a great idea. will knock that out
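The join-instead-of-spawn behaviour confirmed here can be modelled with a toy sketch: deduplication happens on the instance ID, so only the first caller's configuration is used and the second caller's is silently dropped. All names below are illustrative stand-ins, not the real `QbftManager` API:

```rust
use std::collections::HashMap;

// Toy model of get-or-spawn deduplication keyed by instance id.
#[derive(Hash, PartialEq, Eq, Clone)]
struct AggregatorCommitteeInstanceId {
    committee: u64,
    instance_height: u64, // slot
}

struct Manager {
    instances: HashMap<AggregatorCommitteeInstanceId, String>,
}

impl Manager {
    // The first caller's config spawns the instance; later callers with the
    // same id join it and their config argument is ignored.
    fn get_or_spawn_instance(&mut self, id: AggregatorCommitteeInstanceId, config: &str) -> String {
        self.instances
            .entry(id)
            .or_insert_with(|| config.to_string())
            .clone()
    }
}

fn main() {
    let mut mgr = Manager { instances: HashMap::new() };
    let id = AggregatorCommitteeInstanceId { committee: 1, instance_height: 100 };
    // Attestation aggregation runs first and spawns the instance...
    let first = mgr.get_or_spawn_instance(id.clone(), "attestation config");
    // ...sync contribution joins afterwards: its config is discarded.
    let second = mgr.get_or_spawn_instance(id, "sync config");
    println!("{first} / {second}"); // both report "attestation config"
}
```

This is exactly why a call-site comment helps: the second caller's arguments look load-bearing but are not.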
Issue Addressed
Continues committee-based aggregation support for the Boole fork. This PR supports #770.
Proposed Changes
- `AggregatorCommitteeConsensusData` is built in the validator store metadata service for QBFT consensus
- `produce_aggregate_and_proof` handles the post-Boole committee-based aggregation flow with post-consensus signing
- `produce_sync_aggregate_and_proof` handles the post-Boole committee-based sync aggregation flow with post-consensus signing

Additional Info
The consensus data follows SSV protocol requirements: