feat: build AggregatorCommitteeConsensusData and post consensus handling #786

Merged
mergify[bot] merged 23 commits into sigp:unstable from shane-moore:create-aggregator-committee-consensus-data
Feb 4, 2026
Conversation

@shane-moore
Member

@shane-moore shane-moore commented Jan 26, 2026

Issue Addressed

Continues committee-based aggregation support for the Boole fork. This PR supports #770.

Proposed Changes

  • Build AggregatorCommitteeConsensusData in the validator store metadata service for QBFT consensus
  • Update produce_aggregate_and_proof to handle the post-Boole committee-based aggregation flow with post-consensus signing
  • Update produce_sync_aggregate_and_proof to handle the post-Boole committee-based sync aggregation flow with post-consensus signing
  • Fetch aggregated attestations and sync contributions from beacon nodes with parallel requests and timeouts
  • Implement sorting and filtering for consensus compatibility with SSV Go
  • Add metrics and logging for consensus data construction and pre-consensus validation

Additional Info

The consensus data follows SSV protocol requirements:

  • Aggregators sorted by validator index
  • Contributors sorted by signing root then validator index
  • 1:1 correspondence between aggregators and fetched attestations
  • Handles partial fetch failures while maintaining consensus data integrity
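The two ordering rules above can be sketched with simplified stand-in types (the real entries and signing-root types live in the Anchor codebase; `Aggregator` and `Contributor` here are hypothetical):

```rust
// Hypothetical stand-in types; the real entries live in the Anchor codebase.
#[derive(Debug, PartialEq)]
pub struct Aggregator {
    pub validator_index: u64,
}

#[derive(Debug, PartialEq)]
pub struct Contributor {
    pub signing_root: [u8; 32],
    pub validator_index: u64,
}

/// Aggregators are sorted by validator index.
pub fn sort_aggregators(aggregators: &mut [Aggregator]) {
    aggregators.sort_by_key(|a| a.validator_index);
}

/// Contributors are sorted by signing root first, then validator index.
pub fn sort_contributors(contributors: &mut [Contributor]) {
    contributors.sort_by(|a, b| {
        a.signing_root
            .cmp(&b.signing_root)
            .then_with(|| a.validator_index.cmp(&b.validator_index))
    });
}
```

Both orderings are total over the sort keys, which is what makes the resulting consensus data byte-identical across operators.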

@shane-moore shane-moore force-pushed the create-aggregator-committee-consensus-data branch from bc215d5 to 30a6e87 on January 26, 2026 at 17:05
@dknopik dknopik added the v2.0.0 The release shipping the next network upgrade label Feb 2, 2026
@shane-moore shane-moore force-pushed the create-aggregator-committee-consensus-data branch 5 times, most recently from 0aa42c7 to 9c5c507 on February 2, 2026 at 16:04
Member

@dknopik dknopik left a comment

Nice! Have only skimmed the metadata_service, will continue tomorrow. But looking good!

/// Returns an error if the message type exceeds its respective count limit.
///
/// Note: `debug_context` is an optional tuple of (slot, operator_id, num_signatures_in_message)
/// used for debugging Issue 4 (pre-consensus InvalidPartialSignatureTypeCount).
Member

I think this comment is outdated?

Member Author

yep, outdated, removed!

Comment on lines 75 to 88
let validator_indices: Vec<_> = messages
    .messages
    .iter()
    .take(5)
    .map(|m| m.validator_index)
    .collect();
warn!(
    slot = %messages.slot,
    kind = ?messages.kind,
    num_signatures = messages.messages.len(),
    ?validator_indices,
    "Pre-consensus message rejected (limit exceeded)"
);
return Err(ValidationFailure::InvalidPartialSignatureTypeCount {
Member

is this a relic of debugging and to be removed or is there a specific reason why this is added?

Member Author

@shane-moore shane-moore Feb 2, 2026

yep, it helped debugging during ssv-mini testing. the related error has since been resolved in the ssv repo

removed it since it should be good now

Comment on lines +58 to +60
// ═══════════════════════════════════════════════════════════════════════════════
// AggregatorCommittee metrics
// ═══════════════════════════════════════════════════════════════════════════════
Member

Thanks for adding metrics! <3

Attestation::Electra(_) => ForkName::Electra.into(),
};
// Fork-gated logic for Boole+ committee-based consensus
if self.fork_schedule.active_fork(signing_epoch) >= Fork::Boole {
Member

This is a looooong if-else.

What do you think about moving the bodies into two functions, e.g. produce_signed_aggregate_and_proof_boole and produce_signed_aggregate_and_proof_alan?

Member Author

was thinking it might be easier to rip out the alan logic at some point post-fork with just the if/else. but it's pretty much just as easy to rip out the helper functions later and inline. agreed, refactored!
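The refactor discussed above can be sketched as a thin fork-gated dispatcher. The helper names follow the suggestion in this thread; the `Fork` enum and return types are simplified stand-ins, not the actual Anchor signatures:

```rust
// Simplified stand-in for the fork schedule; the real enum lives in Anchor.
#[derive(Debug, PartialEq, PartialOrd)]
pub enum Fork {
    Alan,
    Boole,
}

/// Thin dispatcher: fork gating stays in one place, so the Alan path can be
/// deleted post-fork by removing one branch and one helper function.
pub fn produce_signed_aggregate_and_proof(active_fork: Fork) -> &'static str {
    if active_fork >= Fork::Boole {
        produce_signed_aggregate_and_proof_boole()
    } else {
        produce_signed_aggregate_and_proof_alan()
    }
}

fn produce_signed_aggregate_and_proof_boole() -> &'static str {
    // Committee-based consensus using AggregatorCommitteeConsensusData.
    "boole"
}

fn produce_signed_aggregate_and_proof_alan() -> &'static str {
    // Pre-Boole per-validator aggregation flow.
    "alan"
}
```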

Comment on lines 1748 to 1751
// Defensive check: validator should be in VotingAssignments since both this
// function call and VotingAssignments are derived from DutiesService. If not,
// there's an inconsistency (e.g., stale cache after poll timeout) and we
// should not participate with a wrong `num_signatures_to_collect`.
// should not participate with a wrong num_signatures_to_collect.
Member

Everywhere else in this file, you add the backticks, here you remove them. I am in favor of adding/keeping them :)

Member Author

ha yah, let's keep them. they've been resurrected

Comment on lines 1861 to 1864
// Defensive check: validator should be in VotingAssignments since both this
// function call and VotingAssignments are derived from DutiesService. If not,
// there's an inconsistency (e.g., stale cache after poll timeout) and we
// should not participate with a wrong `num_signatures_to_collect`.
// should not participate with a wrong num_signatures_to_collect.
Member

See above

Member Author

@shane-moore shane-moore Feb 2, 2026

same, resurrected

contribution,
selection_proof,
};
if self.fork_schedule.active_fork(epoch) >= Fork::Boole {
Member

Again, wdyt about moving the bodies into independent functions?

Member Author

yep, done!

@shane-moore shane-moore force-pushed the create-aggregator-committee-consensus-data branch from f54bcf1 to c91352f on February 2, 2026 at 23:24
Member

@dknopik dknopik left a comment

Another small note, otherwise LGTM.

};

/// Tuple of (`validator_index`, `pubkey`, `selection_proof`) for sync committee aggregators.
type SyncAggregatorData = (u64, PublicKeyBytes, SyncSelectionProof);
Member

At this point, a struct might be easier to understand. Matter of taste though.

Member Author

sure sure! updated in 979dec4
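The tuple-to-struct change can be sketched roughly as follows. The aliases for `PublicKeyBytes` and `SyncSelectionProof` below are simplified stand-ins for the real Anchor types, and the field names are illustrative:

```rust
// Stand-ins for the real Anchor types (PublicKeyBytes, SyncSelectionProof).
pub type PublicKeyBytes = [u8; 48];
pub type SyncSelectionProof = Vec<u8>;

/// Named-field replacement for
/// `type SyncAggregatorData = (u64, PublicKeyBytes, SyncSelectionProof);`
/// so call sites read `data.validator_index` instead of `data.0`.
pub struct SyncAggregatorData {
    pub validator_index: u64,
    pub pubkey: PublicKeyBytes,
    pub selection_proof: SyncSelectionProof,
}
```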

@petarjuki7
Member

Nice, LGTM! Agree with @dknopik's latest comment

@shane-moore shane-moore force-pushed the create-aggregator-committee-consensus-data branch from 81a67cc to d1d94f8 on February 3, 2026 at 16:32
@shane-moore
Member Author

@claude can you do another check of this pr

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 3, 2026

Claude finished @shane-moore's task in 2m 25s


PR Review Complete ✓

I've conducted a comprehensive code review of the Boole fork aggregation implementation. The code is generally high-quality with good Rust practices, but I've identified several issues requiring attention.


Critical Issues (Must Fix Before Merge)

1. Missing Error Context - Production Debugging Risk

Location: anchor/validator_store/src/lib.rs:758, 1054

let validator_index = validator.index.ok_or(SpecificError::MissingIndex)?;

Issue: MissingIndex error lacks context (which validator, which slot). This makes production debugging very difficult.

Recommendation: Enhance the error type to include context:

let validator_index = validator.index.ok_or_else(|| {
    SpecificError::MissingValidatorIndex {
        validator_pubkey,
        slot,
    }
})?;

2. Potential Index Mismatch - Data Safety

Location: anchor/validator_store/src/lib.rs:800-805

let decided_aggregate_bytes = decided_data
    .aggregated_attestations
    .get(decided_aggregate_idx)

Issue: Logic assumes aggregator_committee_indexes.len() == aggregated_attestations.len(). While validation exists in consensus.rs:627-633, this function trusts that validation happened. If bypassed, could sign wrong attestations.

Recommendation: Add defensive assertion:

debug_assert_eq!(
    decided_data.aggregator_committee_indexes.len(),
    decided_data.aggregated_attestations.len(),
    "Decided data validation failed - index/attestation count mismatch"
);

3. Hard-coded Timeout Without Validation

Location: anchor/validator_store/src/metadata_service.rs:46

const BEACON_API_FETCH_TIMEOUT: Duration = Duration::from_secs(2);

Issue: 2-second timeout may be too aggressive for some networks. No validation that it leaves sufficient time for QBFT consensus (which needs to complete before end of 2/3 slot window).

Recommendation: Make configurable via CLI with validation, or add compile-time check ensuring timeout < remaining slot time.


Warnings (Should Fix)

4. Non-Deterministic Iteration Order - Consensus Risk

Location: anchor/validator_store/src/metadata_service.rs:481-485

let ssv_committees: HashSet<CommitteeId> = attesters_by_ssv_committee
    .keys()
    .chain(sync_by_ssv_committee.keys())
    .copied()
    .collect();

Issue: HashSet iteration order is non-deterministic. While consensus data within each committee is deterministic, processing order across committees could affect timing.

Recommendation: Use BTreeSet for deterministic ordering.
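A minimal sketch of that recommendation, using a stand-in `CommitteeId` alias (the real type lives in the Anchor codebase):

```rust
use std::collections::BTreeSet;

// Stand-in for the real CommitteeId type.
pub type CommitteeId = u64;

/// Collecting into a BTreeSet yields committees in sorted (hence
/// deterministic) order, unlike HashSet's arbitrary iteration order.
pub fn merged_committees(
    attesters: &[CommitteeId],
    sync: &[CommitteeId],
) -> Vec<CommitteeId> {
    let set: BTreeSet<CommitteeId> = attesters.iter().chain(sync.iter()).copied().collect();
    set.into_iter().collect()
}
```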

5. Potential Slot Race in Cache Access

Location: anchor/validator_store/src/metadata_service.rs:265-270

let voting_assignments = self
    .validator_store
    .get_voting_assignments(slot)
    .await

Issue: Phase 3 (2/3 slot) reads VotingAssignments while Phase 1 (next slot) may be updating. No explicit validation ensures Phase 3 reads data for the correct slot.

Recommendation: Add slot validation after cache read:

if voting_assignments.slot != slot {
    return Err(format!("Slot mismatch: expected {}, got {}", slot, voting_assignments.slot));
}

6. Missing Committee Index Bounds Check

Location: anchor/validator_store/src/metadata_service.rs:712

index: if fork_name < ForkName::Electra { committee_index } else { 0 },

Issue: committee_index from duties service is used without validating it's within beacon chain limits (0-63 for Mainnet).

Recommendation: Add validation before use.

7. Expensive Clone on Critical Path

Location: anchor/validator_store/src/lib.rs:739, 1029

(*our_consensus_data).clone(),

Issue: AggregatorCommitteeConsensusData contains heap-allocated VariableList and large SSZ-encoded attestations. Cloning happens at 2/3 slot timing (critical path).

Recommendation: Profile clone cost and consider storing Arc<AggregatorCommitteeConsensusData> to avoid clones.


Suggestions (Nice to Have)

8. Inefficient Metrics Logging

Location: anchor/validator_store/src/metadata_service.rs:799-814

for _ in 0..successful {
    metrics::inc_counter_vec(...)
}

Issue: Incrementing counter in a loop is inefficient.

Recommendation: Use bulk increment:

metrics::inc_counter_vec_by(&metrics::AGGREGATOR_COMMITTEE_FETCH_SUCCESS, &["aggregated_attestations", "success"], successful as u64);

9. Magic Numbers Without Constants

Location: anchor/validator_store/src/lib.rs:728, 1018

Duration::from_secs(self.spec.seconds_per_slot) * 2 / 3,

Issue: Fraction 2/3 repeated multiple times without explanation.

Recommendation: Define named constant explaining its significance.
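One way to name the constant, sketched here with illustrative identifiers (the actual constant name and placement in Anchor may differ). The 2/3 fraction marks the point in the slot at which aggregation duties are triggered, mirroring the beacon chain's aggregate broadcast timing:

```rust
use std::time::Duration;

// Illustrative names only; Anchor's actual identifiers may differ.
// Aggregation duties are triggered at the 2/3 point of the slot.
const AGGREGATION_TRIGGER_NUMERATOR: u32 = 2;
const AGGREGATION_TRIGGER_DENOMINATOR: u32 = 3;

pub fn aggregation_trigger_offset(seconds_per_slot: u64) -> Duration {
    Duration::from_secs(seconds_per_slot) * AGGREGATION_TRIGGER_NUMERATOR
        / AGGREGATION_TRIGGER_DENOMINATOR
}
```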

10. Missing Early Return Optimization

Location: anchor/validator_store/src/metadata_service.rs:608

if aggregators.is_empty() && contributors.is_empty() {
    return Ok(None);
}

Issue: Check happens after expensive beacon API fetches. Should check before fetching.

11. Error Messages Lack Context

Location: anchor/validator_store/src/metadata_service.rs:734

Error messages in beacon API calls lack slot, committee_index, and endpoint context for debugging.


Strengths ✨

  1. Excellent test coverage - Comprehensive unit tests for sorting, filtering, validation (consensus.rs:1168-1707, metadata_service.rs:991+)
  2. Good use of Rust idioms - Pure helper functions, custom error types with thiserror, appropriate Arc usage
  3. Well-documented sorting requirements - Clear comments explaining consensus compatibility with SSV Go
  4. Proper async/await patterns - Good use of FuturesUnordered with timeout handling, no blocking calls
  5. Defense-in-depth validation - Comprehensive validation logic mirrors SSV Go spec exactly

Summary

Overall Assessment: The code is high-quality with strong Rust practices. Critical issues primarily concern error context and defensive programming rather than fundamental safety bugs.

Priority Actions:

  1. ✅ Add context to error types (Issue 1)
  2. ✅ Add defensive assertions for data integrity (Issue 2)
  3. ✅ Validate/configure beacon API timeout (Issue 3)
  4. ✅ Add slot validation in cache reads (Issue 5)
  5. 🔍 Profile clone performance (Issue 7)

Impact: Addressing critical issues will significantly improve production debugging and operational safety. The implementation follows Anchor's CLAUDE.md guidelines and maintains high code quality standards.


@shane-moore
Member Author

@claude, I have reviewed your thoughts 1-11 and have found that none are actionable at this time.

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 3, 2026

Claude finished @shane-moore's task in 19s


PR Review Acknowledgment

Thank you for reviewing my feedback. I understand that none of the 11 items from my previous review are actionable at this time.

Summary

Items Reviewed: 11 total

  • 3 Critical Issues
  • 4 Warnings
  • 4 Suggestions

Outcome: No changes will be implemented based on this review feedback.

Notes

The items I identified remain documented in the previous comment for future reference if circumstances change. The PR has been approved by other reviewers (dknopik, petarjuki7) and appears ready to merge from the team's perspective.


@shane-moore shane-moore force-pushed the create-aggregator-committee-consensus-data branch from f9d7148 to e47fd0b on February 3, 2026 at 23:58
@dknopik
Member

dknopik commented Feb 4, 2026

@shane-moore Claude's point 8 seems valid and easy

@shane-moore
Member Author

@shane-moore Claude's point 8 seems valid and easy

@dknopik, agreed! killed a couple loops actually which is nice 81062ba

Member

@dknopik dknopik left a comment

LGTM, thanks!

@mergify mergify bot merged commit 589e6bc into sigp:unstable Feb 4, 2026
19 of 20 checks passed
executor.clone(),
);

let metadata_service = MetadataService::new(
Member

I believe we could use a more descriptive name for MetadataService

Member

@diegomrsantos diegomrsantos Feb 5, 2026

MetadataService has grown into a “god service” (assignment caching + beacon fetches + consensus data construction + committee grouping). That’s too many responsibilities for one module. A cleaner split would keep this service focused on publishing duty inputs (VotingAssignments, VotingContext, AggregationAssignments) and move consensus-data fetching/building into a dedicated AggregatorConsensusDataBuilder (or similar) module.

Given that scope, MetadataService is misleading. After the split, I’d rename it to DutyInputPublisher to reflect its actual role.

/// being split across multiple messages.
///
/// Returns: count(aggregators) + count(contributors) filtered by committee membership
pub fn post_consensus_signature_count<F>(&self, is_in_committee: F) -> usize
Member

Is that the right place for this function?

Member

It might be better as validator-store logic (or a helper) rather than in the shared ssv_types crate.


/// Boole+ path for `produce_signed_aggregate_and_proof`: committee-based consensus using
/// `AggregatorCommitteeConsensusData`.
#[allow(clippy::too_many_arguments)]
Member

Ideally, we should avoid using this

Member Author

dope suggestions! assigned myself a ticket to look into all this so that your suggestions don't fall through the cracks #807

let completed = self
    .qbft_manager
    .decide_instance(
        AggregatorCommitteeInstanceId {
Member

Question: Is the shared QBFT instance between attestation aggregation and sync contribution intentional?

Both produce_signed_aggregate_and_proof_boole (line 734) and produce_signed_contribution_and_proof_boole (here) call decide_instance with the same AggregatorCommitteeInstanceId { committee, instance_height: slot }.

Since QbftManager::get_or_spawn_instance deduplicates by instance ID, only the first caller's Config/validator/start_data actually create the QBFT instance — the second caller silently joins the existing instance via the QbftInstance::Initialized branch and receives the same decided result.

This means:

  1. Whichever of the two methods runs first (attestation agg or sync contribution) creates the instance; the other joins.
  2. The second caller's timeout_mode, AggregatorCommitteeDataValidator, and our_consensus_data are silently discarded (only an on_completed callback is registered).
  3. If these two paths ever need different configurations (e.g., different timeout behavior), it would break subtly.

Could you confirm this is the intended design? And if so, would it be worth adding a comment at both call sites (and updating decide_instance's doc which says "brand new qbft instance") to make this shared-instance pattern explicit?

Member Author

hey, yep, all that's correct. works pretty much like how attestation and sync vote duties are rolled up into a single CommitteeInstanceId. In this case, it's aggregated attestations and sync committee contributions that will share a single AggregatorCommitteeInstanceId. and agree that adding the documentation is a great idea. will knock that out
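The shared-instance behavior described in this thread can be sketched with a minimal dedup map. Everything below is a hypothetical simplification of `QbftManager::get_or_spawn_instance`: the first caller for an instance ID creates the instance (its config wins); later callers join it, and their config is silently ignored.

```rust
use std::collections::HashMap;

// Hypothetical simplified instance ID mirroring the fields named above.
#[derive(Clone, Hash, PartialEq, Eq)]
pub struct AggregatorCommitteeInstanceId {
    pub committee: u64,
    pub instance_height: u64,
}

#[derive(Default)]
pub struct QbftManagerSketch {
    // Config is a plain String here; the real manager stores a full instance.
    instances: HashMap<AggregatorCommitteeInstanceId, String>,
}

impl QbftManagerSketch {
    /// Returns the config of the instance that actually runs for this ID:
    /// the first caller's config, regardless of what later callers pass.
    pub fn get_or_spawn(&mut self, id: AggregatorCommitteeInstanceId, config: &str) -> String {
        self.instances
            .entry(id)
            .or_insert_with(|| config.to_string())
            .clone()
    }
}
```

This is why the second caller's `timeout_mode`, validator, and `our_consensus_data` are discarded: the `entry(...).or_insert_with(...)` pattern only evaluates the new config when the key is absent.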


Labels

ready-for-merge v2.0.0 The release shipping the next network upgrade
