fix(consensus): three minimal audit fixes (#67, #41, #43)#732
Open
keanji-x wants to merge 2 commits into
Open
Conversation
…d of panicking Closes #67 (gravity-audit). When BLS signature aggregation fails (e.g. after a validator-set rotation that invalidates some collected per-signer keys), the old code hit `unreachable!()` and crashed the `ProofCoordinator` tokio task, stalling consensus on the affected node. This is a minimal surgical fix that mirrors the relevant slice of upstream aptos-core PR #14644: - Change `IncrementalProofState::take` to return `Result<ProofOfStore, SignedBatchInfoError>`. - On aggregation failure, log via `error!` and return `SignedBatchInfoError::UnableToAggregate` (new variant) instead of panicking. Leave `completed = false` so a future attempt can retry. - Propagate `?` at the single caller in `aggregate_and_verify`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ump guard in process_order_vote_msg Closes #41 and #43 (gravity-audit). #41: PendingOrderVotes had no per-author tracking, so a Byzantine validator could submit order votes for multiple conflicting LedgerInfo digests in the same round and have its voting power counted in each bucket. Mirror the dedup pattern from PendingVotes: - Add `author_to_li_digest: HashMap<Author, HashValue>` to PendingOrderVotes. - In `insert_order_vote`, detect equivocation (same author, different digest) and return EquivocateVote with a SecurityEvent::ConsensusEquivocatingVote log. - Garbage-collect author entries alongside their digest entries. #43: process_order_vote_msg was the only message handler that didn't call ensure_round_and_sync_up, so a Byzantine validator could ship an OrderVoteMsg with a valid QC many rounds ahead of our local state, fast-forward our round via the embedded QC's side-effect, then have the attached order vote accepted for a round we never sequentially processed. Since OrderVoteMsg carries no SyncInfo, the minimal equivalent guard is to snapshot pre-state highest_ordered_round before the QC insertion and reject the order vote if it would jump that window by more than MAX_ORDER_VOTE_ROUND_JUMP. This is a slice of upstream aptos-core PR #14637 ("Sync up QC in order vote message"); we deliberately do NOT take the verified_quorum_cert type refactor, which conflicts heavily with our tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Combined PR — three minimal surgical fixes for gravity-audit findings in the vendored aptos-core consensus crate. Supersedes #726 and #727.
Commit 1 — gravity-audit#67 (Critical)
IncrementalProofState::take()panicked viaunreachable!()when BLS signature aggregation failed (legitimately reachable after a validator-set rotation), crashing theProofCoordinatortask and halting consensus. Changed return type toResult<ProofOfStore, SignedBatchInfoError>, addedSignedBatchInfoError::UnableToAggregate, propagated?at the one caller inadd_signature.Commit 2 — gravity-audit#41 (Critical) + #43 (High)
PendingOrderVoteshad no per-author tracking, so a Byzantine validator could vote for conflictingLedgerInfodigests and have its voting power counted in each bucket. Addedauthor_to_li_digest: HashMap<Author, HashValue>mirroringPendingVotes::insert_vote, equivocation detection withSecurityEvent::ConsensusEquivocatingVotelog, GC of the new map alongside existing entries.process_order_vote_msgwas the only message handler not callingensure_round_and_sync_up. SinceOrderVoteMsgcarries noSyncInfo, added a snapshot ofhighest_ordered_roundbeforenew_qc_from_order_vote_msgruns, and aMAX_ORDER_VOTE_ROUND_JUMP = 3guard rejecting order votes that jump past that window via the embedded QC.Why minimal (not full upstream backports)
li_digest_to_votesinto a(QuorumCert, OrderVoteStatus)map and plumbsverified_quorum_certthrough the call chain (~164 lines of conflicts).These PRs port only the security-bearing slices.
Notable deviations the reviewer should check
MAX_ORDER_VOTE_ROUND_JUMP = 3— chosen to avoid breaking pipelined ordering; reviewer should confirm the window is safe vs. legitimate sync flows.SecurityEvent::ConsensusInvalidMessage(the variantepoch_manager.rsalready uses);InvalidConsensusVotedoes not exist in the codebase.author_to_li_digestinsert is inside theNotEnoughVotesarm after the unknown-author filter — unknown authors are already rejected atUnverifiedEvent::verifyupstream, so this keeps the equivocation map clean.DuplicateVotevariant added — existingadd_signaturepath is already idempotent and the existingorder_vote_aggregationtest assertsVoteAddedfor the duplicate case.Test plan
cargo check --testspasses.cargo clippy --all-targets --all-features -- -D warningspasses (1 pre-existing warning inaptos-executor-types, unrelated).pending_order_votes::tests::order_vote_aggregationpasses — direct coverage of the modifiedinsert_order_vote.round_manager_test::*andproof_coordinator_test::test_proof_coordinator_basiccannot run locally — they hit pre-existingtodo!()stubs inmock_storage.rs:119andproof_coordinator_test.rs:65onorigin/main. Out of scope for this PR.MAX_ORDER_VOTE_ROUND_JUMP = 3is safe.Diff
4 files changed, 112 insertions(+), 12 deletions(-)
🤖 Generated with Claude Code