Skip to content

fix(consensus): three minimal audit fixes (#67, #41, #43)#732

Open
keanji-x wants to merge 2 commits into
mainfrom
kj/fix-audit-67-41-43
Open

fix(consensus): three minimal audit fixes (#67, #41, #43)#732
keanji-x wants to merge 2 commits into
mainfrom
kj/fix-audit-67-41-43

Conversation

@keanji-x
Copy link
Copy Markdown
Contributor

Summary

Combined PR — three minimal surgical fixes for gravity-audit findings in the vendored aptos-core consensus crate. Supersedes #726 and #727.

Commit 1 — gravity-audit#67 (Critical)

IncrementalProofState::take() panicked via unreachable!() when BLS signature aggregation failed (legitimately reachable after a validator-set rotation), crashing the ProofCoordinator task and halting consensus. Changed return type to Result<ProofOfStore, SignedBatchInfoError>, added SignedBatchInfoError::UnableToAggregate, propagated ? at the one caller in add_signature.

Commit 2 — gravity-audit#41 (Critical) + #43 (High)

  • fix: make chain id u64 in aptos #41: PendingOrderVotes had no per-author tracking, so a Byzantine validator could vote for conflicting LedgerInfo digests and have its voting power counted in each bucket. Added author_to_li_digest: HashMap<Author, HashValue> mirroring PendingVotes::insert_vote, equivocation detection with SecurityEvent::ConsensusEquivocatingVote log, GC of the new map alongside existing entries.
  • [enhancement]: make consensus crate pub and remove some unused import #43: process_order_vote_msg was the only message handler not calling ensure_round_and_sync_up. Since OrderVoteMsg carries no SyncInfo, added a snapshot of highest_ordered_round before new_qc_from_order_vote_msg runs, and a MAX_ORDER_VOTE_ROUND_JUMP = 3 guard rejecting order votes that jump past that window via the embedded QC.

Why minimal (not full upstream backports)

These PRs port only the security-bearing slices.

Notable deviations the reviewer should check

  • MAX_ORDER_VOTE_ROUND_JUMP = 3 — chosen to avoid breaking pipelined ordering; reviewer should confirm the window is safe vs. legitimate sync flows.
  • Round-jump log uses SecurityEvent::ConsensusInvalidMessage (the variant epoch_manager.rs already uses); InvalidConsensusVote does not exist in the codebase.
  • author_to_li_digest insert is inside the NotEnoughVotes arm after the unknown-author filter — unknown authors are already rejected at UnverifiedEvent::verify upstream, so this keeps the equivocation map clean.
  • No DuplicateVote variant added — existing add_signature path is already idempotent and the existing order_vote_aggregation test asserts VoteAdded for the duplicate case.

Test plan

  • cargo check --tests passes.
  • cargo clippy --all-targets --all-features -- -D warnings passes (1 pre-existing warning in aptos-executor-types, unrelated).
  • pending_order_votes::tests::order_vote_aggregation passes — direct coverage of the modified insert_order_vote.
  • Reviewer: round_manager_test::* and proof_coordinator_test::test_proof_coordinator_basic cannot run locally — they hit pre-existing todo!() stubs in mock_storage.rs:119 and proof_coordinator_test.rs:65 on origin/main. Out of scope for this PR.
  • Reviewer: confirm MAX_ORDER_VOTE_ROUND_JUMP = 3 is safe.

Diff

4 files changed, 112 insertions(+), 12 deletions(-)

🤖 Generated with Claude Code

keanji-x and others added 2 commits May 25, 2026 17:30
…d of panicking

Closes #67 (gravity-audit).

When BLS signature aggregation fails (e.g. after a validator-set rotation
that invalidates some collected per-signer keys), the old code hit
`unreachable!()` and crashed the `ProofCoordinator` tokio task, stalling
consensus on the affected node. This is a minimal surgical fix that
mirrors the relevant slice of upstream aptos-core PR #14644:

- Change `IncrementalProofState::take` to return
  `Result<ProofOfStore, SignedBatchInfoError>`.
- On aggregation failure, log via `error!` and return
  `SignedBatchInfoError::UnableToAggregate` (new variant) instead of
  panicking. Leave `completed = false` so a future attempt can retry.
- Propagate `?` at the single caller in `aggregate_and_verify`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ump guard in process_order_vote_msg

Closes #41 and #43 (gravity-audit).

#41: PendingOrderVotes had no per-author tracking, so a Byzantine
validator could submit order votes for multiple conflicting LedgerInfo
digests in the same round and have its voting power counted in each
bucket. Mirror the dedup pattern from PendingVotes:

- Add `author_to_li_digest: HashMap<Author, HashValue>` to PendingOrderVotes.
- In `insert_order_vote`, detect equivocation (same author, different digest)
  and return EquivocateVote with a SecurityEvent::ConsensusEquivocatingVote log.
- Garbage-collect author entries alongside their digest entries.

#43: process_order_vote_msg was the only message handler that didn't
call ensure_round_and_sync_up, so a Byzantine validator could ship an
OrderVoteMsg with a valid QC many rounds ahead of our local state,
fast-forward our round via the embedded QC's side-effect, then have the
attached order vote accepted for a round we never sequentially
processed. Since OrderVoteMsg carries no SyncInfo, the minimal
equivalent guard is to snapshot pre-state highest_ordered_round before
the QC insertion and reject the order vote if it would jump that
window by more than MAX_ORDER_VOTE_ROUND_JUMP.

This is a slice of upstream aptos-core PR #14637 ("Sync up QC in order
vote message"); we deliberately do NOT take the verified_quorum_cert
type refactor, which conflicts heavily with our tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant