consensus: cascading clock-skew enables liveness degradation and chain-time drift under current block-timestamp validation #5047

@GheisMohammadi

TL;DR

After the TimestampValidationEpoch hardfork, validators check
parent.Time < header.Time <= now + 15s. This bounds individual blocks but does
not bound how much chain time can be ratcheted forward of real wall-clock
time by a series of clock-skewed leaders. The result is a class of liveness and
view-change problems that are reachable today on any network where the fork is
active, and the situation is made easier by the removal of the NTP correction
in #5042.

Important

This is a consensus-layer issue. A fix will require a new hardfork epoch
(sibling to TimestampValidationEpoch) to roll out safely.


Background — current timestamp logic

Header verification

internal/chain/engine.go:

if chain.Config().IsTimestampValidation(header.Epoch()) {
    // Strict monotonic
    if header.Time().Cmp(parentHeader.Time()) <= 0 {
        return errors.New("timestamp older than parent")
    }
    // Wall-clock future ceiling (15s)
    limit := big.NewInt(time.Now().Add(allowedFutureBlockTime).Unix())
    if header.Time().Cmp(limit) > 0 {
        return engine.ErrFutureBlock
    }
}

with allowedFutureBlockTime = 15 * time.Second.

Leader timestamp selection

consensus/proposer.go (after #5042 removed runtime NTP correction):

now        := time.Now()
timestamp  := now.Unix()
parentTime := currentHeader.Time().Int64()

if timestamp <= parentTime {
    time.Sleep(time.Until(time.Unix(parentTime+1, 0)))
    continue
}
// otherwise: ProposeNewBlock(now, ...) -> header.Time = now.Unix()

The leader writes header.Time = time.Now().Unix() verbatim.

View-change leader election uses block timestamp

consensus/view_change.go::getNextViewID:

blockTimestamp := curHeader.Time().Int64()
curTimestamp   := time.Now().Unix()

if curTimestamp <= blockTimestamp {
    // silent fallback to a non-deterministic algorithm
    return pm.fallbackNextViewID()
}
diff       := uint64((curTimestamp-blockTimestamp)/viewChangeSlot + 1)
nextViewID := diff + stuckBlockViewID

with viewChangeSlot = 45s, viewChangeDuration = 27s.


The problem — cascading clock skew

The header.Time <= now + 15s ceiling applies independently to every block.
No rule bounds how far a single block may advance chain time relative to its
parent.

Failure scenario A — liveness loss for slow-clock validators

Let real network time be T, allowedFutureBlockTime = 15s.

  1. Leader A has clock +15s. A proposes block N with header.Time = T+15.
  2. Validator B with +0s skew checks T+15 <= B.now + 15 = T+15 → ✅ accepts.
  3. Validator C with -1s skew checks T+15 <= C.now + 15 = T+14 → ❌ ErrFutureBlock.

C does not vote prepare. If enough validators have even slightly negative skew,
the prepare quorum is short → 27 s timeoutConsensus fires → view change.
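The acceptance split in Scenario A can be sketched as a pure function over integer seconds. This is a minimal model, not the engine code: verifyTimestamp and the two error variables are hypothetical names, and the only rule modelled is parent.Time < header.Time <= now + 15s.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedFutureBlockTime mirrors the 15s ceiling quoted above, in seconds.
const allowedFutureBlockTime int64 = 15

// Hypothetical error values standing in for the engine's errors.
var (
	errOlderThanParent = errors.New("timestamp older than parent")
	errFutureBlock     = errors.New("block in the future")
)

// verifyTimestamp is a hypothetical stand-in for the header check:
// parentTime < headerTime <= localNow + allowedFutureBlockTime.
func verifyTimestamp(parentTime, headerTime, localNow int64) error {
	if headerTime <= parentTime {
		return errOlderThanParent
	}
	if headerTime > localNow+allowedFutureBlockTime {
		return errFutureBlock
	}
	return nil
}

func main() {
	const T int64 = 1_700_000_000 // arbitrary "real" network time
	parentTime := T - 2
	headerTime := T + 15 // leader A runs 15s fast

	validators := []struct {
		name string
		skew int64
	}{{"B", 0}, {"C", -1}}

	for _, v := range validators {
		err := verifyTimestamp(parentTime, headerTime, T+v.skew)
		fmt.Printf("validator %s (skew %+ds): %v\n", v.name, v.skew, err)
	}
}
```

Validator B sits exactly on the boundary and accepts; validator C is one second behind and rejects, which is the quorum split described above.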

Failure scenario B — ratcheted chain time, persistent honest stall

Continuing from A, suppose A's block did commit:

  1. parent.Time = T+15.
  2. Next leader E (honest, synced clock). E's now = T+2 (one BlockPeriod after commit).
  3. proposer.go: timestamp (T+2) <= parentTime (T+15). E sleeps until
    time.Unix(T+16, 0): 14 seconds of forced wait.
  4. E proposes header.Time = T+16.
  5. Validator C (-1s clock) checks T+16 <= C.now+15 = T+15 → ❌ still rejects.

The +15s skew has been transferred from leader A into the chain itself.
Validator C cannot participate until its own wall clock catches up to T+15.
Every subsequent leader pays a per-block sleep penalty roughly equal to the
accumulated skew.
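The forced wait in step 3 follows directly from the proposer branch quoted earlier. A minimal sketch over integer seconds, where proposerWait is a hypothetical helper and not a function in consensus/proposer.go:

```go
package main

import "fmt"

// proposerWait models the leader branch quoted from consensus/proposer.go.
// If the local clock has not passed the parent timestamp, the leader sleeps
// until parentTime+1 and proposes at that second; otherwise it proposes at
// its local time. It returns the seconds slept and the proposed timestamp.
func proposerWait(localNow, parentTime int64) (wait, headerTime int64) {
	if localNow <= parentTime {
		return parentTime + 1 - localNow, parentTime + 1
	}
	return 0, localNow
}

func main() {
	const T int64 = 1_700_000_000 // arbitrary "real" network time
	parentTime := T + 15          // block committed by the fast-clock leader
	wait, headerTime := proposerWait(T+2, parentTime)
	fmt.Printf("honest leader sleeps %ds, proposes at T+%d\n", wait, headerTime-T)
}
```

With the scenario's numbers the wait is (T+16) - (T+2) seconds, paid by a leader whose own clock is correct.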

Failure scenario C — view-change determinism breaks

getNextViewID is the deterministic mechanism that lets validators agree on
the next leader without coordination. When curTimestamp <= blockTimestamp,
it silently falls back to fallbackNextViewID(), which computes the next
view ID from viewChangingID (non-deterministic across validators).

Under sustained cascading skew, validators with slow clocks systematically
take the fallback path while validators with fast clocks take the timestamp
path. They can pick different next leaders for the same view change,
requiring extra view-change rounds to converge.
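The path split can be illustrated with a reduced model of the branch quoted from getNextViewID. This sketch only reports which path a validator takes; the real fallbackNextViewID is not modelled, and nextViewID here is a hypothetical name.

```go
package main

import "fmt"

// viewChangeSlot mirrors the 45s slot quoted above, in seconds.
const viewChangeSlot int64 = 45

// nextViewID reproduces the branch structure quoted from getNextViewID.
// The boolean reports whether the deterministic timestamp path was taken.
// On the fallback path the returned ID is 0 because the fallback algorithm
// is intentionally not modelled in this sketch.
func nextViewID(localNow, blockTimestamp int64, stuckBlockViewID uint64) (uint64, bool) {
	if localNow <= blockTimestamp {
		return 0, false
	}
	diff := uint64((localNow-blockTimestamp)/viewChangeSlot + 1)
	return diff + stuckBlockViewID, true
}

func main() {
	const T int64 = 1_700_000_000
	blockTimestamp := T + 15 // chain time ratcheted ahead of real time

	for _, localNow := range []int64{T + 14, T + 16} {
		id, deterministic := nextViewID(localNow, blockTimestamp, 100)
		fmt.Printf("now=T+%d deterministic=%v viewID=%d\n", localNow-T, deterministic, id)
	}
}
```

Two validators whose clocks differ by only two seconds land on different code paths, which is the divergence described above.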

Failure scenario D — proposer.go indefinite sleep

time.Sleep(time.Until(time.Unix(parentTime+1, 0)))

There is no upper bound on the sleep. If parent.Time is far in the future
(legitimate cascading skew, or a misconfigured leader), the proposer goroutine
sleeps indefinitely. consensus.readySignal is an unbuffered channel:

consensus.readySignal = make(chan Proposal)

Anyone calling ReadySignal (view-change completion, setupForNewConsensus,
finalCommit) blocks during the sleep, which can stall view-change recovery
on top of the original stall.


Impact surface

| Area | Affected | Mechanism |
|---|---|---|
| Consensus liveness | Yes | ErrFutureBlock during BlockVerifier drops slow-clock validators out of the prepare quorum. |
| View changes | Yes | getNextViewID falls back to non-deterministic algorithm; different validators elect different next leaders. |
| Leader rotation | Indirect | NthNext* is computed over diverging view IDs. |
| Epoch transitions | Indirect | Stretched gap between last block of epoch and first block of next epoch delays staking-reward calc, shard-state propagation, crosslink injection. |
| Transaction processing | Yes | block.timestamp EVM opcode visible to contracts; AsyncBlockProposalTimeout = 9s can fire during proposer sleep. |
| Block import / sync | Yes | ErrFutureBlock halts the insert batch (core/blockchain_impl.go::insertChain). Sync peers get stuck on the offending block until their clock catches up. |
| Crosslinks | Indirect | Beacon chain crosslink confirmation delayed by stalled shard chain. |
| Fast finality (1 s blocks) | Acute | With BlockPeriod = 1s and allowedFutureBlockTime = 15s, a single +15s leader locks the chain ahead of real time by 15 block-periods. |

Attack model

| Attacker capability | Today |
|---|---|
| Set own clock +15s while leader | Pushes chain time +15s; validators with normal clocks vote; validators with slow clocks drop out. No cost to attacker. |
| Set own clock -15s while leader | Block has stale timestamp; still accepted (proposer sleeps to make header.Time > parent.Time). |
| Repeat across leader slots | Chain time persistently runs ahead of real time at the +15s ceiling; honest leaders pay a per-attack stall cost. |
| Drive validators with slow clocks out of quorum | Achievable by a single +15s leader if any slow validators exist. |

There is no slashing for any of these behaviors. Cost to attacker = 0;
cost to network = sustained skew + intermittent view changes + smart-contract
block.timestamp drift.


Constraints any fix must respect

  1. Deterministic across validators. Any rule that depends on each
    validator's time.Now() is OK only if it is robust to ±1–2 s wall-clock
    skew between honest validators.
  2. Survives legitimate stalls. Mainnet can stall for several minutes
    (view-change storms, partitions). After the stall, chain time must be able
    to re-align with real time within a small number of blocks — ideally one.
  3. No runtime NTP dependency. #5042 deliberately removed runtime NTP
    correction. A fix should work with raw time.Now().
  4. Hardfork-gated. Header-acceptance rule changes need a new fork epoch
    field (sibling to TimestampValidationEpoch) so rollout is coordinated.
  5. Backward compatible with getNextViewID — or, if not, fix that function
    in the same hardfork.

Possible approaches

Approach 1 — Parent-aware skew "carry"  (not recommended)

limit = max(time.Now()+15s, parent.Time + 15s).

  • ✅ Fixes liveness for slow-clock validators.
  • ❌ Removes the only existing protection against unbounded chain-time drift.
    Each block can compound +15s. Chain time can race arbitrarily far ahead of
    real time.
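The compounding under Approach 1 can be checked with a small simulation. This is a sketch under stated assumptions: every leader picks the largest timestamp the carry rule allows, real time advances by one block period per block, and leadAfter is a hypothetical helper.

```go
package main

import "fmt"

// skewAllowance is the 15s allowance used in both arms of the carry rule.
const skewAllowance int64 = 15

// leadAfter simulates `blocks` consecutive blocks under the rule
// limit = max(now+15s, parent+15s), with each leader choosing the maximum
// allowed timestamp. It returns how far chain time ends up ahead of real
// time, in seconds. Both clocks start at 0.
func leadAfter(blocks int, blockPeriod int64) int64 {
	var now, parent int64
	for i := 0; i < blocks; i++ {
		now += blockPeriod
		limit := now + skewAllowance
		if carry := parent + skewAllowance; carry > limit {
			limit = carry
		}
		parent = limit
	}
	return parent - now
}

func main() {
	for _, n := range []int{1, 3, 10} {
		fmt.Printf("after %2d blocks (2s period): chain leads real time by %ds\n", n, leadAfter(n, 2))
	}
}
```

The lead grows with every block instead of saturating at 15s, which is why the carry rule is not recommended.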

Approach 2 — Bounded per-block forward step

Add a step rule on top of the existing wall-clock ceiling:

header.Time <= parent.Time + maxStep   AND   header.Time <= now + 15s

with maxStep in the 10–30 s range.

  • ✅ Mathematically bounds cascading skew at maxStep per block.
  • ✅ Honest leaders with synced clocks never trip it.
  • ⚠️ Stall recovery is slow: chain time catches up at maxStep per block.
    A 5-minute stall takes 300/maxStep blocks to re-align block.timestamp
    with real time.
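The 300/maxStep estimate can be written as a ceiling division. This sketch deliberately ignores that real time keeps advancing by one BlockPeriod per block during catch-up, so it is a lower bound rather than an exact count; blocksToRealign is a hypothetical helper.

```go
package main

import "fmt"

// blocksToRealign returns ceil(stallSeconds / maxStep): the minimum number
// of blocks needed for chain time to absorb a stall when each block may
// advance chain time by at most maxStep seconds. Real time advancing during
// the catch-up is not modelled.
func blocksToRealign(stallSeconds, maxStep int64) int64 {
	if stallSeconds <= 0 || maxStep <= 0 {
		return 0
	}
	return (stallSeconds + maxStep - 1) / maxStep
}

func main() {
	for _, step := range []int64{10, 12, 30} {
		fmt.Printf("maxStep=%ds: at least %d blocks to absorb a 300s stall\n",
			step, blocksToRealign(300, step))
	}
}
```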

Approach 3 — Bounded step with wall-clock fallback  (recommended)

header.Time <= max(parent.Time + maxStep, now)   AND   header.Time <= now + 15s

  • In normal operation (parent ≈ now − BlockPeriod), parent + maxStep > now,
    so the step arm is binding. Cascading skew is bounded.
  • In stall recovery (parent << now), now > parent + maxStep, so the wall
    arm is binding. The block can jump straight to now — single-block recovery.
  • Leader-side mirror: clamp proposed timestamp to parent + maxStep only when
    wall clock is moderately ahead of parent (clock skew). Above a threshold
    (e.g. viewChangeTimeout + buffer = 30s) treat as stall and produce at wall.
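The combined rule can be expressed as a single upper bound on header.Time. A minimal sketch over integer seconds; maxHeaderTime is a hypothetical helper, and the maxStep value of 12 in main is only one of the candidates discussed in this issue.

```go
package main

import "fmt"

// futureCeiling mirrors the existing 15s wall-clock ceiling, in seconds.
const futureCeiling int64 = 15

// maxHeaderTime returns the largest timestamp a validator would accept
// under the Approach 3 rule:
//
//	header.Time <= max(parent.Time+maxStep, now) AND header.Time <= now+15s
func maxHeaderTime(parentTime, localNow, maxStep int64) int64 {
	limit := parentTime + maxStep // step arm
	if localNow > limit {
		limit = localNow // wall arm, used during stall recovery
	}
	if ceiling := localNow + futureCeiling; limit > ceiling {
		limit = ceiling // existing wall-clock ceiling still applies
	}
	return limit
}

func main() {
	const now, maxStep int64 = 1_700_000_000, 12

	fmt.Println("normal operation: now +", maxHeaderTime(now-2, now, maxStep)-now)
	fmt.Println("after 300s stall: now +", maxHeaderTime(now-300, now, maxStep)-now)
	fmt.Println("skewed parent:    now +", maxHeaderTime(now+15, now, maxStep)-now)
}
```

In the stall case the bound equals the validator's own clock, so a leader can propose at wall time in one block; in the normal case the step arm is the tighter of the two forward limits.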

Approach 4 — View-ID-aware step

Use header.ViewID - parent.ViewID (deterministic from header fields) to
compute an allowed step. Each missed view (view-change round) extends the
allowed step by viewChangeTimeout.

  • ✅ Fully deterministic across validators.
  • ✅ Handles arbitrary-length stalls in a single block.
  • ❌ More invasive change; couples engine validation to viewID semantics.
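One possible reading of the view-ID-aware step, as a sketch. It assumes that two consecutive blocks with no view change in between differ by exactly one view ID, so the number of missed views is the view-ID difference minus one. baseStep, allowedStep, and the 12s value are hypothetical; the 27s figure is the viewChangeTimeout mentioned in this issue.

```go
package main

import "fmt"

const (
	baseStep          int64 = 12 // hypothetical maxStep, in seconds
	viewChangeTimeout int64 = 27 // seconds, as quoted in this issue
)

// allowedStep returns the maximum number of seconds header.Time may exceed
// parent.Time, derived only from header fields. Each missed view extends
// the base step by viewChangeTimeout. A non-increasing view ID is treated
// as malformed and gets no allowance.
func allowedStep(parentViewID, headerViewID uint64) int64 {
	if headerViewID <= parentViewID {
		return 0
	}
	missed := int64(headerViewID - parentViewID - 1)
	return baseStep + missed*viewChangeTimeout
}

func main() {
	fmt.Println("no view change:     ", allowedStep(100, 101), "s")
	fmt.Println("three missed views: ", allowedStep(100, 104), "s")
}
```

Because both inputs come from the headers, every validator computes the same allowance regardless of its local clock.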

Approach 5 — Cap proposer catch-up sleep  (complementary)

Independent of the validation rule: cap the time.Sleep in proposer.go so
the proposer goroutine cannot be hung indefinitely on a far-future parent.
Doesn't fix the underlying skew but removes the secondary blocking of
readySignal.

Suggested combination

Approach 3 (bounded step with wall-clock fallback) + Approach 5 (cap
proposer sleep) under a new hardfork field BoundedTimestampStepEpoch. Also
fix getNextViewID to clamp curTimestamp = max(time.Now().Unix(), parent.Time)
so the deterministic path is always taken.
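The clamp can be sketched on the same reduced model of getNextViewID used in this issue. nextViewIDClamped is a hypothetical name; the arithmetic is the quoted formula with curTimestamp replaced by max(now, blockTimestamp).

```go
package main

import "fmt"

// viewChangeSlot mirrors the 45s slot quoted above, in seconds.
const viewChangeSlot int64 = 45

// nextViewIDClamped applies the proposed clamp: the local time is never
// allowed to fall behind the block timestamp, so the timestamp formula is
// always used and no fallback branch is needed.
func nextViewIDClamped(localNow, blockTimestamp int64, stuckBlockViewID uint64) uint64 {
	cur := localNow
	if blockTimestamp > cur {
		cur = blockTimestamp
	}
	diff := uint64((cur-blockTimestamp)/viewChangeSlot + 1)
	return diff + stuckBlockViewID
}

func main() {
	const T int64 = 1_700_000_000
	blockTimestamp := T + 15

	for _, localNow := range []int64{T + 14, T + 16} {
		fmt.Printf("now=T+%d viewID=%d\n", localNow-T, nextViewIDClamped(localNow, blockTimestamp, 100))
	}
}
```

Validators on either side of the block timestamp now compute the same view ID. Validators whose clocks straddle a slot boundary can still disagree, as they can on the existing timestamp path.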


Open questions

  • What maxStep value? Candidates: 2 * BlockPeriod + 1s (per-epoch), or
    a network-wide constant (e.g. 12 s) larger than the largest BlockPeriod.
  • What stall-detection threshold for the proposer's clamp-vs-no-clamp
    decision? viewChangeTimeout (27s) + 3s = 30s is the natural choice.
  • Re-introduce NTP correction for proposal only (not validation) so
    honest leaders track real time more tightly? Or accept that with bounded
    step NTP is not strictly necessary?
  • Activation strategy — new epoch field, or piggyback on a future
    TimestampValidationEpoch activation on networks where it hasn't fired
    yet? (Currently only Partner/Devnet has activated it.)

References

Relevant files:

  • internal/chain/engine.go::VerifyHeader
  • consensus/proposer.go::WaitForConsensusReadyV2
  • consensus/view_change.go::getNextViewID, fallbackNextViewID
  • consensus/config.go::viewChangeTimeout, viewChangeSlot
  • core/blockchain_impl.go::insertChain (ErrFutureBlock handling)
  • internal/params/config.go::TimestampValidationEpoch

Reproducer (sketch)

A small two-node localnet harness that wraps time.Now() with a configurable
offset should be sufficient to reproduce Scenario A on current main with
TimestampValidationEpoch active (Partner network or localnet). A scripted
skew schedule across a 4-node committee reproduces Scenarios B and C.
Happy to author a reproducer harness as a follow-up PR if useful.


Acceptance criteria for a fix

  • New hardfork field gating the new rule (no behavior change before activation).
  • Cascading skew bounded at a small constant per block (≤ 2 × BlockPeriod).
  • Single-block recovery from arbitrarily long stalls.
  • Proposer goroutine never blocked > N seconds on a far-future parent (configurable cap).
  • getNextViewID stays on the deterministic timestamp path.
  • Unit tests cover: monotonic, wall-clock ceiling, step boundary, stall
    recovery (short/medium/long), cascading-skew aftermath, unknown ancestor.
  • Integration test on a multi-node localnet with injected clock skew.
