consensus: cascading clock-skew enables liveness degradation and chain-time drift under current block-timestamp validation

## TL;DR

After the `TimestampValidationEpoch` hardfork, validators check
`parent.Time < header.Time <= now + 15s`. This bounds individual blocks but does
**not** bound how much chain time can be ratcheted forward of real wall-clock
time by a series of clock-skewed leaders. The result is a class of liveness and
view-change problems that are reachable today on any network where the fork is
active, and that became easier to trigger once #5042 removed the runtime NTP
correction.

> [!IMPORTANT]
> This is a consensus-layer issue. A fix will require a new hardfork epoch
> (sibling to `TimestampValidationEpoch`) to roll out safely.

---

## Background — current timestamp logic

### Header verification

`internal/chain/engine.go`:

```go
if chain.Config().IsTimestampValidation(header.Epoch()) {
	// Strict monotonic
	if header.Time().Cmp(parentHeader.Time()) <= 0 {
		return errors.New("timestamp older than parent")
	}
	// Wall-clock future ceiling (15s)
	limit := big.NewInt(time.Now().Add(allowedFutureBlockTime).Unix())
	if header.Time().Cmp(limit) > 0 {
		return engine.ErrFutureBlock
	}
}
```

with `allowedFutureBlockTime = 15 * time.Second`.

### Leader timestamp selection

`consensus/proposer.go` (after #5042 removed runtime NTP correction):

```go
now := time.Now()
timestamp := now.Unix()
parentTime := currentHeader.Time().Int64()

if timestamp <= parentTime {
	time.Sleep(time.Until(time.Unix(parentTime+1, 0)))
	continue
}
// otherwise: ProposeNewBlock(now, ...) -> header.Time = now.Unix()
```

The leader writes `header.Time = time.Now().Unix()` verbatim.

### View-change leader election uses block timestamp

`consensus/view_change.go::getNextViewID`:

```go
blockTimestamp := curHeader.Time().Int64()
curTimestamp := time.Now().Unix()

if curTimestamp <= blockTimestamp {
	// silent fallback to a non-deterministic algorithm
	return pm.fallbackNextViewID()
}
diff := uint64((curTimestamp-blockTimestamp)/viewChangeSlot + 1)
nextViewID := diff + stuckBlockViewID
```

with `viewChangeSlot = 45s`, `viewChangeDuration = 27s`.

---

## The problem — cascading clock skew

The `header.Time <= now + 15s` ceiling is reusable on **every** block. There is
no rule that bounds how much each block can advance chain time relative to its
parent.

### Failure scenario A — liveness loss for slow-clock validators

Let real network time be `T`, `allowedFutureBlockTime = 15s`.

1. Leader A has clock `+15s`. A proposes block N with `header.Time = T+15`.
2. Validator B with `+0s` skew checks `T+15 <= B.now + 15 = T+15` → ✅ accepts.
3. Validator C with `-1s` skew checks `T+15 <= C.now + 15 = T+14` → ❌ `ErrFutureBlock`.

C does not vote `prepare`. If enough validators have even slightly negative skew,
the `prepare` quorum falls short → 27 s `timeoutConsensus` fires → view change.
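
The check in steps 1–3 can be replayed offline. The sketch below is a hypothetical pure-function version of the ceiling rule (`acceptsHeader` is not a function in the codebase); it takes the validator's local clock as an argument instead of calling `time.Now()`.

```go
package main

import "fmt"

// allowedFutureBlockTime mirrors the 15 s ceiling quoted above, in seconds.
const allowedFutureBlockTime int64 = 15

// acceptsHeader is a hypothetical stand-in for the timestamp part of header
// verification. All values are Unix seconds; localNow is the validator's own
// (possibly skewed) wall clock.
func acceptsHeader(parentTime, headerTime, localNow int64) bool {
	if headerTime <= parentTime {
		return false // not strictly after the parent
	}
	return headerTime <= localNow+allowedFutureBlockTime
}

func main() {
	const T int64 = 1_700_000_000 // arbitrary "real" network time
	parent := T - 2
	header := T + 15 // leader A's clock runs +15 s
	for _, skew := range []int64{0, -1} {
		fmt.Printf("skew %+ds accepts: %v\n", skew, acceptsHeader(parent, header, T+skew))
	}
}
```

The zero-skew validator sits exactly on the boundary, so any negative skew at all is enough to flip the result.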

### Failure scenario B — ratcheted chain time, persistent honest stall

Continuing from A, suppose A's block did commit:

4. `parent.Time = T+15`.
5. Next leader E (honest, synced clock). E's `now = T+2` (one `BlockPeriod` after commit).
6. `proposer.go`: `timestamp (T+2) <= parentTime (T+15)`. E sleeps until
 `time.Unix(T+16, 0)`: **14 seconds of forced wait**.
7. E proposes `header.Time = T+16`.
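8. Validator C (`-1s` clock) reads `C.now = T+15` when E's proposal arrives at
 real time `T+16`, so `T+16 <= C.now+15 = T+30` passes. C rejoins, but only
 because the chain sat idle until wall clocks caught up with chain time.

The `+15s` skew has been **transferred from leader A into the chain itself**.
Validator C cannot verify block N until its own wall clock reaches `T`
(`C.now + 15 >= T+15`), and every honest leader that follows a fast-clock block
pays a sleep penalty roughly equal to the skew its parent carries.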

### Failure scenario C — view-change determinism breaks

`getNextViewID` is the deterministic mechanism that lets validators agree on
the next leader without coordination. When `curTimestamp <= blockTimestamp`,
it silently falls back to `fallbackNextViewID()`, which computes the next
view ID from `viewChangingID` (non-deterministic across validators).

Under sustained cascading skew, validators with slow clocks systematically
take the fallback path while validators with fast clocks take the timestamp
path. They can pick **different next leaders** for the same view change,
requiring extra view-change rounds to converge.
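
To make the split concrete, here is a sketch that mirrors the branch structure quoted from `getNextViewID`. `nextViewID` is a hypothetical stand-in; it reports whether the timestamp path was taken rather than calling the real fallback.

```go
package main

import "fmt"

const viewChangeSlot int64 = 45 // seconds, as quoted above

// nextViewID mirrors the quoted branches. ok is false when the caller would
// take the fallback path instead of the deterministic timestamp path.
func nextViewID(localNow, blockTimestamp int64, stuckBlockViewID uint64) (id uint64, ok bool) {
	if localNow <= blockTimestamp {
		return 0, false
	}
	diff := uint64((localNow-blockTimestamp)/viewChangeSlot + 1)
	return diff + stuckBlockViewID, true
}

func main() {
	const T int64 = 1_700_000_000
	blockTime := T + 15 // chain time pushed ahead by a fast leader
	validators := []struct {
		name string
		skew int64
	}{
		{"fast (+16s)", 16},
		{"slow (-1s)", -1},
	}
	for _, v := range validators {
		id, ok := nextViewID(T+v.skew, blockTime, 100)
		fmt.Printf("%s: viewID=%d timestampPath=%v\n", v.name, id, ok)
	}
}
```

At the same instant the two validators end up on different code paths, which is the precondition for electing different leaders.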

### Failure scenario D — `proposer.go` indefinite sleep

```go
time.Sleep(time.Until(time.Unix(parentTime+1, 0)))
```

There is no upper bound on the sleep. If `parent.Time` is far in the future
(legitimate cascading skew, or a misconfigured leader), the proposer goroutine
sleeps for the entire gap. `consensus.readySignal` is an unbuffered channel:

```go
consensus.readySignal = make(chan Proposal)
```

Anyone calling `ReadySignal` (view-change completion, `setupForNewConsensus`,
`finalCommit`) blocks during the sleep, which can stall view-change recovery
on top of the original stall.

---

## Impact surface

| Area | Affected | Mechanism |
|---|---|---|
| Consensus liveness | Yes | `ErrFutureBlock` during `BlockVerifier` drops slow-clock validators out of the `prepare` quorum. |
| View changes | Yes | `getNextViewID` falls back to non-deterministic algorithm; different validators elect different next leaders. |
| Leader rotation | Indirect | `NthNext*` is computed over diverging view IDs. |
| Epoch transitions | Indirect | Stretched gap between last block of epoch and first block of next epoch delays staking-reward calc, shard-state propagation, crosslink injection. |
| Transaction processing | Yes | `block.timestamp` EVM opcode visible to contracts; `AsyncBlockProposalTimeout = 9s` can fire during proposer sleep. |
| Block import / sync | Yes | `ErrFutureBlock` halts the insert batch (`core/blockchain_impl.go::insertChain`). Sync peers get stuck on the offending block until their clock catches up. |
| Crosslinks | Indirect | Beacon chain crosslink confirmation delayed by stalled shard chain. |
| Fast finality (1 s blocks) | Acute | With `BlockPeriod = 1s` and `allowedFutureBlockTime = 15s`, a single `+15s` leader locks the chain ahead of real time by 15 block-periods. |

---

## Attack model

| Attacker capability | Today |
|---|---|
| Set own clock `+15s` while leader | Pushes chain time `+15s`; validators with normal clock vote; validators with `-ε` clock drop out. No cost to attacker. |
| Set own clock `-15s` while leader | Block has stale timestamp; still accepted (proposer sleeps to make `header.Time > parent.Time`). |
| Repeat across leader slots | Chain time persistently runs ahead of real time at the `+15s` ceiling; honest leaders pay a per-attack stall cost. |
| Drive validators with `-ε` clocks out of quorum | Achievable by a single `+15s` leader if any slow validators exist. |

There is no slashing for any of these behaviors. Cost to attacker = 0;
cost to network = sustained skew + intermittent view changes + smart-contract
`block.timestamp` drift.

---

## Constraints any fix must respect

1. **Deterministic across validators.** Any rule that depends on each
 validator's `time.Now()` is OK only if it is robust to ±1–2 s wall-clock
 skew between honest validators.
2. **Survives legitimate stalls.** Mainnet can stall for several minutes
 (view-change storms, partitions). After the stall, chain time must be able
 to re-align with real time within a small number of blocks — ideally one.
3. **No runtime NTP dependency.** #5042 deliberately removed runtime NTP
 correction. A fix should work with raw `time.Now()`.
4. **Hardfork-gated.** Header-acceptance rule changes need a new fork epoch
 field (sibling to `TimestampValidationEpoch`) so rollout is coordinated.
5. **Backward compatible with `getNextViewID`** — or, if not, fix that function
 in the same hardfork.

---

## Possible approaches

<details>
<summary>Approach 1 — Parent-aware skew "carry" &nbsp;(not recommended)</summary>

`limit = max(time.Now()+15s, parent.Time + 15s)`.

- ✅ Fixes liveness for slow-clock validators.
- ❌ Removes the only existing protection against unbounded chain-time drift.
 Each block can compound `+15s`. Chain time can race arbitrarily far ahead of
 real time.

</details>

<details>
<summary>Approach 2 — Bounded per-block forward step</summary>

Add a step rule on top of the existing wall-clock ceiling:

```
header.Time <= parent.Time + maxStep AND header.Time <= now + 15s
```

with `maxStep` in the 10–30 s range.

- ✅ Mathematically bounds cascading skew at `maxStep` per block.
- ✅ Honest leaders with synced clocks never trip it.
- ⚠️ Stall recovery is slow: chain time catches up at `maxStep` per block.
 A 5-minute stall takes `300/maxStep` blocks to re-align `block.timestamp`
 with real time.
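
A minimal sketch of this rule and of the recovery cost, assuming times in Unix seconds. `acceptsBoundedStep` and `blocksToRealign` are hypothetical helpers, and the block count ignores the real time that elapses while those blocks are produced.

```go
package main

import "fmt"

const allowedFutureBlockTime int64 = 15 // seconds, as in the current rule

// acceptsBoundedStep applies the Approach 2 rule on top of strict
// monotonicity. maxStep is the hypothetical per-block bound in seconds.
func acceptsBoundedStep(parentTime, headerTime, localNow, maxStep int64) bool {
	return headerTime > parentTime &&
		headerTime <= parentTime+maxStep &&
		headerTime <= localNow+allowedFutureBlockTime
}

// blocksToRealign is the number of blocks chain time needs to cover a gap of
// gapSeconds when each block may advance at most maxStep (ceiling division).
func blocksToRealign(gapSeconds, maxStep int64) int64 {
	return (gapSeconds + maxStep - 1) / maxStep
}

func main() {
	const T int64 = 1_700_000_000
	fmt.Println(acceptsBoundedStep(T, T+15, T+2, 12)) // +15 s jump exceeds maxStep
	fmt.Println(blocksToRealign(300, 10), blocksToRealign(300, 30))
}
```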

</details>

<details open>
<summary>Approach 3 — Bounded step with wall-clock fallback &nbsp;(recommended)</summary>

```
header.Time <= max(parent.Time + maxStep, now) AND header.Time <= now + 15s
```

- In normal operation (`parent ≈ now − BlockPeriod`), `parent + maxStep > now`,
 so the step arm is binding. Cascading skew is bounded.
- In stall recovery (`parent << now`), `now > parent + maxStep`, so the wall
 arm is binding. The block can jump straight to `now` — single-block recovery.
- Leader-side mirror: clamp proposed timestamp to `parent + maxStep` only when
 wall clock is moderately ahead of parent (clock skew). Above a threshold
 (e.g. `viewChangeTimeout + buffer = 30s`) treat as stall and produce at wall.
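
Both halves can be sketched as pure functions over Unix seconds. The names `acceptsStepOrWall` and `proposeTimestamp` and the parameters `maxStep` and `stallThreshold` are hypothetical; the leader-side function assumes the proposer has already waited until its clock is past the parent.

```go
package main

import "fmt"

const allowedFutureBlockTime int64 = 15 // seconds, as in the current rule

// acceptsStepOrWall encodes the Approach 3 rule for one validator.
func acceptsStepOrWall(parentTime, headerTime, localNow, maxStep int64) bool {
	limit := parentTime + maxStep // step arm
	if localNow > limit {
		limit = localNow // wall arm takes over after a stall
	}
	return headerTime > parentTime &&
		headerTime <= limit &&
		headerTime <= localNow+allowedFutureBlockTime
}

// proposeTimestamp is the leader-side mirror. It assumes localNow > parentTime.
func proposeTimestamp(parentTime, localNow, maxStep, stallThreshold int64) int64 {
	gap := localNow - parentTime
	if gap > maxStep && gap <= stallThreshold {
		return parentTime + maxStep // moderate gap: treat as skew and clamp
	}
	return localNow // normal case or long stall: produce at wall clock
}

func main() {
	const T int64 = 1_700_000_000
	fmt.Println(acceptsStepOrWall(T, T+15, T+2, 12))   // skewed leader, normal operation
	fmt.Println(acceptsStepOrWall(T-300, T, T, 12))    // recovery after a 5-minute stall
	fmt.Println(proposeTimestamp(T, T+20, 12, 30) - T) // clamped step in seconds
}
```

One consequence worth checking in review: on the wall arm the limit is the validator's own `now` with no tolerance, so a leader that recovers from a stall with a slightly fast clock depends on validators' clocks having caught up by the time they verify.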

</details>

<details>
<summary>Approach 4 — View-ID-aware step</summary>

Use `header.ViewID - parent.ViewID` (deterministic from header fields) to
compute an allowed step. Each missed view (view-change round) extends the
allowed step by `viewChangeTimeout`.

- ✅ Fully deterministic across validators.
- ✅ Handles arbitrary-length stalls in a single block.
- ❌ More invasive change; couples engine validation to viewID semantics.
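
One possible shape for the step computation, as a sketch only. `allowedStep` and `baseStep` are hypothetical, and treating a view ID gap of 1 as "no missed views" is an assumption that would need checking against how view IDs actually advance between consecutive blocks.

```go
package main

import "fmt"

const viewChangeTimeout int64 = 27 // seconds, the value quoted in this issue

// allowedStep returns the maximum header.Time - parent.Time in seconds.
// Each view skipped between parent and header adds viewChangeTimeout.
func allowedStep(parentViewID, headerViewID uint64, baseStep int64) int64 {
	if headerViewID <= parentViewID+1 {
		return baseStep // consecutive views: no extension
	}
	missed := int64(headerViewID - parentViewID - 1)
	return baseStep + missed*viewChangeTimeout
}

func main() {
	fmt.Println(allowedStep(100, 101, 12)) // no missed views
	fmt.Println(allowedStep(100, 104, 12)) // three missed views
}
```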

</details>

<details>
<summary>Approach 5 — Cap proposer catch-up sleep &nbsp;(complementary)</summary>

Independent of the validation rule: cap the `time.Sleep` in `proposer.go` so
the proposer goroutine cannot be hung indefinitely on a far-future parent.
Doesn't fix the underlying skew but removes the secondary blocking of
`readySignal`.

</details>

### Suggested combination

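Approach **3** (bounded step with wall-clock fallback) + Approach **5** (cap
proposer sleep) under a new hardfork field `BoundedTimestampStepEpoch`. Also
fix `getNextViewID` to clamp `curTimestamp = max(time.Now().Unix(), parent.Time)`
and relax its guard from `<=` to `<` (the clamp alone still yields
`curTimestamp == blockTimestamp`, which the current guard sends to the
fallback), so the deterministic path is always taken.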
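
A sketch of the clamped computation, under the assumption that the equality case is allowed onto the timestamp path (with the `<=` guard quoted earlier, a clamped value equal to the block timestamp would still fall back). `nextViewIDClamped` is a hypothetical name.

```go
package main

import "fmt"

const viewChangeSlot int64 = 45 // seconds, as quoted above

// nextViewIDClamped never takes a fallback: a local clock behind the block
// timestamp is clamped up to it, so the elapsed time is never negative.
func nextViewIDClamped(localNow, blockTimestamp int64, stuckBlockViewID uint64) uint64 {
	cur := localNow
	if cur < blockTimestamp {
		cur = blockTimestamp
	}
	diff := uint64((cur-blockTimestamp)/viewChangeSlot + 1)
	return diff + stuckBlockViewID
}

func main() {
	const T int64 = 1_700_000_000
	blockTime := T + 15
	fmt.Println(nextViewIDClamped(T-1, blockTime, 100))  // slow clock, clamped
	fmt.Println(nextViewIDClamped(T+16, blockTime, 100)) // fast clock, same slot
}
```

Validators whose clocks straddle a slot boundary can still disagree by one slot, as they can today on the timestamp path; the clamp only removes the fallback split.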

---

## Open questions

- [ ] What `maxStep` value? Candidates: `2 * BlockPeriod + 1s` (per-epoch), or
 a network-wide constant (e.g. 12 s) larger than the largest `BlockPeriod`.
- [ ] What stall-detection threshold for the proposer's clamp-vs-no-clamp
 decision? `viewChangeTimeout (27s) + 3s = 30s` is the natural choice.
- [ ] Re-introduce NTP correction *for proposal only* (not validation) so
 honest leaders track real time more tightly? Or accept that with bounded
 step NTP is not strictly necessary?
- [ ] Activation strategy — new epoch field, or piggyback on a future
 `TimestampValidationEpoch` activation on networks where it hasn't fired
 yet? (Currently only Partner/Devnet has activated it.)

---

## References

- #5020 — Add timestamp checks for block header validation (introduced the current rule)
- #5028 — NTP offset updates for block timestamps (added, then later removed)
- #5042 — Remove NTP configurations and functionality

Relevant files:

- `internal/chain/engine.go` — `VerifyHeader`
- `consensus/proposer.go` — `WaitForConsensusReadyV2`
- `consensus/view_change.go` — `getNextViewID`, `fallbackNextViewID`
- `consensus/config.go` — `viewChangeTimeout`, `viewChangeSlot`
- `core/blockchain_impl.go` — `insertChain` (`ErrFutureBlock` handling)
- `internal/params/config.go` — `TimestampValidationEpoch`

---

## Reproducer (sketch)

A small two-node localnet harness that wraps `time.Now()` with a configurable
offset should be sufficient to reproduce **Scenario A** on current `main` with
`TimestampValidationEpoch` active (Partner network or localnet). A scripted
skew schedule across a 4-node committee reproduces **Scenarios B and C**.
Happy to author a reproducer harness as a follow-up PR if useful.

---

## Acceptance criteria for a fix

- [ ] New hardfork field gating the new rule (no behavior change before activation).
- [ ] Cascading skew bounded at a small constant per block (≤ `2 × BlockPeriod`).
- [ ] Single-block recovery from arbitrarily long stalls.
- [ ] Proposer goroutine never blocked > N seconds on a far-future parent (configurable cap).
- [ ] `getNextViewID` stays on the deterministic timestamp path.
- [ ] Unit tests cover: monotonic, wall-clock ceiling, step boundary, stall
 recovery (short/medium/long), cascading-skew aftermath, unknown ancestor.
- [ ] Integration test on a multi-node localnet with injected clock skew.
