
Potential bug in BlockRefactor V0 syncBlock function #6173

Closed
@louisliu2048

Description

Summary of Bug

In the poolRoutine() function in tendermint/blockchain/v0/reactor.go, if we receive dirty blocks from other peers while syncing, blockExec.ValidateBlock(state, block) (called as part of ApplyBlock) returns an error, and the panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err)) turns our node into a zombie.

```go
...
first, second := bcR.pool.PeekTwoBlocks()
//bcR.Logger.Info("TrySync peeked", "first", first, "second", second)
if first == nil || second == nil {
	// We need both to sync the first block.
	continue FOR_LOOP
} else {
	// Try again quickly next loop.
	didProcessCh <- struct{}{}
}
...

err := state.Validators.VerifyCommit(
	chainID, firstID, first.Height, second.LastCommit)
...

// TODO: batch saves so we dont persist to disk every block
bcR.store.SaveBlock(first, firstParts, second.LastCommit)

// TODO: same thing for app - but we would need a way to
// get the hash without persisting the state
var err error
state, _, err = bcR.blockExec.ApplyBlock(state, firstID, first)
if err != nil {
	// TODO This is bad, are we zombie?
	panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
}
blocksSynced++
...
```

If we get a dirty block from another peer, we save it to our DB without any further check. When we then run bcR.blockExec.ApplyBlock, an error is returned and our node panics and exits. If we restart it, it runs replayBlocks and panics again when ApplyBlock is run on the same stored block (see the sketch below).
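To illustrate the "zombie on restart" part, here is a hypothetical, heavily simplified sketch of what block replay does after a crash. The real logic lives in Tendermint's handshake/replay code; the helper function, its parameters and the blockLoader interface are made up for this example, and ApplyBlock's three return values mirror the excerpt above:

```go
import (
	"fmt"

	sm "github.com/tendermint/tendermint/state"
	"github.com/tendermint/tendermint/types"
)

// blockLoader is a made-up interface standing in for the local block store.
type blockLoader interface {
	Height() int64
	LoadBlock(height int64) *types.Block
}

// replayStoredBlocks is a hypothetical, simplified picture of what happens on
// restart: every block already persisted locally is re-applied to the app, so
// the dirty block saved just before the crash is hit again.
func replayStoredBlocks(state sm.State, blockExec *sm.BlockExecutor,
	store blockLoader, appHeight int64) sm.State {

	for h := appHeight + 1; h <= store.Height(); h++ {
		block := store.LoadBlock(h)
		blockID := types.BlockID{Hash: block.Hash()}

		var err error
		state, _, err = blockExec.ApplyBlock(state, blockID, block)
		if err != nil {
			// Same dead end as in poolRoutine: restarting does not help,
			// because the bad block is still in the local DB.
			panic(fmt.Sprintf("failed to replay block %d: %v", h, err))
		}
	}
	return state
}
```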

Maybe we should validate the block right after
err := state.Validators.VerifyCommit(chainID, firstID, first.Height, second.LastCommit)
succeeds, to reject dirty blocks from a malicious node before they are saved and applied.
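A minimal sketch of that idea inside poolRoutine, assuming bcR.blockExec exposes sm.BlockExecutor.ValidateBlock and reusing the reactor's existing RedoRequest / StopPeerForError handling for a failed VerifyCommit. This is only an illustration against the v0.32 code shown above, not a tested patch:

```go
err := state.Validators.VerifyCommit(
	chainID, firstID, first.Height, second.LastCommit)
if err == nil {
	// Extra check suggested above: validate the block against the current
	// state (including Header.AppHash) before it is ever persisted.
	err = bcR.blockExec.ValidateBlock(state, first)
}
if err != nil {
	bcR.Logger.Error("Error in validation", "err", err)
	// Drop the bad block, re-request the height and punish the sender
	// instead of panicking later in ApplyBlock.
	peerID := bcR.pool.RedoRequest(first.Height)
	peer := bcR.Switch.Peers().Get(peerID)
	if peer != nil {
		bcR.Switch.StopPeerForError(peer, fmt.Errorf("blockchain reactor validation error: %v", err))
	}
	continue FOR_LOOP
}

// Only a block that passed both checks reaches SaveBlock / ApplyBlock.
bcR.pool.PopRequest()
bcR.store.SaveBlock(first, firstParts, second.LastCommit)
state, _, err = bcR.blockExec.ApplyBlock(state, firstID, first)
```

With a check like this, a dirty block never reaches the block store, so a restart no longer replays it.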

Version

cosmossdk: 0.37.9
tendermint: 0.32.10

Steps to Reproduce

Our cluster runs on 7 machines with docker-compose. After restarting 3 nodes on one machine, it runs into:
```
panic: Failed to process committed block (190752:57B2434ACB1BEDA4FD41537A1992E7C64C10DB320415AA289A50A5059FA317A3): Wrong Block.Header.AppHash. Expected 3A8EC21DC8205C41CBDBEA42C4F2C91ED482DAE131110C97FD0EA5426A75791B, got 8E7D8AD3F70B16A1F7E5D50EC0FA01DC7E3AA365C08DBA527A7466D5530187AC

goroutine 133 [running]:
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000dbc1a0)
	/go/pkg/mod/github.com/okex/tendermint@v0.32.10-green/blockchain/v0/reactor.go:344 +0x1542
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
	/go/pkg/mod/github.com/okex/tendermint@v0.32.10-green/blockchain/v0/reactor.go:118 +0x84
```
I don't know why they get a dirty block after the restart, but they do get one and store it in their local DB.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned


Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
