Potential bug in BlockRefactor V0 syncBlock function #6173
Summary of Bug
In tendermint/blockchain/v0/reactor.go, in the poolRoutine() function: if we receive dirty blocks from other peers while syncing, blockExec.ValidateBlock(state, block) (called inside ApplyBlock) returns an error, and the
panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
turns our node into a zombie.
```go
...
first, second := bcR.pool.PeekTwoBlocks()
//bcR.Logger.Info("TrySync peeked", "first", first, "second", second)
if first == nil || second == nil {
    // We need both to sync the first block.
    continue FOR_LOOP
} else {
    // Try again quickly next loop.
    didProcessCh <- struct{}{}
}
...
err := state.Validators.VerifyCommit(
    chainID, firstID, first.Height, second.LastCommit)
...
// TODO: batch saves so we dont persist to disk every block
bcR.store.SaveBlock(first, firstParts, second.LastCommit)
// TODO: same thing for app - but we would need a way to
// get the hash without persisting the state
var err error
state, _, err = bcR.blockExec.ApplyBlock(state, firstID, first)
if err != nil {
    // TODO This is bad, are we zombie?
    panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
}
blocksSynced++
...
```
If we get a dirty block from another peer, we save it to our DB without any check. When bcR.blockExec.ApplyBlock runs, it returns an error and our node panics and exits. If we restart the node, it runs replayBlocks and panics again in ApplyBlock, so the node is stuck in a crash loop.
Maybe we should do the block check right after

```go
err := state.Validators.VerifyCommit(chainID, firstID, first.Height, second.LastCommit)
```

so that a dirty block from a malicious node is rejected before it is saved. A sketch of this ordering follows.
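To illustrate, here is a minimal, self-contained sketch of the proposed ordering. The types and helper names here are toy stand-ins, not the real tendermint API; the actual fix would presumably call blockExec.ValidateBlock in blockchain/v0/reactor.go before bcR.store.SaveBlock. The point it demonstrates: a dirty block is rejected before being persisted, so a restart has nothing bad to replay.

```go
package main

import "fmt"

// Toy stand-ins for the types involved; names are illustrative only.
type Block struct {
	Height  int64
	AppHash string
}

type State struct {
	Height  int64
	AppHash string // hash the next block must build on
}

var store []Block // stand-in for bcR.store

// validateBlock mirrors the AppHash check that ApplyBlock performs
// internally (the check that produced the panic message above).
func validateBlock(s State, b Block) error {
	if b.AppHash != s.AppHash {
		return fmt.Errorf("wrong Block.Header.AppHash: expected %s, got %s",
			s.AppHash, b.AppHash)
	}
	return nil
}

// trySync sketches the proposed ordering: after VerifyCommit succeeds,
// validate the block BEFORE it is persisted, so a dirty block is dropped
// (and the peer can be punished) instead of being saved and then
// re-applied forever by replayBlocks after every restart.
func trySync(s *State, first Block) error {
	// ... state.Validators.VerifyCommit(...) already succeeded here ...

	if err := validateBlock(*s, first); err != nil {
		// Do NOT SaveBlock; re-request the height and stop the offending
		// peer, as the reactor already does on a VerifyCommit failure.
		return fmt.Errorf("invalid block from peer: %v", err)
	}

	store = append(store, first) // bcR.store.SaveBlock(...)
	// bcR.blockExec.ApplyBlock(...) can no longer fail on this check.
	s.Height, s.AppHash = first.Height, "new-app-hash"
	return nil
}

func main() {
	s := &State{Height: 190751, AppHash: "3A8E..."}
	dirty := Block{Height: 190752, AppHash: "8E7D..."} // wrong AppHash
	if err := trySync(s, dirty); err != nil {
		fmt.Println("rejected before persisting:", err) // no panic, no zombie
	}
	fmt.Println("blocks persisted:", len(store)) // 0
}
```

With this ordering, the worst case on a bad block is a dropped peer and a re-requested height, not a panic and a crash loop.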
Version
- cosmos-sdk: 0.37.9
- tendermint: 0.32.10
Steps to Reproduce
Our cluster runs on 7 machines with docker-compose. After we restarted 3 nodes on one machine, they ran into:

```
panic: Failed to process committed block (190752:57B2434ACB1BEDA4FD41537A1992E7C64C10DB320415AA289A50A5059FA317A3): Wrong Block.Header.AppHash. Expected 3A8EC21DC8205C41CBDBEA42C4F2C91ED482DAE131110C97FD0EA5426A75791B, got 8E7D8AD3F70B16A1F7E5D50EC0FA01DC7E3AA365C08DBA527A7466D5530187AC
goroutine 133 [running]:
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000dbc1a0)
/go/pkg/mod/github.com/okex/tendermint@v0.32.10-green/blockchain/v0/reactor.go:344 +0x1542
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
/go/pkg/mod/github.com/okex/tendermint@v0.32.10-green/blockchain/v0/reactor.go:118 +0x84
```
I don't know why they got a dirty block after the restart, but they did get one and stored it in their local DB.
For Admin Use
- [ ] Not duplicate issue
- [ ] Appropriate labels applied
- [ ] Appropriate contributors tagged
- [ ] Contributor assigned/self-assigned