
Clique : Discarded bad propagated block#1 when syncing #14945

Closed

mawenpeng opened this issue Aug 9, 2017 · 9 comments

@mawenpeng

System information

Geth Version: 1.6.7-stable
OS & Version: CentOS Linux release 7.3.1611
Architecture: amd64
Protocol Versions: [63 62]
Go Version: go1.8.3

Expected behaviour

Sync block#1 successfully

Actual behaviour

Discarded block#1

Steps to reproduce the behaviour

  1. Initialize a private chain of 4 nodes with the Clique consensus engine: 3 nodes generate blocks, 1 additional node only syncs.
  2. Import private keys into the 3 signing nodes and unlock the accounts.
  3. Add peers so that all 4 nodes are connected to each other.
  4. Start mining on 1 node; the other nodes then log the warning "Discarded bad propagated block number=1 hash=0ee7bf…0adaa0".

Not sure if this occurs with PoW.
It may also happen with 2 or 3 nodes.
If peers are added after mining has started on 1 node, the other nodes sync block#1 successfully.
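
For step 1, the Clique genesis might look roughly like this (a hedged sketch: the chainId, period, gasLimit, and the extraData placeholder are illustrative values, not taken from the issue):

```json
{
  "config": {
    "chainId": 55661,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0,
    "clique": { "period": 5, "epoch": 30000 }
  },
  "difficulty": "0x1",
  "gasLimit": "0x47b760",
  "extraData": "0x<64 hex chars of vanity><40 hex chars per signer address><130 hex chars of zeroed seal>",
  "alloc": {}
}
```

Per EIP-225, extraData packs 32 bytes of vanity, the 20-byte signer addresses, and a 65-byte seal that is zeroed in the genesis block.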

@mawenpeng mawenpeng changed the title Clique block#1: Discarded bad propagated block when syncing Clique : Discarded bad propagated block#1 when syncing Aug 9, 2017
@karalabe karalabe self-assigned this Aug 9, 2017
@joeb000

joeb000 commented Aug 23, 2017

I am having a similar issue with my multi-node PoA setup. I have only a single miner, the mining interval set to 1 second, and the "bitchin tricks" mining hack in place (mine only when there is a transaction in the pool).

I send a transaction from a non-mining node, it propagates, and a block is subsequently mined with the transaction in it. Everything actually works great, except for the two WARN messages I see in the console:

WARN [08-23|10:44:56] Discarded bad propagated block           number=10 hash=8810d8…d993f0
INFO [08-23|10:44:56] Imported new state entries               count=1   flushed=0 elapsed=172.953µs    processed=20 pending=4  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new state entries               count=3   flushed=0 elapsed=243.195µs    processed=23 pending=7  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new block headers               count=2   elapsed=2.640ms      number=11 hash=18316a…550c90 ignored=9
INFO [08-23|10:44:56] Imported new state entries               count=3   flushed=4 elapsed=96.093µs     processed=26 pending=4  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new chain segment               blocks=2 txs=3 mgas=0.021 elapsed=885.425µs    mgasps=23.717 number=11 hash=18316a…550c90 ignored=4
WARN [08-23|10:44:56] Synchronisation failed, retrying         err="state data download canceled (requested)"

Even though it says synchronisation failed, it actually didn't. The block propagated successfully and the transaction was processed.

@sirnicolas21

I am having the same issue with 3 signer/mining nodes and one non-mining node that submits transactions. When a transaction is mined, one of the mining nodes reports "bad block unknown ancestor" and stops mining; the whole network is stuck after that.

version 1.7.0 on commit (#14631)

@amissine

amissine commented Sep 9, 2017

I just tried it with the latest commit 10181b5: PoW does not have this problem and works like a charm. It must be a PoA issue.

@OniReimu

OniReimu commented Oct 14, 2017

From my understanding:

func NewProtocolManager(config *params.ChainConfig, mode downloader.SyncMode, networkId uint64, maxPeers int, mux *event.TypeMux, txpool txPool, engine consensus.Engine, blockchain *core.BlockChain, chaindb ethdb.Database) (*ProtocolManager, error) {
// ...
	// Figure out whether to allow fast sync or not
	if mode == downloader.FastSync && blockchain.CurrentBlock().NumberU64() > 0 {
		log.Warn("Blockchain not empty, fast sync disabled")
		mode = downloader.FullSync
	}
	if mode == downloader.FastSync {
		manager.fastSync = uint32(1)
	}
// ...
	inserter := func(blocks types.Blocks) (int, error) {
		// If fast sync is running, deny importing weird blocks
		if atomic.LoadUint32(&manager.fastSync) == 1 {
			log.Warn("Discarded bad propagated block", "number", blocks[0].Number(), "hash", blocks[0].Hash())
			return 0, nil
		}
		atomic.StoreUint32(&manager.acceptTxs, 1) // Mark initial sync done on any fetcher import
		return manager.blockchain.InsertChain(blocks)
	}
//...
}

When the ProtocolManager is created, the mode is controlled by DefaultConfig, defined in eth/config.go, where the default mode is FastSync.

// DefaultConfig contains default settings for use on the Ethereum main net.
var DefaultConfig = Config{
	SyncMode:             downloader.FastSync,
	EthashCacheDir:       "ethash",
	EthashCachesInMem:    2,
	EthashCachesOnDisk:   3,
	EthashDatasetsInMem:  1,
	EthashDatasetsOnDisk: 2,
	NetworkId:            1,
	LightPeers:           20,
	DatabaseCache:        128,
	GasPrice:             big.NewInt(18 * params.Shannon),

	TxPool: core.DefaultTxPoolConfig,
	GPO: gasprice.Config{
		Blocks:     10,
		Percentile: 50,
	},
}

Note that the calling operation is:

  1. Storing eth/backend.go - New() in []serviceFuncs:
    cmd/geth/main.go - geth() -> cmd/geth/config.go - makeFullNode() -> cmd/utils/flags.go - RegisterEthService() -> node/node.go - Register()
  2. Run eth/backend.go - New():
    cmd/geth/main.go - geth() -> startNode() -> cmd/utils/cmd.go - StartNode() -> node/node.go - start() -> eth/backend.go - New() -> eth/handler.go - NewProtocolManager()

That is, if the current block height is 0, FastSync remains active. Thus, when inserter is called, log.Warn("Discarded bad propagated block", "number", blocks[0].Number(), "hash", blocks[0].Hash()) fires and the propagated block is dropped.

@facundomedica

facundomedica commented Mar 1, 2018

Any updates on this?
EDIT: I think I figured it out.

Check:

  1. That you have a "random enough" networkId.
  2. Set the --nodiscover flag and add your nodes manually.

Reason: some peers appeared when running admin.peers whose IPs I didn't recognize (I was running only 2 nodes locally), and they were from somewhere in the USA. So you may be getting connections from other networks that interfere with yours.
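
For point 1, a "random enough" networkId could be generated with a sketch like this (randomNetworkID and its range are my own illustration, not a geth API):

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomNetworkID returns an id in [1000, 1000001000), comfortably away
// from the reserved public ids (e.g. 1 for mainnet). The exact range is
// an arbitrary choice for illustration.
func randomNetworkID() uint64 {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return binary.BigEndian.Uint64(b[:])%1000000000 + 1000
}

func main() {
	fmt.Println(randomNetworkID()) // pass the result to geth via --networkid
}
```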

@REPTILEHAUS

Also hitting this issue... running 2 nodes and 4 signing nodes locally on different ports, connected via admin.addPeer(). It usually works with just 2 nodes and 2 signers.

@facundomedica

@REPTILEHAUS have you tried what is in my comment? When you list the peers, are all of them well known to you?

@christiankiller

christiankiller commented Mar 20, 2018

We also ran into many issues and changed many things along the way; maybe one of these will solve yours:

  1. Take a look at this repository for local deployment and this repository if you plan to run it on the cloud.
  2. Only add two sealers to extradata in your genesis.json (e.g. if you have 5 nodes in total, only authorize 2 signers in extradata; also check EIP-225).
  3. Add a sleep between the miner.start() calls on the two authorized nodes; somehow they raced and blocked each other when started too close together.
  4. Add --syncmode "full" to your geth command.
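
For item 4, the sync-only node could be started along these lines (the datadir, network id, and port are placeholder values, not from this thread):

```shell
geth --datadir node4 --networkid 55661 --syncmode "full" --nodiscover --port 30306 console
```

Forcing full sync sidesteps the fast-sync inserter gate that logs "Discarded bad propagated block" on a fresh chain.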

@stale

stale bot commented Mar 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot closed this as completed Apr 23, 2020

10 participants