PoA network partition upon change of contract-based ValidatorSet #10306
(cc @HCastano - lots of stuff to dig through)
I can reproduce the issue without a validator contract in place. When switching the configuration at a future block I get the same errors and the chain breaks into several partitions.
Had the issue at both block #101100 and block #10185; strangely, it was always the first validator node in the array that reported a mismatch. cc @joshua-mir
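For context, a static validator-set change at a future block is expressed in the chain spec with the `multi` validator-set syntax. A minimal sketch of the kind of configuration involved (addresses, step duration, and the transition block are placeholders, not the reporter's actual values):

```json
"engine": {
  "authorityRound": {
    "params": {
      "stepDuration": "5",
      "validators": {
        "multi": {
          "0": {
            "list": ["0x0000000000000000000000000000000000000010"]
          },
          "10185": {
            "list": [
              "0x0000000000000000000000000000000000000010",
              "0x0000000000000000000000000000000000000020"
            ]
          }
        }
      }
    }
  }
}
```

Each key inside `multi` is the block number at which the corresponding set takes effect.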
@mathiasfrey Would you be able to share the code you're using to set up your network?
@mathiasfrey How come there's only one node in your …
@HCastano, my nodes are spread across the globe. I checked out my code on different machines with public IPs, as this mirrors the target setup: 3 machines running two instances each, distinguished by port (e.g. you can see the same IP but two different ports in the bootnodes):
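For illustration, the bootnodes section of such a spec would look roughly like this (public keys and IPs are placeholders; note the shared IP with two different ports):

```json
"nodes": [
  "enode://<node-a-public-key>@198.51.100.7:30303",
  "enode://<node-b-public-key>@198.51.100.7:30304",
  "enode://<node-c-public-key>@203.0.113.12:30303"
]
```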
I've managed to replicate this locally with two nodes running v2.3.2. In the chain spec I set up one node to be an authority from block 0, and the other to get added to the validator set at a future block. When that block was hit (if you were doing this with a contract you'd need to wait for the block to be finalized instead of simply reached), neither node appeared to recognize the change in the validator set. If I take the original authority offline, the "new" authority doesn't produce any blocks. If I then bring the original authority back online and restart the "new" authority, they fork from one another.
Hi @HCastano, I have noticed that if I set …
Hey @danzipie! You shouldn't need that flag set to true for the validator set to update. I found a bug in the code path where it's not set earlier today, currently working on a fix :)
Although I should add, the bug I found is only for the case where you have a fixed validator set (e.g. set in the chain spec). I need to do a bit more digging to find out what's going on when a contract is being used.
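For reference, the two cases differ only in the `validators` entry of the engine params: the fixed case uses a `"list"` of addresses (as in the sketch further up), while the contract-based case points the engine at a deployed contract. A sketch with a placeholder address:

```json
"validators": {
  "safeContract": "0x0000000000000000000000000000000000000005"
}
```

As far as I know there is also a `"contract"` variant that additionally supports misbehaviour reporting, which is what the Demo PoA tutorial deploys.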
I was out conferencing last week, will definitely make time to work on this this week |
@mathiasfrey Hey, I haven't had any luck replicating this issue while using a contract. Is it possible that your contract setup also had a static validator change at some point? Could you also share more about the setup you were using with the contract, like the full chainspec and the set of contracts you were using?
@mathiasfrey Hey, I set up another test network with your chainspec and …
@HCastano we just tested the nightly build with a changing validator set and it always worked. We're going to check the change coming from a validator set to a smart contract validator set next week. Thanks! |
Fixed in 2.3.6 and 2.4.1 |
@demimarie-parity thanks for the ping! |
We're trying to set up a PoA blockchain following https://wiki.parity.io/Demo-PoA-tutorial, with the following specifics:
We start up the first node, deploy the provided contracts for validators (reporting) and permissions and scale up to 6 nodes with 1 authority and 5 non-authority nodes.
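For readers following along, the engine interacts with the validator contract through roughly this interface (a sketch based on the tutorial's contracts; names follow the wiki, and details may differ between versions):

```solidity
pragma solidity ^0.4.22;

// Minimal interface the AuthorityRound engine expects from a
// validator-set contract. Emitting `InitiateChange` signals a
// pending set change; the engine then calls `finalizeChange()`
// via a system transaction once the signalling block is finalized.
contract ValidatorSet {
    event InitiateChange(bytes32 indexed _parentHash, address[] _newSet);

    // Current validator set, queried by the engine.
    function getValidators() public view returns (address[]);

    // Called by the engine to commit a pending change.
    function finalizeChange() public;
}
```

The finalization requirement is why a contract-based change only takes effect once the block emitting `InitiateChange` is finalized, rather than merely reached.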
Then we change non-validators into validators step by step, slowly (30 blocks between each change). The network stays in sync, and all nodes see each other as peers.
This is what turning the first non-authority node into an authority node looks like. Note the benign-behaviour reporting, probably a race condition: the change of validator set might not be propagated fast enough, or takes effect too quickly. In any case, mining goes on:
Upon adding the fifth (or sixth, or perhaps a random?) validator node, the network breaks apart with three different types of behaviour across the nodes:
Partition 1: sync stalls on 2 nodes
Happens on the latest-added validator and the initial validator (random?). They still have peers, but sync is broken and cannot be recovered.
Partition 2: 3 nodes keep mining
2 validators and the 1 remaining non-validator throw a dump and lose 1 peer, but stay in sync:
Partition 3: 1 node mines its own fork