Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

PoA network partition upon change of contract-based ValidatorSet #10306

Closed
mathiasfrey opened this issue Feb 6, 2019 · 18 comments
Closed

PoA network partition upon change of contract-based ValidatorSet #10306

mathiasfrey opened this issue Feb 6, 2019 · 18 comments
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. P5-sometimesoon 🌲 Issue is worth doing soon.
Milestone

Comments

@mathiasfrey
Copy link

  • Parity Ethereum version: 2.3.2 / 2.2.9
  • Operating system: Linux
  • Installation: Dockerized: parity/parity:{{version}}
  • Fully synchronized: yes
  • Network: custom PoA
  • Restarted: yes

We're trying to set up a PoA blockchain along the https://wiki.parity.io/Demo-PoA-tutorial with the following specifics:

// chain.json
"stepDuration": "5"
"transactionPermissionContractTransition": 60,
"transactionPermissionContract": <ADDR>,
// authority.toml
usd_per_tx = "0"
tx_queue_size = 32768
tx_queue_mem_limit = 8096
min_gas_price = 0
gas_floor_target = "20000000"
force_sealing = true

We start up the first node, deploy the provided contracts for validators (reporting) and permissions and scale up to 6 nodes with 1 authority and 5 non-authority nodes.

Then we change non-validators into validators step by step & slowly (30 blocks time in between). The network keeps being in sync, all nodes see each other as peers.

This is how turning the first non-authority node into an authority node looks like. Pay attention to the benign behaviour reporting, probably a race condition as the change of validator set might not be propagated fast enough / takes effect too quickly. In any case, the mining goes on:

2019-02-06 17:34:56 UTC Imported #1997 0x229d…9780 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:01 UTC Imported #1998 0x6518…4caa (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:06 UTC Imported #1999 0xc04d…9c5c (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:06 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:35:17 UTC    5/25 peers    724 KiB chain  396 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    4 req/s,  170 µs
2019-02-06 17:35:19 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:35:19 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd]
2019-02-06 17:35:19 UTC Applying validator set change signalled at block 2000
2019-02-06 17:35:19 UTC Imported #2000 0x7109…7375 (1 txs, 0.10 Mgas, 2994 ms, 0.71 KiB)
2019-02-06 17:35:21 UTC Transaction mined (hash 0xef8eb22e5b7213d2f4b6492adf1b5a0c2296ba44952e2a22896e3c8a9b746fde)
2019-02-06 17:35:21 UTC Imported #2001 0x8336…ca00 (5 txs, 0.14 Mgas, 8 ms, 1.41 KiB)
2019-02-06 17:35:26 UTC Imported #2002 0xbd3a…6490 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:31 UTC Imported #2003 0xfee6…813d (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:36 UTC Imported #2004 0x3204…0ef5 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)

Upon adding the fifth (or sixth or anyhow random?) validator node the network breaks apart with three types of different behaviour on the nodes:

Partition 1: sync stalls on 2 nodes

Happens on latest added validator and initial validator (random?). They still have peers, but sync is broken and cannot be reccovered.

2019-02-06 17:54:21 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd, 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9, 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, 0xdd996d1d72a237d349b9901e03296ba104447379]
2019-02-06 17:54:21 UTC Imported #2229 0x4343…74f0 (1 txs, 0.10 Mgas, 5 ms, 0.71 KiB)
2019-02-06 17:54:26 UTC Imported #2230 0x9314…29ec (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:54:31 UTC Applying validator set change signalled at block 2229
2019-02-06 17:54:31 UTC Imported #2231 0x0655…2c92 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:32 UTC    5/25 peers    856 KiB chain  470 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    2 req/s,  184 µs
2019-02-06 17:54:36 UTC Imported #2232 0x9937…95e8 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:41 UTC Imported #2233 0x81ce…fc71 (1 txs, 0.03 Mgas, 1 ms, 0.74 KiB)
2019-02-06 17:54:45 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
vchain-testnet-health013 | "35.221.126.209:30303"
vchain-testnet-health013 | "35.203.79.193:30303"
vchain-testnet-health013 | "35.204.33.208:30303"
vchain-testnet-health013 | "35.204.33.208:30304"
vchain-testnet-health013 | "35.203.79.193:34996"
2019-02-06 17:54:54 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:54:54 UTC Imported #2234 0xa6ea…aeb1 (0 txs, 0.00 Mgas, 2180 ms, 0.57 KiB)
2019-02-06 17:55:02 UTC    5/25 peers    857 KiB chain  473 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    0 req/s,  184 µs
2019-02-06 17:55:15 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:55:22 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:55:23 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 17:55:24 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:55:26 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 17:55:32 UTC    5/25 peers    857 KiB chain  473 KiB db   12 KiB queue  243 KiB sync  RPC:  0 conn,    0 req/s,  189 µs
2019-02-06 17:55:37 UTC Syncing    #2234 0xa6ea…aeb1     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      5+    0 Qed     #2237    5/25 peers    857 KiB chain  473 KiB db   18 KiB queue  243 KiB sync  RPC:  0 conn,    0 req/s,  189 µs
2019-02-06 17:55:42 UTC Syncing    #2234 0xa6ea…aeb1     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      5+    0 Qed     #2237    5/25 peers    857 KiB chain  473 KiB db   18 KiB queue  243 KiB sync  RPC:  0 conn,    2 req/s,  189 µs
2019-02-06 17:55:45 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:55:47 UTC Syncing    #2234 0xa6ea…aeb1     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      6+    0 Qed     #2238    5/25 peers    857 KiB chain  473 KiB db   20 KiB queue  243 KiB sync  RPC:  0 conn,    5 req/s,  189 µs
2019-02-06 17:55:52 UTC Syncing    #2234 0xa6ea…aeb1     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      6+    0 Qed     #2238    5/25 peers    857 KiB chain  473 KiB db   20 KiB queue  243 KiB sync  RPC:  0 conn,    2 req/s,  189 µs

Partition 2: 3 nodes keep mining

2 validators and the 1 remaining non-validator throw a dump, lose 1 peer but keep in sync:

2019-02-06 17:54:21 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd, 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9, 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, 0xdd996d1d72a237d349b9901e03296ba104447379]
2019-02-06 17:54:21 UTC Imported #2229 0x4343…74f0 (1 txs, 0.10 Mgas, 2 ms, 0.71 KiB) + another 1 block(s) containing 0 tx(s)
2019-02-06 17:54:23 UTC    5/25 peers    865 KiB chain  472 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    0 req/s,  182 µs
2019-02-06 17:54:26 UTC Imported #2230 0x9314…29ec (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:54:31 UTC Applying validator set change signalled at block 2229
2019-02-06 17:54:31 UTC Imported #2231 0x0655…2c92 (0 txs, 0.00 Mgas, 4 ms, 0.57 KiB)
2019-02-06 17:54:36 UTC Imported #2232 0x9937…95e8 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:41 UTC Imported #2233 0x81ce…fc71 (1 txs, 0.03 Mgas, 1 ms, 0.74 KiB)
2019-02-06 17:54:49 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:54:52 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:54:52 UTC Imported #2234 0xa6ea…aeb1 (0 txs, 0.00 Mgas, 9 ms, 0.57 KiB)
2019-02-06 17:54:53 UTC    5/25 peers    867 KiB chain  475 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    0 req/s,  174 µs
2019-02-06 17:54:58 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:55:01 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 17:55:03 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:55:05 UTC Validator 0xff9e…2cf9 could not be reported Transaction import error: Transaction error (Already imported)
2019-02-06 17:55:05 UTC Stage 4 block verification failed for #2232 (0xca80…b9ac)
Error: Error(Engine(NotProposer(Mismatch { expected: 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, found: 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: Some(stack backtrace:
   0:     0x55e0353281ad - <no info>
   1:     0x55e035326b62 - <no info>
   2:     0x55e035076381 - <no info>
   3:     0x55e0350766b5 - <no info>
   4:     0x55e034a2868f - <no info>
   5:     0x55e034a32577 - <no info>
   6:     0x55e0347c5fcd - <no info>
   7:     0x55e03483434a - <no info>
   8:     0x55e03431ba8e - <no info>
   9:     0x55e034904ab1 - <no info>
  10:     0x55e0348b02b8 - <no info>
  11:     0x55e0348b25ca - <no info>
  12:     0x55e03478ae6a - <no info>
  13:     0x55e034a10a0d - <no info>
  14:     0x55e0353b04e9 - <no info>
  15:     0x55e0349c07be - <no info>
  16:     0x55e03539a79d - <no info>
  17:     0x55e035385355 - <no info>
  18:     0x7f9dab39e6b9 - <no info>
  19:     0x7f9daaebe41c - <no info>
  20:                0x0 - <no info>) } })
...
2019-02-06 18:37:22 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:37:22 UTC Imported #2438 0x3ca9…930a (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:37:38 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:37:39 UTC Reported benign validator misbehaviour 0xdd99…7379
2019-02-06 18:37:41 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:37:41 UTC Transaction culled (hash 0x6e6929706fcd7e329005224ab8ae4dd908c1878b8a24d74153a86627912bfd88)
2019-02-06 18:37:41 UTC Imported #2439 0x607b…b6f5 (3 txs, 0.09 Mgas, 2779 ms, 1.08 KiB)
2019-02-06 18:37:47 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:37:47 UTC Imported #2440 0x2a59…1ea3 (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:37:49 UTC    4/25 peers    973 KiB chain  685 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    4 req/s,  185 µs
2019-02-06 18:38:04 UTC Reported benign validator misbehaviour 0xdd99…7379
2019-02-06 18:38:05 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:38:05 UTC Transaction culled (hash 0x75558e5b9f8e1ae9173a4dcfd0f402aa52a30a305f002855fea0cb49d0d7992a)
2019-02-06 18:38:05 UTC Imported #2441 0x491e…75ea (3 txs, 0.09 Mgas, 2207 ms, 1.08 KiB)
2019-02-06 18:38:08 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:38:12 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:38:12 UTC Imported #2442 0xba4b…30ee (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:38:19 UTC    4/25 peers    974 KiB chain  861 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    4 req/s,  185 µs

Partition 3: 1 node mins their own fork

2019-02-06 18:39:24 UTC Syncing    #2365 0x3ee8…a11e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  624 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    0 req/s,  175 µs
2019-02-06 18:39:28 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:39:32 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:39:33 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:39:33 UTC Syncing    #2365 0x3ee8…a11e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  624 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    7 req/s,  175 µs
2019-02-06 18:39:34 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:39:34 UTC Transaction culled (hash 0xd81e114ef144b57a8e51bb84bfd12efd12fb00f05d96793098dbfa7da62aecd8)
2019-02-06 18:39:34 UTC Imported #2366 0x7a26…994b (1 txs, 0.03 Mgas, 13 ms, 0.74 KiB)
2019-02-06 18:39:43 UTC Syncing    #2366 0x7a26…994b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  625 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    0 req/s,  175 µs
2019-02-06 18:39:52 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:39:53 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:39:53 UTC Syncing    #2366 0x7a26…994b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  625 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    7 req/s,  176 µs
2019-02-06 18:39:54 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:39:54 UTC Imported #2367 0xae37…ef18 (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:39:58 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:40:03 UTC Syncing    #2367 0xae37…ef18     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  625 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    0 req/s,  176 µs
2019-02-06 18:40:12 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:40:13 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:40:13 UTC Syncing    #2367 0xae37…ef18     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2232    3/25 peers      1 MiB chain  625 KiB db  0 bytes queue   66 KiB sync  RPC:  0 conn,    7 req/s,  176 µs
2019-02-06 18:40:14 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:40:14 UTC Transaction culled (hash 0x460c8a8c58c7c11186d8906e8ae2bbdbc5ac93cd5005737e97de8b810c04772e)
2019-02-06 18:40:14 UTC Imported #2368 0xde8f…7880 (1 txs, 0.03 Mgas, 2 ms, 0.74 KiB)
@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. P5-sometimesoon 🌲 Issue is worth doing soon. M4-core ⛓ Core client code / Rust. labels Feb 6, 2019
@jam10o-new jam10o-new added this to the 2.4 milestone Feb 6, 2019
@jam10o-new
Copy link
Contributor

(cc @HCastano - lots of stuff to dig through)

@mathiasfrey
Copy link
Author

I can reproduce the issue without a validator contract in place.

When switching the configuration at a future block

            "validators" : {
                "multi": {
                        "0": { "list": ["0x634Aa3713fc7A61061B1699E7bCcC819f2168B5f",
                                        "0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0xbEfa81Fa6832d8aCb3890113375423fe2f921BEE"] },
                        "101100": { "list": ["0x634Aa3713fc7A61061B1699E7bCcC819f2168B5f",
                                        "0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0xbEfa81Fa6832d8aCb3890113375423fe2f921BEE",
                                        "0x6737f7bbc98dc5558b425a238c00ef377e1bde45"]},
                        "101850": { "list": ["0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0x6737f7bbc98dc5558b425a238c00ef377e1bde45",
                                        "0x93A75e6601E981628FF089f8eE86dEC3c5d247A1",
                                        "0xB1D772314abc1ef770AAb7ce4550C1B508Bc931A"]                                       
                                      }                                        
                }
            }

I get the same errors and the chain breaks into several partitions.

2019-02-13 14:02:39 UTC Stage 4 block verification failed for #101108 (0x1447…c7de)
node001 | Error: Error(Engine(NotProposer(Mismatch { expected: 0xbefa81fa6832d8acb3890113375423fe2f921bee, found: 0xd03599fa9a60eb59ce59d510cd2e90b7c2b47487 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: Some(stack backtrace:
...

Had the issue both at block #101100 and #10185; strangely it was always the first validator node in the array that reported a mismatch.

cc @joshua-mir

@HCastano
Copy link
Contributor

@mathiasfrey Would you be able to share the code you're using to set up your network?

@mathiasfrey
Copy link
Author

@HCastano
Copy link
Contributor

@mathiasfrey How come there's only node in your docker-compose.yml? I would've expected all six to be there

@mathiasfrey
Copy link
Author

@HCastano , my nodes are spread across the globe. I checked out my code on different machines with public IPs as this represents the target setup.

In this specific setup I had 3 machines with two instances each distinguished by port (e.g. you can see the same IP but two different ports in the bootnodes):

  • Amsterdam :30303
  • Amsterdam :30304
  • Montreal :30303
  • Montreal :30304
  • Tokyo :30303
  • Tokyo :30304

@5chdn 5chdn modified the milestones: 2.4, 2.5 Feb 21, 2019
@HCastano
Copy link
Contributor

I've managed to replicate this locally with two nodes running v2.3.2. In the chain spec I set up one node to be an authority from block 0, and the other to get added to the validator set at a future block. When that block was hit (if you were doing this with a contract you'd need to wait for the block to be finalized instead of simply reached) neither node appears to recognize the change in the validator set.

If I take the original authority offline, the "new" authority doesn't produce any blocks. If I then bring the original authority back online and restart the "new" authority, they fork from one another.

@danzipie
Copy link

danzipie commented Mar 1, 2019

Hi @HCastano I have noticed that if I set
"immediateTransitions": true
when the block is hit, my validator recognizes the change.

@HCastano
Copy link
Contributor

HCastano commented Mar 1, 2019

Hey @danzipie! You shouldn't need that flag set to true for the validator set to update. I found a bug in the code path where it's not set earlier today, currently working on a fix :)

@HCastano
Copy link
Contributor

HCastano commented Mar 1, 2019

Although I should add, the bug I found is only for the case where you have a fixed validator set (e.g set in the chain spec). I need to do a big more digging to find out what's going on when a contract is being used.

@HCastano
Copy link
Contributor

I was out conferencing last week, will definitely make time to work on this this week

@HCastano
Copy link
Contributor

@mathiasfrey Hey, I haven't had any luck replicating this issue while using a contract. It is possible that your contract setup also had a static validator change at some point?

Could you also share more about the setup you were using with the contract, like the full chainspec, and the set of contracts you were using?

@mathiasfrey
Copy link
Author

@HCastano you're correct. We always started with a static list and then switched over to a contract. One of the example chain specs attached.
chain.txt

@HCastano
Copy link
Contributor

@mathiasfrey Hey, I set up another test network with your chainspec and v2.4.3 of the client and I'm not seeing any issues with the contract based validator set. Have you seen this issue since I put in that fix a few weeks ago?

@zerotrustconsortium
Copy link

@HCastano we just tested the nightly build with a changing validator set and it always worked. We're going to check the change coming from a validator set to a smart contract validator set next week. Thanks!

@Demi-Marie
Copy link
Contributor

Demi-Marie commented Mar 26, 2019

Pinging @varasev @afck in case this could have explained some of their problems at POA Networks.

@soc1c
Copy link
Contributor

soc1c commented Mar 26, 2019

Fixed in 2.3.6 and 2.4.1

@varasev
Copy link
Contributor

varasev commented Mar 26, 2019

@demimarie-parity thanks for the ping!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. P5-sometimesoon 🌲 Issue is worth doing soon.
Projects
None yet
Development

No branches or pull requests

9 participants