nodes may not agree on leader consensus #885
Comments
Thanks for reporting this - it's one of the things that has annoyed me for a while about the underlying communication stack that room-assistant uses. I had planned to redo the whole cluster implementation based on more standardized protocols (I was looking at ZeroMQ for communication, with Zyre for clustering on top), but last time I checked, most libraries still left a lot of things open.
I have the same thing happening now, where 1 of 3 rpi0's simply refuses to join the cluster and follow the leader, as it were. I have tried weight, quorum, etc.; nothing works. The software is honestly too buggy to be used outside of testing/development, and I think that should be posted on the website in HUGE LETTERS. It's a bit of a letdown, so I reverted back to motion sensors for now, and I will keep testing new versions when they come out since I really do appreciate the effort. Truly.
Wanted to indicate that I am experiencing the same issue. Leadership of the cluster can move around a lot. I assigned weight 100 to one node and much lower numbers to the rest. Still, 3 out of the 4 nodes select a leader other than the one with weight 100. It seems the weight doesn't actually do anything. The problem is that the only node that does select the highest-weight node is that node itself. I also appreciate the effort with the software, but at this moment it is really buggy. The BLE app for the iPhone is really inconsistent with distance and reporting. That could, however, be connected to the instability of the leader consensus.
Just for info, I have experienced the same. I got round it by 1) turning off auto discovery and hard-coding the nodes and weights in all my room-assistant nodes, and 2) running an automation in Home Assistant that checks the state of the cluster leader and, in my case, if it is not "hall", restarts the room_assistant add-on on my HA install, which is the cluster leader. I have found that maintaining the cluster leader is essential to accurate room presence. Also, as stated in another issue, I have given up on BLE and switched back to BluetoothClassic, as it is reliable with both iOS and Android all the time; not as fast or accurate, but reliable. Example automation for cluster reset:
Describe the bug
The tests for leader conflicts do not account for out-of-order message delivery to multiple nodes.
Generally speaking, there does not appear to be any guarantee that nodes in the cluster will agree on who the leader is. The guarantee appears to be only that each node will believe there is exactly one leader. As far as I can tell, leadership is not enforced either: a client can ask any node for an answer and room-assistant will respond, even if that node is not the leader. Nor is the cluster itself required to produce a single coherent leader.
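To make the failure mode concrete, here is a minimal toy sketch in Python. It is not room-assistant's actual election code; it assumes a hypothetical rule where each node simply trusts the most recent leader announcement it has received. The class `Node`, the helper `deliver`, and the node names are made up for illustration.

```python
class Node:
    """Toy node that trusts the most recent leader announcement it received.

    Hypothetical model for illustration only, not room-assistant's logic.
    """

    def __init__(self, name):
        self.name = name
        self.leader = None

    def on_announcement(self, leader_name):
        # Last announcement wins; there is no term, vote, or quorum check.
        self.leader = leader_name


def deliver(node, announcements):
    """Deliver announcements to a node in the given order."""
    for leader_name in announcements:
        node.on_announcement(leader_name)
    return node.leader


# Two nodes claim leadership at roughly the same time.
announcements = ["kitchen", "hall"]

bedroom = Node("bedroom")
office = Node("office")

# The network delivers the same two messages in a different order to each node.
deliver(bedroom, announcements)
deliver(office, list(reversed(announcements)))

# Each node believes in exactly one leader, but they do not agree.
print(bedroom.leader, office.leader)
```

Under this assumed rule, every node satisfies "there is exactly one leader" locally, yet the cluster as a whole has two.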
To reproduce
Add and remove nodes from the cluster quickly. Eventually the nodes will get out of sync. Restoring power after a power outage could create this condition.
Additional context
Paxos, Raft, or another distributed consensus algorithm would be required to ensure that nodes agree on who the leader is.
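As a rough illustration of why such an algorithm helps, the sketch below shows only the voting rule at the core of Raft-style leader election: each node grants at most one vote per term, and a candidate becomes leader only with a strict majority. It deliberately leaves out log comparison, election timeouts, persistence, and networking, so it is not a usable implementation. The names `Voter`, `request_vote`, and `arrival_order` are hypothetical.

```python
class Voter:
    """Grants at most one vote per term (the core Raft voting rule)."""

    def __init__(self):
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, term, candidate):
        if term < self.current_term:
            return False  # stale request from an older term
        if term > self.current_term:
            # A newer term resets the vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False  # already voted for someone else in this term


voters = [Voter(), Voter(), Voter()]

# Two candidates compete in term 1; their requests reach each voter
# in a different order (out-of-order delivery).
arrival_order = [
    ("hall", "kitchen"),
    ("hall", "kitchen"),
    ("kitchen", "hall"),
]

tally = {"hall": 0, "kitchen": 0}
for voter, order in zip(voters, arrival_order):
    for candidate in order:
        if voter.request_vote(1, candidate):
            tally[candidate] += 1

majority = len(voters) // 2 + 1
leaders = [name for name, votes in tally.items() if votes >= majority]
print(tally, leaders)
```

Because two different candidates cannot both collect a majority of single-use votes in the same term, message ordering can change who wins but cannot produce two leaders for that term.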