Description
Here are the problems that I think we need to solve in the current cluster:
- strong consistency (for cluster topology)
Cluster topology covers which nodes own which slots and which nodes are primaries. The current cluster implementation is not even eventually consistent by design, because in some places node epochs are bumped without consensus (as a trade-off). This leads to increased complexity on the client side.
- better manageability (of global config/data)
This particular issue provides the exact context on this pain point.
- more resilience (to stressful client workload)
Today, both the cluster bus and the client workload run on the same main thread, so a demanding client workload can starve the cluster bus and lead to unnecessary failovers.
- higher scale
The V1 cluster is a mesh, so the cluster gossip traffic is proportional to N^2, where N is the number of (data) nodes in the cluster. The practical limit of a V1 cluster is ~500 nodes.
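To make the N^2 point in the last item concrete, here is a rough sketch that counts node-to-node links in a full mesh. It assumes every node keeps one outbound link to every other node. It counts links only, not actual gossip bytes, and the function name `mesh_links` is made up for illustration.

```python
def mesh_links(n: int) -> int:
    """Directed node-to-node links in a full mesh of n nodes.

    Assumption: each node holds one outbound link to each of the
    other n - 1 nodes, so the total is n * (n - 1), i.e. O(n^2).
    """
    if n < 0:
        raise ValueError("node count must be non-negative")
    return n * (n - 1)


if __name__ == "__main__":
    # Show how quickly the link count grows with cluster size.
    for n in (10, 100, 500, 1000):
        print(f"{n:>5} nodes -> {mesh_links(n):>9,} links")
```

Under this assumption a 500-node cluster already has 249,500 directed links, and doubling the node count roughly quadruples that number.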
Originally posted by @PingXie in #58 (comment)