Closed
Description
Allocating shards to a node can fail for various reasons. When an allocation fails, we currently ignore the node for that shard during the next allocation round. However, this means that:
- subsequent rounds consider the node for allocating the shard again.
- other shards are still allocated to the node (in particular the balancer tries to put shards on that node with the failed shard as its weight becomes smaller).
This is particularly bad if the node is permanently broken, leading to a never-ending series of failed allocations. Ultimately this affects the stability of the cluster.