
Constraints to de-prioritize nodes from becoming shard allocation targets #43350

@vigyasharma

Description


The Problem

In clusters where each node holds a reasonable number of shards, replacing a node (or adding a small number of nodes) causes all shards of newly created indexes to be allocated on the new, empty node(s). This puts a lot of stress on the single (or few) new node(s), degrading overall cluster performance.

This happens because the index-balance factor in the weight function cannot offset the shard-balance factor when the other nodes are already filled with shards. As a result, the new node always has the minimum weight and gets picked as the allocation target, until it holds approximately the mean number of shards per node in the cluster.

Further, index-balance only kicks in after the node is relatively full (compared to other nodes), at which point it moves out the recently allocated shards. Thus for the same shards, which are newly created and usually receiving indexing traffic, we end up doing the allocation work twice.

Current Workaround

A workaround is to set the total-shards-per-node index/cluster setting, which limits the number of shards of an index on a single node.

This, however, is a hard limit enforced by allocation deciders. If it is breached on all nodes, the shards remain unassigned, causing yellow/red clusters. Configuring this setting requires careful calculation around the number of nodes, and it must be revised whenever the cluster is scaled down.

Proposal

We propose an allocation constraint mechanism that de-prioritizes nodes from being picked as allocation targets when they breach certain constraints. Whenever an allocation constraint is breached for a shard (or index) on a node, we add a high positive constant to the node's weight. The increased weight makes the node less preferable for allocating that shard. Unlike deciders, however, this is not a hard filter: if no other nodes are eligible to accept shards (say, due to deciders like disk watermarks), the shard can still be allocated on a node with breached constraints.

[Figure: Constraint Based Weights - Step Function Diagram]

Index Shards Per Node Constraint

This constraint controls the number of shards of an index on a single node. indexShardsPerNodeConstraint is breached if the number of shards of an index allocated on a node exceeds the average number of shards per node for that index.

```java
// Shards of this index the node would hold if the shard under consideration lands here:
long expIndexShardsOnNode = node.numShards(index) + numAdditionalShards;
// Allowed limit: average shards per node for the index, rounded up:
long allowedShardsPerNode = Math.round(Math.ceil(balancer.avgShardsPerNode(index)));
boolean shardPerNodeLimitBreached = (expIndexShardsOnNode - allowedShardsPerNode) > 0;
```

When indexShardsPerNodeConstraint is breached, shards of the newly created index get assigned to other nodes, preventing indexing hot spots. After unassigned shard allocation, rebalancing fills up the node with shards of other indexes, without having to move out the already allocated shards. Since the constraint does not prevent allocation, breaching it never leaves shards unassigned.
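The check above can be illustrated as a minimal, self-contained sketch. The method name `isBreached` and its parameters are stand-ins for the balancer and node state in the real allocator, not Elasticsearch's actual API:

```java
public class IndexShardsPerNodeConstraint {

    /**
     * Returns true if placing one more shard of the index on this node
     * would exceed the (rounded-up) average shards per node for that index.
     */
    static boolean isBreached(int shardsOfIndexOnNode, int totalShardsOfIndex, int numNodes) {
        long expIndexShardsOnNode = shardsOfIndexOnNode + 1;  // the shard being allocated
        double avgShardsPerNode = (double) totalShardsOfIndex / numNodes;
        long allowedShardsPerNode = (long) Math.ceil(avgShardsPerNode);
        return expIndexShardsOnNode > allowedShardsPerNode;
    }

    public static void main(String[] args) {
        // 5-shard index on a 5-node cluster: average is 1 shard per node.
        System.out.println(isBreached(0, 5, 5)); // first shard on an empty node: false
        System.out.println(isBreached(1, 5, 5)); // a second shard on the same node: true
    }
}
```

So a new empty node can take its fair share of the index, but the constraint pushes further shards of the same index toward other nodes.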

The allocator flow now becomes:

deciders filter out ineligible nodes -> constraints de-prioritize certain nodes by increasing their weight -> the lowest-weight node is selected as the allocation target.
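The flow can be sketched as follows. The `Node` model, field names, and `HIGH_CONSTANT` value are illustrative assumptions, not the real BalancedShardsAllocator internals:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class AllocatorFlowSketch {
    static final double HIGH_CONSTANT = 1_000_000.0;

    // Illustrative node model: a base weight plus decider/constraint outcomes.
    record Node(String name, double baseWeight, boolean deciderAllows, boolean constraintBreached) {}

    static Optional<Node> pickTarget(List<Node> nodes) {
        return nodes.stream()
                .filter(Node::deciderAllows)                 // 1. deciders: hard filter
                .min(Comparator.comparingDouble(n ->         // 3. lowest weight wins
                        n.baseWeight()
                        + (n.constraintBreached() ? HIGH_CONSTANT : 0))); // 2. constraints: soft penalty
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("new-empty-node", 0.0, true, true),   // lowest base weight, but breached
                new Node("node-a",         2.0, true, false),
                new Node("node-b",         9.0, false, false)); // filtered out by a decider
        System.out.println(pickTarget(nodes).orElseThrow().name()); // node-a
    }
}
```

Note that if the only decider-eligible node has a breached constraint, it is still returned; the constraint only reorders candidates, it never empties the candidate set.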

Extension

This framework can be extended with other rules by modeling them as boolean constraints. One possible example is primary shard density on a node, as required by issue #41543. A constraint requiring #primaries on a node <= avg-primaries per node could prevent scenarios where all primaries land on a few nodes and all replicas on others.
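A hypothetical sketch of such a primary-density constraint, in the same shape as the shards-per-node check (the method and its parameters are illustrative, not part of the proposal's code):

```java
public class PrimaryDensityConstraint {

    /** Breached if placing one more primary would exceed the average primaries per node. */
    static boolean isBreached(int primariesOnNode, int totalPrimaries, int numNodes) {
        double avgPrimariesPerNode = (double) totalPrimaries / numNodes;
        return primariesOnNode + 1 > Math.ceil(avgPrimariesPerNode);
    }

    public static void main(String[] args) {
        // 6 primaries across 3 nodes: average 2 primaries per node.
        System.out.println(isBreached(0, 6, 3)); // false
        System.out.println(isBreached(2, 6, 3)); // true
    }
}
```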

Comparing Multiple Constraints

For every constraint breached, we add a high positive constant to the weight function. This is a step-function approach: all nodes on a lower step (lower weight) are considered more eligible than those on a higher step. The constant is chosen high enough that a node's base weight can never push it onto the next step; only breaching another constraint can.

Since the constant is added once per breached constraint, i.e. c * HIGH_CONSTANT for c breached constraints, nodes with one breached constraint are preferred over nodes with two, and so on. All nodes with the same number of breached constraints fall back to the first part of the weight function, which is based on shard count.
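The step function above can be sketched in a few lines; the value of `HIGH_CONSTANT` here is an assumption, standing in for any constant larger than the achievable base weight:

```java
public class StepFunctionWeight {
    static final double HIGH_CONSTANT = 1_000_000.0; // larger than any realistic base weight

    static double weight(double baseWeight, int constraintsBreached) {
        return baseWeight + constraintsBreached * HIGH_CONSTANT;
    }

    public static void main(String[] args) {
        // A node with no breaches beats a node with one breach, regardless of base weight...
        System.out.println(weight(50.0, 0) < weight(0.0, 1)); // true
        // ...and among equally-breached nodes, the base weight decides.
        System.out.println(weight(1.0, 1) < weight(2.0, 1));  // true
    }
}
```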

We could potentially assign different weights to different constraints, with some sort of ranking among them, but this quickly becomes a maintenance and extension nightmare: adding a new constraint would require revisiting every other constraint's weight to decide its place in the priority order.

Next Steps

If this idea makes sense and seems like a useful addition to the Elasticsearch community, we can raise a PR for these changes.

This change targets the problems listed in #17213, #37638, #41543, #12279, and #29437.
