"too many failed ingesters" using memberlist

**Describe the bug**
The cluster goes down even though enough `ACTIVE` nodes remain to serve reads and writes.

**To Reproduce**
Using Loki 2.1.0 with memberlist:

1. Start with 2 monolithic Loki 2.1.0 instances running with `replication_factor: 2`.
2. Add 2 nodes to the cluster. All of them show as `ACTIVE` on the `/ring` page.
3. Remove the first 2 nodes. They first show as `LEAVING`, then become `Unhealthy`, and never leave this state (I could not find a relevant config option).
4. At this point the cluster is down. Reads and writes fail with errors like:
   `level=warn ts=2021-02-19T14:42:46.766880514Z caller=logging.go:71 traceID=44198a5667db211f msg="POST /loki/api/v1/push (500) 147.959µs Response: \"at least 3 live replicas required, could only find 2\\n\"`

Forgetting a single `Unhealthy` node using the buttons on the `/ring` page is enough for the cluster to recover.
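A simplified model of the ring's write quorum seems to explain the numbers in the error. The sketch below assumes the replication logic of the Cortex ring library that Loki 2.1.0 uses: for writes, each non-`ACTIVE` instance picked while walking the ring extends the replica set by one, quorum is `n/2 + 1` over the (possibly extended) set, and instances that stopped heartbeating are filtered out but still count toward `n`. The `instance` type, `replicaSet`, `checkQuorum`, the node names, and the ring walk order are hypothetical, written only to illustrate this report; they are not Loki's API.

```go
package main

import "fmt"

// instance is a hypothetical, simplified view of one ring member.
type instance struct {
	name    string
	active  bool // ring state is ACTIVE (false for LEAVING)
	healthy bool // heartbeat seen within the heartbeat timeout
}

// replicaSet walks distinct instances in ring order. Assumed write-path rule:
// every non-ACTIVE instance included extends the target set size by one.
func replicaSet(walk []instance, rf int) []instance {
	var set []instance
	n := rf
	for _, in := range walk {
		if len(set) >= n {
			break
		}
		set = append(set, in)
		if !in.active {
			n++
		}
	}
	return set
}

// checkQuorum applies the assumed quorum rule: minSuccess = n/2 + 1, where n is
// the larger of rf and the set size. Unhealthy instances are dropped from the
// live count but were already counted in n.
func checkQuorum(set []instance, rf int) error {
	n := rf
	if len(set) > n {
		n = len(set)
	}
	minSuccess := n/2 + 1
	live := 0
	for _, in := range set {
		if in.healthy {
			live++
		}
	}
	if live < minSuccess {
		return fmt.Errorf("at least %d live replicas required, could only find %d", minSuccess, live)
	}
	return nil
}

func main() {
	const rf = 2
	// Hypothetical ring walk for one key after the first 2 nodes were removed:
	// both old nodes are stuck LEAVING and no longer heartbeat.
	walk := []instance{
		{"new-1", true, true},
		{"old-1", false, false},
		{"old-2", false, false},
		{"new-2", true, true},
	}
	set := replicaSet(walk, rf)
	fmt.Println(len(set), checkQuorum(set, rf))
	// prints: 4 at least 3 live replicas required, could only find 2

	// Forgetting old-2 removes it from the ring walk.
	recovered := replicaSet([]instance{walk[0], walk[1], walk[3]}, rf)
	fmt.Println(len(recovered), checkQuorum(recovered, rf))
	// prints: 3 <nil>
}
```

In this model, the 2 stuck `LEAVING` nodes grow the replica set to 4, so quorum becomes 3 while only the 2 new nodes are live, which matches the reported message. After one of them is forgotten, the set has 3 instances and quorum drops to 2, which would explain why forgetting a single node is enough. Whether a given key hits the failing case depends on token placement, which is assumed here rather than observed.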

**Expected behavior**
With `replication_factor: 2`, 2 `ACTIVE` nodes should be sufficient for the cluster to be healthy, so the cluster should not go down while that condition holds.
`Unhealthy` nodes should be removed from the ring after some configurable timeout.

**Environment:**
 - Infrastructure: ECS
 - Deployment tool: Terraform

**Screenshots, Promtail config, or terminal output**
![image](https://user-images.githubusercontent.com/35925003/108521177-c0974a00-72cb-11eb-929b-5af0078ec7fa.png)
![image](https://user-images.githubusercontent.com/35925003/108521567-284d9500-72cc-11eb-80c5-c84213d5d24b.png)