ensure system tablets are alive when an entire dc is in a crash-loop

System tablets (specifically, coordinators and mediators) are kept in one datacenter when possible. Suppose we deploy a new, faulty version and roll it out to one datacenter first, causing all nodes there to get stuck in a crash-loop. If that is the DC hosting the system tablets, they remain there and stop working, so the entire database is effectively down even though only one location has problems. This issue tracks improving the behavior in this case.

- [x] Run simulations to determine whether the "stick system tablets together" functionality causes the problem, or whether the outcome is the same with it turned off.
- [ ] Make a fix based on the simulations
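
To make the scenario concrete, here is a toy model of the kind of comparison the simulations are meant to answer. It is only a sketch and does not reflect the real scheduler: the names `place`, `fail_over`, `database_available`, and `simulate`, the three-DC layout, the tablet names, and the rule that the database needs every system tablet in a healthy DC are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    name: str
    dc: str


def place(names, dcs, stick_together):
    """Hypothetical placement: all in the first DC, or round-robin across DCs."""
    if stick_together:
        return [Tablet(n, dcs[0]) for n in names]
    return [Tablet(n, dcs[i % len(dcs)]) for i, n in enumerate(names)]


def fail_over(tablets, healthy_dcs):
    """Assumed recovery step: move tablets from unhealthy DCs to a healthy one."""
    target = sorted(healthy_dcs)[0]
    return [t if t.dc in healthy_dcs else Tablet(t.name, target) for t in tablets]


def database_available(tablets, healthy_dcs):
    """Toy rule: the database works only if every system tablet is in a healthy DC."""
    return all(t.dc in healthy_dcs for t in tablets)


def simulate(stick_together, allow_failover):
    dcs = ["dc1", "dc2", "dc3"]
    names = ["coordinator-1", "coordinator-2", "mediator-1", "mediator-2"]
    healthy = {"dc2", "dc3"}  # assumption: dc1 is crash-looping after the bad rollout
    tablets = place(names, dcs, stick_together)
    if allow_failover:
        tablets = fail_over(tablets, healthy)
    return database_available(tablets, healthy)


if __name__ == "__main__":
    for stick in (True, False):
        for failover in (True, False):
            print(f"stick_together={stick} allow_failover={failover} "
                  f"available={simulate(stick, failover)}")
```

In this toy model, what decides availability is whether tablets can leave the crash-looping DC, not only where they were first placed. Whether the same holds for the real system is exactly what the simulations in the first task need to establish.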

ensure system tablets are alive when an entire dc is in a crash-loop #14323
