Closed
Description
System tablets (specifically, coordinators and mediators) are kept in a single datacenter when possible. Suppose we deploy a new, faulty version and roll it out to one datacenter first, causing all nodes there to get stuck in a crash loop. If that datacenter hosts the system tablets, they remain there and stop working, so the entire database is effectively down even though only one location has problems. This issue tracks improving the behavior in this case.
- Run simulations to determine whether the "stick system tablets together" functionality causes the problem, or whether the behavior would be the same with it turned off.
- Make a fix based on the simulation results.
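
As a starting point for the first task, here is a minimal toy model of the comparison, not the actual simulation framework. It assumes tablets never move after placement, and every name in it (`place_tablets`, `stuck_tablets`, `simulate`, the tablet and datacenter labels) is hypothetical. It counts how many system tablets end up in the crash-looping datacenter with the "stick together" policy on and off.

```python
def place_tablets(tablets, datacenters, sticky):
    """Return {tablet: datacenter} under a hypothetical placement policy.

    sticky=True  -> all system tablets share the first datacenter.
    sticky=False -> tablets are spread round-robin across datacenters.
    """
    if sticky:
        home = datacenters[0]
        return {tablet: home for tablet in tablets}
    return {
        tablet: datacenters[i % len(datacenters)]
        for i, tablet in enumerate(tablets)
    }


def stuck_tablets(placement, failed_dc):
    """Tablets located in the crash-looping datacenter (assumed not to move)."""
    return sorted(t for t, dc in placement.items() if dc == failed_dc)


def simulate(tablets, datacenters):
    """Count stuck tablets for each (policy, failed datacenter) combination."""
    results = {}
    for sticky in (True, False):
        placement = place_tablets(tablets, datacenters, sticky)
        for failed_dc in datacenters:
            results[(sticky, failed_dc)] = len(stuck_tablets(placement, failed_dc))
    return results


# Illustrative inputs only; real deployments will differ.
TABLETS = ["coordinator-1", "coordinator-2", "mediator-1", "mediator-2"]
DATACENTERS = ["dc-a", "dc-b", "dc-c"]

if __name__ == "__main__":
    for (sticky, failed_dc), stuck in simulate(TABLETS, DATACENTERS).items():
        policy = "sticky" if sticky else "spread"
        print(f"policy={policy:6} failed={failed_dc} stuck={stuck}/{len(TABLETS)}")
```

In this model the sticky policy loses every system tablet when the home datacenter fails and none otherwise, while the spread policy loses some tablets on any single-datacenter failure. Whether a partial loss still takes the database down depends on how coordinators and mediators tolerate missing peers, which this sketch does not model.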