Open
Description
Try to find out why h's workers didn't reconnect automatically, and fix that if we can. The system is supposed to be self-recovering, so that all should be well even with a helpless engineer on call. This time it didn't self-recover: it took engineers to spot that the workers were not reconnecting and to re-deploy h. (Although, to be fair, we had multiple alarms telling us that the workers were still not working, so it's not like we had to spot it in the logs ourselves.)