Clean up lifecycle ordering on shutdown #2177

ssalinas · 2021-02-10T15:05:33Z

Realized there were still a few cases here where we could briefly have multiple leaders, or no leader at all. In particular, the following sequence of events:

Old leader gets a TERM signal
Old leader fails to properly wait for all scheduled pollers to exit
Old leader relinquishes leadership
New leader gains leadership and starts loading leader cache
Old leader's pollers finish their run, possibly flushing more state to zk
State flushed by the old leader is now missing from the leader cache on the new leader

Need to test this a bit, but this adds:

A specific pre-Jetty-shutdown hook so that the leader cleanly moves from leading instance to read-only/standby before fully shutting down (so that we don't get blips in serving writes when the leader changes). All pollers should stop in this step
Polling for the leadership change to make sure it has propagated
A better/longer wait on running pollers
Logging when pollers aren't shut down cleanly
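
For readers following along, the ordering described above can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the class name `ShutdownOrderingSketch`, the methods `stopPollers` and `awaitLeadershipReleased`, and the `AtomicBoolean` standing in for the real leader latch are all hypothetical. It only assumes the standard `java.util.concurrent` API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the shutdown ordering; names are illustrative only.
final class ShutdownOrderingSketch {

  // Step 1: stop scheduling new poller runs, then wait for in-flight runs.
  // Returns false (and logs) instead of silently returning when the wait times out.
  static boolean stopPollers(ExecutorService pollers, long timeoutMillis)
      throws InterruptedException {
    pollers.shutdown();
    if (!pollers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
      System.err.println("Pollers did not exit within " + timeoutMillis + " ms");
      return false;
    }
    return true;
  }

  // Step 2: after relinquishing leadership, poll until this instance no longer
  // sees itself as leader, or until the timeout elapses.
  static boolean awaitLeadershipReleased(
      BooleanSupplier stillLeader, long timeoutMillis, long pollIntervalMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (stillLeader.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollIntervalMillis);
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pollers = Executors.newSingleThreadExecutor();
    // Simulated poller run that is still in flight when shutdown starts.
    pollers.submit(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Stand-in for the real leader latch (assumption for this sketch).
    AtomicBoolean isLeader = new AtomicBoolean(true);

    // Pollers stop first, so nothing can flush state after leadership moves.
    boolean pollersClean = stopPollers(pollers, 2_000);
    isLeader.set(false); // stands in for relinquishing leadership
    boolean released = awaitLeadershipReleased(isLeader::get, 1_000, 10);

    System.out.println("pollers stopped cleanly: " + pollersClean);
    System.out.println("leadership released: " + released);
  }
}
```

The point of the ordering is that leadership is only given up once the pollers have stopped (or the failure to stop has been logged), so a new leader never loads its cache while the old leader can still write.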

rosalind210 · 2021-03-04T17:01:34Z

...larityService/src/main/java/com/hubspot/singularity/SingularityManagedThreadPoolFactory.java


          if (!service.awaitTermination(timeoutLeftInMillis, TimeUnit.MILLISECONDS)) {
-            return;
+            LOG.warn("Eeecutor service tasks did not exit in time");


nit: typo on executor

rosalind210 · 2021-03-04T17:30:49Z

🚢

ssalinas added 2 commits February 10, 2021 09:54

Clean up lifecycle ordering on shutdown

436d50f

better leader check

118e55e

ssalinas added the staging label (Merged to staging branch) Feb 10, 2021

rosalind210 reviewed Mar 4, 2021

typo

8646339

ssalinas merged commit 98512dd into master Mar 4, 2021

ssalinas deleted the leader_cache_race_conditions branch March 4, 2021 18:44

ssalinas added this to the 1.5.0 milestone May 4, 2022

3 participants