Description
Applications that need to perform graceful shutdown of tasks submitted to `ThreadPoolTaskExecutor` and/or `ThreadPoolTaskScheduler` may make use of the `awaitTerminationMillis` setting (and possibly `waitForTasksToCompleteOnShutdown`).
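For context, a minimal sketch of the kind of configuration being described, assuming plain `ThreadPoolTaskExecutor`/`ThreadPoolTaskScheduler` beans (the pool sizes and the 30-second timeout are illustrative, not taken from any particular application):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

@Configuration
public class ExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);
        // On shutdown, keep executing already-submitted tasks...
        executor.setWaitForTasksToCompleteOnShutdown(true);
        // ...and block destroy() for up to 30s waiting for them to finish.
        executor.setAwaitTerminationMillis(30_000);
        return executor;
    }

    @Bean
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(4);
        scheduler.setWaitForTasksToCompleteOnShutdown(true);
        scheduler.setAwaitTerminationMillis(30_000);
        return scheduler;
    }
}
```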
However, the application may take a very long time to actually finish if these conditions apply:
- the application uses both `ThreadPoolTaskExecutor` and `ThreadPoolTaskScheduler` (and possibly multiple `ThreadPoolTaskExecutor`s).
- submitted tasks are not quick.
- there are `SmartLifecycle` beans which implement lengthy stopping.
Real examples are web applications that use such components and make use of Spring Boot's web container "graceful shutdown" feature. The overall termination sequence of such an application is:
- SIGTERM is sent.
- `SmartLifecycle` asynchronous stopping triggers the web container graceful shutdown (a sketch of such a bean follows this list).
- `SmartLifecycle` asynchronous stopping blocks and waits for the web container shutdown.
- Context closing proceeds, invoking `DisposableBean`/`@PreDestroy` methods, so, say:
  - `ThreadPoolTaskExecutor`'s `destroy()` is called, blocking and waiting for the tasks to finish. If the application uses multiple `ThreadPoolTaskExecutor`s, this wait occurs for each one of them.
  - `ThreadPoolTaskScheduler`'s `destroy()` is called, blocking and waiting for the tasks to finish.
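For illustration, a minimal sketch of a `SmartLifecycle` bean with a lengthy asynchronous stop, similar in shape to Spring Boot's web container graceful-shutdown lifecycle (the class and `drainInFlightWork()` are hypothetical, used only to show where the blocking happens):

```java
import org.springframework.context.SmartLifecycle;

// Hypothetical example: a SmartLifecycle whose asynchronous stop holds up the
// context's stop phase until some external resource has drained, much like the
// web container graceful shutdown. Only after all such beans have stopped does
// context closing move on to the destroy() callbacks of the executors.
public class DrainingLifecycle implements SmartLifecycle {

    private volatile boolean running;

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop(Runnable callback) {
        // Kick off a lengthy, asynchronous drain; the context waits for
        // callback.run() (in Spring Boot, up to
        // spring.lifecycle.timeout-per-shutdown-phase).
        new Thread(() -> {
            drainInFlightWork();   // hypothetical lengthy operation
            this.running = false;
            callback.run();
        }).start();
    }

    @Override
    public void stop() {
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }

    private void drainInFlightWork() {
        // e.g. wait for the web container or message listeners to finish
    }
}
```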
The proposal here is to provide some way to let all the pools finish their tasks in parallel, ideally also in parallel with the stopping of other `SmartLifecycle` beans.
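One possible shape for this, sketched as a user-level workaround rather than as the proposed framework change (the class name and the 30-second timeout are hypothetical), is a `SmartLifecycle` that initiates shutdown of every pool during the stop phase so that they all drain concurrently:

```java
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import org.springframework.context.SmartLifecycle;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Hypothetical workaround sketch: initiate shutdown of every
// ThreadPoolTaskExecutor during the stop phase, so the pools drain in parallel
// (and in parallel with other SmartLifecycle beans in later phases), instead of
// each destroy() awaiting one pool after another.
public class ParallelExecutorShutdownLifecycle implements SmartLifecycle {

    private final Collection<ThreadPoolTaskExecutor> executors;
    private volatile boolean running;

    public ParallelExecutorShutdownLifecycle(Collection<ThreadPoolTaskExecutor> executors) {
        this.executors = executors;
    }

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop() {
        // Stop accepting new tasks on every pool first; they all drain in parallel.
        executors.forEach(e -> e.getThreadPoolExecutor().shutdown());
        // Then wait for each one; because they drain concurrently, the total
        // wait is roughly the slowest pool, not the sum of all pools.
        for (ThreadPoolTaskExecutor executor : executors) {
            try {
                executor.getThreadPoolExecutor().awaitTermination(30, TimeUnit.SECONDS);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }
}
```

With the pools already drained (or draining) by the time their `destroy()` methods run, each executor's `awaitTerminationMillis` wait returns quickly, so the overall shutdown time approaches that of the slowest pool rather than the sum of all of them.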
In Kubernetes, Spring web applications that fit the scenario above end up taking a very long time for a Pod to finish (aggravated by the need to configure a preStop hook to give kube-proxy time to notice the Pod deletion). This has real effects: since, rightly, Kubernetes does not wait during a rollout for old Pods to actually finish before creating new ones, applications with a large number of Pods and a large termination time end up with a large number of Pods actually running during the rollout (new Pods plus many still terminating). We have seen this trigger a cluster autoscale so that the cluster could handle the large number of Pods present during the rollout.