Configure Celery for graceful shutdown during K8s deployments #5520

rtibbles · 2025-10-30T00:12:25Z

Summary

Add worker_soft_shutdown_timeout (28s) to Celery config in settings.py
Add REMAP_SIGTERM=SIGQUIT to K8s shared env vars to trigger soft shutdown
Add CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT=28 env var for configurability
Add same env vars to docker-compose for dev environment consistency
Set terminationGracePeriodSeconds: 30 explicitly in worker deployment
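Here is a minimal sketch of the settings.py side. It assumes the env var is optional and falls back to 28 seconds. `worker_soft_shutdown_timeout` is the real Celery 5.5 setting. The helper `soft_shutdown_timeout` and its negative-value check are illustrative assumptions, not the code in this PR.

```python
import os
from typing import Mapping

# Assumed default, matching the 28s used throughout this PR.
DEFAULT_SOFT_SHUTDOWN_TIMEOUT = 28.0


def soft_shutdown_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Hypothetical helper: read CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT (seconds).

    Falls back to the default when the variable is unset or blank.
    """
    raw = env.get("CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_SOFT_SHUTDOWN_TIMEOUT
    value = float(raw)
    if value < 0:
        raise ValueError("CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT must be >= 0")
    return value


# With a Celery app instance, the value would then be applied as e.g.:
#     app.conf.worker_soft_shutdown_timeout = soft_shutdown_timeout()
# REMAP_SIGTERM=SIGQUIT is handled by the Celery worker process itself,
# so this sketch does not touch it.

if __name__ == "__main__":
    print(soft_shutdown_timeout({}))  # 28.0
    print(soft_shutdown_timeout({"CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT": "8"}))  # 8.0
```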

When K8s sends SIGTERM during pod termination, workers will now:

Stop accepting new tasks
Continue processing the current task for up to 28 seconds
Exit cleanly if the task completes, or time out after 28s
Leave a 2s buffer before K8s's 30s grace period expires
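The timeline above can be checked with a small model. This is a simplified sketch. It assumes the task's remaining work is known up front and ignores process teardown time, which is what the 2s buffer is meant to absorb. `shutdown_outcome` and `Outcome` are hypothetical names.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    task_finished: bool          # did the in-flight task complete?
    worker_exit_at: float        # seconds after SIGTERM when the worker stops
    slack_before_sigkill: float  # time left in the K8s grace period


def shutdown_outcome(task_remaining: float,
                     soft_timeout: float = 28.0,
                     grace_period: float = 30.0) -> Outcome:
    """Model one in-flight task after SIGTERM (remapped to SIGQUIT)."""
    if not 0 <= soft_timeout < grace_period:
        raise ValueError("soft timeout must be shorter than the grace period")
    if task_remaining <= soft_timeout:
        # Task finishes inside the soft shutdown window: clean exit.
        return Outcome(True, task_remaining, grace_period - task_remaining)
    # Soft shutdown window expires first: the worker proceeds with a cold
    # shutdown, the task is interrupted, and what is left is the buffer.
    return Outcome(False, soft_timeout, grace_period - soft_timeout)


print(shutdown_outcome(5))
# Outcome(task_finished=True, worker_exit_at=5, slack_before_sigkill=25.0)
print(shutdown_outcome(600))
# Outcome(task_finished=False, worker_exit_at=28.0, slack_before_sigkill=2.0)
```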

❗ 🤖 Generated with Claude Code ❗

References

Fixes #5000

Reviewer guidance

Does this match the spec outlined by @bjester in the issue above? I think it does. Note that we have already upgraded to a compatible version of Celery: https://github.com/learningequality/studio/blob/hotfixes/requirements.txt#L31

Lastly, for manual testing: what happens when a long-running publish is interrupted by this? Does it get rerun properly? I think it should, because the "change event" should still be marked as unapplied rather than errored, but it would be good to check. It is a bit of a bummer that it will start from the beginning again, but that's better than not happening at all!
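The reasoning about reruns can be illustrated with a toy model. This is a hypothetical sketch, not Studio's actual models or task code. `Change`, `Interrupted`, `apply_change`, and `pending` are invented names. The model only holds if a change is marked applied after the publish succeeds, and marked errored only on a genuine failure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    """Hypothetical stand-in for a queued change event."""
    id: int
    applied: bool = False
    errored: bool = False


class Interrupted(Exception):
    """Simulates the worker being stopped part-way through a publish."""


def apply_change(change: Change, publish: Callable[[Change], None]) -> None:
    try:
        publish(change)
    except Interrupted:
        # An interruption is not a failure: leave both flags False so the
        # change stays unapplied and is picked up again later.
        raise
    except Exception:
        change.errored = True  # a genuine failure is not retried
        raise
    change.applied = True  # only set once the whole publish has finished


def pending(changes: List[Change]) -> List[Change]:
    """Changes that a later run would (re)process."""
    return [c for c in changes if not c.applied and not c.errored]


def interrupted_publish(change: Change) -> None:
    raise Interrupted()


change = Change(id=1)
try:
    apply_change(change, interrupted_publish)
except Interrupted:
    pass
print(pending([change]) == [change])  # True: it will be rerun from the start

apply_change(change, lambda c: None)  # the rerun succeeds
print(change.applied, pending([change]))  # True []
```

If the worker is killed outright by the cold shutdown, no exception handler runs at all. However, `applied` is never set either, so the outcome is the same. Whether the Celery task message itself gets redelivered is a separate question that this sketch does not model.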

Implements soft shutdown feature from Celery 5.5.3 to prevent task interruption during pod termination. Resolves #5000.

Co-Authored-By: Claude <noreply@anthropic.com>

bjester · 2025-10-30T14:29:40Z

docker-compose.yml

    CELERY_RESULT_BACKEND_ENDPOINT: redis
    CELERY_REDIS_PASSWORD: ""
+    REMAP_SIGTERM: "SIGQUIT"
+    CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT: "28"


Since we're already defaulting to 28, this adds no value.

bjester

The value of CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT does need to be coordinated with the k8s graceful shutdown period, as was done in the unused infra code here. However, without a matching change to Docker's stop timeout, setting it to the already-default value in docker-compose.yml feels unnecessary. Docker's default stop timeout is 10 seconds, so in dev the container would be killed well before a 28s soft shutdown could finish. I suggest we either remove it from the docker-compose file or coordinate it with Docker's timeout.
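Coordinating the two values comes down to subtracting a buffer from whatever grace period the runtime gives the container. Here is a minimal sketch, assuming the same 2s buffer used for K8s. `soft_timeout_for` is a hypothetical helper. In Compose, the grace period is controlled by the service's `stop_grace_period` key (default 10s).

```python
# Seconds `docker stop` (and Compose's default `stop_grace_period`) waits
# after SIGTERM before sending SIGKILL.
DOCKER_DEFAULT_STOP_TIMEOUT = 10.0


def soft_timeout_for(stop_timeout: float, buffer: float = 2.0) -> float:
    """Largest soft shutdown timeout that still leaves `buffer` seconds
    before the container runtime sends SIGKILL (hypothetical helper)."""
    if stop_timeout <= buffer:
        raise ValueError("stop timeout must be longer than the buffer")
    return stop_timeout - buffer


print(soft_timeout_for(DOCKER_DEFAULT_STOP_TIMEOUT))  # 8.0 with Docker's default
print(soft_timeout_for(30.0))  # 28.0, matching terminationGracePeriodSeconds: 30
```

Under this sketch, dev would need either a smaller soft shutdown timeout or a longer `stop_grace_period` on the worker service for 28s to be meaningful.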

Lastly, as mentioned on Slack, we don't really use the infra code that was changed here, so we need to work with infra to implement those changes in the right place. The changes here could stay, or be removed 🤷

claude and others added 2 commits October 30, 2025 00:07

[pre-commit.ci lite] apply automatic fixes

4b51ed8

bjester reviewed Oct 30, 2025

rtibbles assigned bjester Nov 4, 2025

bjester requested changes Nov 17, 2025

4 participants
