Conversation

@jscheffl
Contributor

@jscheffl jscheffl commented Nov 30, 2025

Another small increment to remove global statements for PR #58116

This removes a global statement in the Celery provider, a classic case where functools.cache() can help. Centralizing the config retrieval here also improves the code in general.

global is evil.
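The pattern being removed can be sketched like this (hypothetical names and config values, not the actual provider code): a lazily initialized module global mutated via `global` becomes a `functools.cache`-decorated accessor.

```python
import functools


def _read_config(key: str) -> str:
    # Stand-in for an Airflow ``conf.get()`` lookup (hypothetical value).
    return {"broker_url": "redis://localhost:6379/0"}[key]


# Before (the anti-pattern): a module-level ``_broker_url = None``
# plus a getter that does ``global _broker_url`` on first use.
#
# After: compute once on first call, then serve from the cache --
# the same lazy-singleton effect, without mutable module state.
@functools.cache
def broker_url() -> str:
    return _read_config("broker_url")
```

A side benefit for tests: the decorated function exposes `broker_url.cache_clear()`, which resets the memoized value explicitly instead of reaching into module globals.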

@jscheffl jscheffl force-pushed the bugfix/remove-global-from-celery branch 2 times, most recently from 7d647ea to 9a1f29d Compare December 2, 2025 21:41
@jscheffl jscheffl force-pushed the bugfix/remove-global-from-celery branch from 9a1f29d to e326e6d Compare December 3, 2025 05:49
@jscheffl jscheffl merged commit 28e6122 into apache:main Dec 3, 2025
79 checks passed
bbovenzi pushed a commit to astronomer/airflow that referenced this pull request Dec 3, 2025
* Remove global from celery provider

* Remove global from celery provider, add caching
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
* Remove global from celery provider

* Remove global from celery provider, add caching
@michaelosthege
Contributor

michaelosthege commented Jan 29, 2026

We're pretty sure that we traced worker processes getting SIGKILLed with the following error to the changes from this PR between 3.13.1 and 3.14.1:

```
[error] Timed out waiting for UP message from <ForkProcess(ForkPoolWorker-2, started daemon)> [celery.concurrency.asynpool] loc=asynpool.py:631
[2026-01-29 15:56:58,818: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:18 exited with 'signal 9 (SIGKILL)'
```

We upgraded Airflow 3.1.1 → 3.1.6 and while doing so went celery 5.5.3 → 5.6.2 and apache-airflow-providers-celery 3.12.4 → 3.15.0.

By downgrading apache-airflow-providers-celery and stepping through previous versions, we found that the change ~~from 3.14.0 to 3.14.1 (this PR)~~ 3.13.1 → 3.14.0 broke it.

UPDATE: Looks like we got confused during debugging and it was actually the change 3.13.1 → 3.14.0 from #58591.
Maybe the subprocesses within the worker are given the same name, and that's why the check introduced in #58591 kills them before they send their UP message?
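A minimal stdlib sketch of that hypothesis (explicitly not Celery's code): children created with the `fork` start method all inherit the parent's environment, so any identity derived from the hostname alone is identical across sibling workers.

```python
import multiprocessing as mp
import socket


def report_hostname(q: "mp.Queue") -> None:
    # A forked child sees exactly the parent's hostname, so a
    # duplicate check keyed on hostname alone cannot tell
    # sibling pool workers apart.
    q.put(socket.gethostname())


if __name__ == "__main__":
    ctx = mp.get_context("fork")  # POSIX-only start method
    q = ctx.Queue()
    procs = [ctx.Process(target=report_hostname, args=(q,)) for _ in range(3)]
    for p in procs:
        p.start()
    names = [q.get() for _ in procs]
    for p in procs:
        p.join()
    # All three children report one and the same hostname.
    print(len(set(names)))
```

Whether the real check in #58591 keys on the hostname this way is exactly the open question here; the sketch only shows why such a key would collide.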

@jscheffl
Contributor Author

> UPDATE: Looks like we got confused during debugging and it was actually the change 3.13.1 → 3.14.0 from #58591. Maybe the subprocesses within the worker are given the same name, and that's why the check introduced in #58591 kills them before they send their UP message?

@michaelosthege a hypothesis ... if it is not a duplicate hostname, could it be that the duplicate-hostname check sometimes takes longer and runs into the timeout as well? Not sure about the specific error message, though...

@michaelosthege
Contributor

> @michaelosthege hypothesis ... if not duplicate hostname could it be that the duplicate hostname check sometimes needs longer running into timeout as well? Not sure about a specific error message...

No idea how long that code takes to execute, but we had tried increasing celery.concurrency.asynpool.PROC_ALIVE_TIMEOUT to 60 seconds and all it did was delay the killing. If the hostname check raised SystemExit, the child process must already have been dead at that point.
We never saw a log message by those worker child processes. The check from #58591 also appears to run before logging is configured.

How does the UP message travel to the parent? Via the message broker?

We also observed a "clocks out of sync" warning in the logs, but with the worker and Redis containers running on the same host that doesn't make any sense.
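For intuition, here is a stdlib-only sketch (explicitly not Celery's implementation) of the handshake those log lines describe: the parent waits a bounded time for an UP token on a pipe to the child and SIGKILLs it on timeout. As far as we can tell, Celery's UP message is likewise local parent–child IPC rather than broker traffic, with the deadline set by `celery.concurrency.asynpool.PROC_ALIVE_TIMEOUT` (default around 4 seconds).

```python
import multiprocessing as mp
import os
import signal
import time

# Short deadline for the sketch; analogous to PROC_ALIVE_TIMEOUT.
UP_TIMEOUT = 1.0  # seconds

CTX = mp.get_context("fork")  # POSIX-only; SIGKILL also assumes POSIX


def child(conn, delay: float) -> None:
    # Simulates worker boot time before announcing readiness.
    time.sleep(delay)
    conn.send("UP")
    conn.close()


def spawn_and_wait(delay: float) -> str:
    parent_conn, child_conn = CTX.Pipe()
    p = CTX.Process(target=child, args=(child_conn, delay))
    p.start()
    if parent_conn.poll(UP_TIMEOUT):
        msg = parent_conn.recv()  # child booted in time
        p.join()
        return msg
    # "Timed out waiting for UP message" -> forcibly kill the child.
    os.kill(p.pid, signal.SIGKILL)
    p.join()
    return "SIGKILL"
```

With `delay` below the timeout this returns `"UP"`; above it, the parent kills the child, matching the `signal 9 (SIGKILL)` line in the report. Raising the timeout (as tried with 60 seconds) only moves the deadline, which is consistent with what was observed.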
