@joaopamaral
Contributor

#32707 changed the scheduler to catch exceptions without re-raising them, so the scheduler now always exits with code 0 (that is, exit without error). When an airflow-scheduler runs in a Docker container with the --restart always option set, exit code 0 makes the container stop and not restart, because Docker interprets it as the process having run and finished successfully. So if the DB is briefly unavailable, the airflow-scheduler does not try to recover by restarting; it simply stops, leaving the Airflow instance without a scheduler.

So, this PR re-raises the exception and forces the processes managed by the contexts to terminate (the contexts in the stack are closed in a finally block, to keep avoiding the zombie scheduler from #32706), so that the scheduler no longer exits with code 0 on failure.
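The difference between swallowing and re-raising can be sketched as follows. This is a minimal illustration, not the Airflow code: the `run` function, the `reraise` flag, and the printed messages are hypothetical stand-ins. It runs a small child script twice and compares the exit codes, with an `ExitStack` closed in `finally` in both cases.

```python
import subprocess
import sys

# Hypothetical stand-in for a scheduler entry point (not Airflow's actual code).
CHILD = """
import contextlib
import sys

def run(reraise):
    stack = contextlib.ExitStack()
    # Stand-in for terminating helper subprocesses when the stack closes.
    stack.callback(print, "cleanup ran")
    try:
        raise RuntimeError("database unavailable")
    except Exception:
        print("scheduler failed")
        if reraise:
            raise  # propagate, so the interpreter exits with a non-zero code
    finally:
        stack.close()  # cleanup runs whether or not the exception propagates

run(reraise=sys.argv[1] == "reraise")
"""

def exit_code(mode):
    """Run the child script in a fresh interpreter; return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout

swallow_code, swallow_out = exit_code("swallow")
reraise_code, reraise_out = exit_code("reraise")
print("swallow:", swallow_code)
print("reraise:", reraise_code)
```

In this sketch the cleanup callback runs in both modes, but only the re-raising variant makes the process exit with a non-zero code that a supervisor can treat as a failure.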



@potiuk potiuk merged commit 1d5d502 into apache:main Jan 15, 2024
@potiuk potiuk added this to the Airflow 2.8.1 milestone Jan 15, 2024
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Jan 16, 2024
ephraimbuddy pushed a commit that referenced this pull request Jan 16, 2024
* Fix airflow-scheduler exiting with code 0 on exceptions

* Fix static check

(cherry picked from commit 1d5d502)
dstandish added a commit to astronomer/airflow that referenced this pull request Mar 21, 2024
In apache#36800 the author fixed a zombie scheduler issue arising from the context manager exit not being called, and thus the subprocess not getting terminated. It was fixed by explicitly calling the `close` function on an ExitStack-managed context manager. A simpler / better / cleaner / more standard solution is to "fix" the underlying context managers by wrapping the yield in a try / finally.
dstandish added a commit that referenced this pull request Mar 21, 2024
In #36800 the author fixed a zombie scheduler issue arising from the context manager exit not being called, and thus the subprocess not getting terminated. It was fixed by explicitly calling the `close` function on an ExitStack-managed context manager. A simpler / better / cleaner / more standard solution is to "fix" the underlying context managers by wrapping the yield in a try / finally.
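The try / finally approach described in this commit message can be illustrated with a small sketch. The context manager names and the `events` list are hypothetical and only stand in for the real subprocess-managing context managers. With `contextlib.contextmanager`, an exception raised inside the `with` body is re-raised at the `yield`, so any cleanup placed after a bare `yield` is skipped.

```python
from contextlib import contextmanager

events = []

@contextmanager
def leaky_subprocess():
    # Hypothetical context manager without try / finally.
    events.append("leaky: started")
    yield
    # Not reached if the with-body raises: the exception surfaces at the yield.
    events.append("leaky: terminated")

@contextmanager
def safe_subprocess():
    # Same idea, but the cleanup is guarded by try / finally.
    events.append("safe: started")
    try:
        yield
    finally:
        events.append("safe: terminated")

def run(cm):
    """Enter the context manager and fail inside the body."""
    try:
        with cm():
            raise RuntimeError("scheduler loop failed")
    except RuntimeError:
        pass  # swallowed here only so that both variants can be compared

run(leaky_subprocess)
run(safe_subprocess)
print(events)
```

Only the `safe_subprocess` variant records its termination step, which is the behavior that prevents a leftover subprocess in this simplified model.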
Labels

area:CLI type:bug-fix Changelog: Bug Fixes
