Skip to content

Conversation

@xuganyu96
Copy link
Contributor

This PR attempts to address issue #32706

When run_job raises unhandled exception, the scheduler process will become zombie. Placing run_job inside a try-catch block will allow serve_logs and health_check to properly terminate the sub-processes.

@xuganyu96 xuganyu96 force-pushed the scheduler-fails-to-exit-upon-db-disconnect branch 3 times, most recently from 0944f08 to c5f860f Compare July 20, 2023 17:45
@uranusjr
Copy link
Member

This needs a test case.

@xuganyu96
Copy link
Contributor Author

@uranusjr

Thank you for the feedback. I have added a test case that mocked airflow.jobs.job.run_job to simulate the situation where run_job throws some Exception, then test that the exception is caught and sub_proc.terminate is still called.

@xuganyu96 xuganyu96 force-pushed the scheduler-fails-to-exit-upon-db-disconnect branch 2 times, most recently from b3d9fd9 to b5554b5 Compare July 21, 2023 06:41
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log = logging.getLogger(__name__)
logger = logging.getLogger(__name__)

Maybe this? Usually I see it's called logger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please point to where the logger is named logger? The modules I've touched so far all named logger as log example

Copy link
Member

@pankajkoti pankajkoti Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. The example you shared is using self.log and not log. I believe this comes from extending the LoggingMixin class which has the log property.

What I meant is in general when using logging.getLogger(__name__), python modules call it as logger across the projects I have seen in so far. One example in our project I could find is here

logger = logging.getLogger(__name__)

But I think I have another suggestion here. Since this is a CLI command I guess we could maybe use the AirflowConsole here to print the message? e.g. https://github.com/apache/airflow/blob/main/airflow/cli/commands/connection_command.py#L360

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming variables one way or another seems ultimately trivial, so I am perfectly okay with changing the variable name to logger.

On the other hand, I am not sure about using AirflowConsole for logging purposes. The code changes are located under the cli module, but they are ultimately meant to catch exceptions that came up within the inner logic of the scheduler, so I think it should be treated like a normal logger.

Another situation where I think using a logger is more appropriate is when the scheduler is run as a daemon (with airflow scheduler -D), in which case using the AirflowConsole will not log the exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming on this is unfortunately pretty inconsistent in Airflow. Personally I much prefer logger but 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also log what the exception message was?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/3/library/logging.html#logging.exception
Calling log.exception will automatically include the exception information so there is no need to include it separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry missed reading the logging level.

@xuganyu96 xuganyu96 force-pushed the scheduler-fails-to-exit-upon-db-disconnect branch 2 times, most recently from ebf06f3 to c927469 Compare July 23, 2023 23:55
@xuganyu96 xuganyu96 force-pushed the scheduler-fails-to-exit-upon-db-disconnect branch from ff7b2d5 to 2883da3 Compare July 25, 2023 02:57
@xuganyu96 xuganyu96 force-pushed the scheduler-fails-to-exit-upon-db-disconnect branch from 991df0c to 6897d23 Compare July 25, 2023 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CLI type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants