Merged
8 changes: 7 additions & 1 deletion airflow/cli/commands/scheduler_command.py
@@ -17,6 +17,7 @@
"""Scheduler command."""
from __future__ import annotations

import logging
import signal
from contextlib import contextmanager
from multiprocessing import Process
@@ -35,12 +36,17 @@
from airflow.utils.providers_configuration_loader import providers_configuration_loaded
from airflow.utils.scheduler_health import serve_health_check

log = logging.getLogger(__name__)
Member

Suggested change
log = logging.getLogger(__name__)
logger = logging.getLogger(__name__)

Maybe this? Usually I see it called logger.

Contributor Author

Could you please point to where the logger is named logger? The modules I've touched so far all name the logger log, for example.

Member (@pankajkoti, Jul 21, 2023)

Yes. The example you shared uses self.log, not log. I believe this comes from extending the LoggingMixin class, which has the log property.

What I meant is that, in general, when using logging.getLogger(__name__), Python modules call it logger across the projects I have seen so far. One example in our project I could find is here:

logger = logging.getLogger(__name__)

But I have another suggestion here: since this is a CLI command, maybe we could use the AirflowConsole to print the message? e.g. https://github.com/apache/airflow/blob/main/airflow/cli/commands/connection_command.py#L360
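
For reference, a minimal sketch of the patterns being discussed: a module-level logger, the log property inherited from LoggingMixin, and an AirflowConsole message. The class and helper names below are illustrative and not part of this PR, and the sketch assumes AirflowConsole behaves like a rich Console:

    import logging

    from airflow.cli.simple_table import AirflowConsole
    from airflow.utils.log.logging_mixin import LoggingMixin

    # Module-level logger, conventionally named either `log` or `logger`.
    logger = logging.getLogger(__name__)


    # Classes extending LoggingMixin get a `log` property for free (hence `self.log`).
    class ExampleRunner(LoggingMixin):  # illustrative class, not part of this PR
        def run(self) -> None:
            self.log.info("running")


    # The reviewer's alternative: print a CLI-facing message via AirflowConsole,
    # which wraps a rich Console.
    def report_failure(message: str) -> None:  # illustrative helper, not part of this PR
        AirflowConsole().print(f"[red]{message}[/red]")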

Contributor Author

Naming variables one way or another seems ultimately trivial, so I am perfectly okay with changing the variable name to logger.

On the other hand, I am not sure about using AirflowConsole for logging purposes. The code changes are located under the cli module, but they are ultimately meant to catch exceptions that came up within the inner logic of the scheduler, so I think it should be treated like a normal logger.

Another situation where I think using a logger is more appropriate is when the scheduler is run as a daemon (with airflow scheduler -D), in which case using the AirflowConsole will not log the exception.

Member

Naming on this is unfortunately pretty inconsistent in Airflow. Personally I much prefer logger but 🤷



def _run_scheduler_job(job_runner: SchedulerJobRunner, *, skip_serve_logs: bool) -> None:
    InternalApiConfig.force_database_direct_access()
    enable_health_check = conf.getboolean("scheduler", "ENABLE_HEALTH_CHECK")
    with _serve_logs(skip_serve_logs), _serve_health_check(enable_health_check):
        run_job(job=job_runner.job, execute_callable=job_runner._execute)
        try:
            run_job(job=job_runner.job, execute_callable=job_runner._execute)
        except Exception:
            log.exception("Exception when running scheduler job")
Member

Should we also log what the exception message was?

Contributor Author

https://docs.python.org/3/library/logging.html#logging.exception
Calling log.exception will automatically include the exception information, so there is no need to include it separately.
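
A quick illustrative snippet (not from the diff) showing the behaviour being described: logging.Logger.exception, called from an except block, logs at ERROR level and appends the active exception's traceback automatically. The run_job below is a hypothetical stand-in for the real function:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)


    def run_job():  # stand-in for the real run_job, just to raise an error
        raise RuntimeError("run_job failed")


    try:
        run_job()
    except Exception:
        # No extra arguments needed: the traceback of the active exception is
        # included in the log output automatically.
        log.exception("Exception when running scheduler job")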

Member

Yes, sorry, I missed reading the logging level.



@cli_utils.action_cli
24 changes: 24 additions & 0 deletions tests/cli/commands/test_scheduler_command.py
@@ -139,6 +139,30 @@ def test_disable_scheduler_health(
        with pytest.raises(AssertionError):
            mock_process.assert_has_calls([mock.call(target=serve_health_check)])

    @mock.patch("airflow.cli.commands.scheduler_command.SchedulerJobRunner")
    @mock.patch("airflow.cli.commands.scheduler_command.Process")
    @mock.patch("airflow.cli.commands.scheduler_command.run_job", side_effect=Exception("run_job failed"))
    @mock.patch("airflow.cli.commands.scheduler_command.log")
    def test_run_job_exception_handling(
        self,
        mock_log,
        mock_run_job,
        mock_process,
        mock_scheduler_job,
    ):
        args = self.parser.parse_args(["scheduler"])
        scheduler_command.scheduler(args)

        # Make sure that run_job is called, that the exception has been logged, and that the serve_logs
        # sub-process has been terminated
        mock_run_job.assert_called_once_with(
            job=mock_scheduler_job().job,
            execute_callable=mock_scheduler_job()._execute,
        )
        mock_log.exception.assert_called_once_with("Exception when running scheduler job")
        mock_process.assert_called_once_with(target=serve_logs)
        mock_process().terminate.assert_called_once_with()


# Creating MockServer subclass of the HealthServer handler so that we can test the do_GET logic
class MockServer(HealthServer):