-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler uses [scheduler]job_heartbeat_sec property for heartbeats instead of [scheduler]scheduler_heartbeat_sec #37971
Comments
Are you sure @kosteev ? From what I see it is used n Job's def perform_heartbeat(
job: Job, heartbeat_callback: Callable[[Session], None], only_if_necessary: bool
) -> None:
"""
Perform heartbeat for the Job passed to it,optionally checking if it is necessary.
:param job: job to perform heartbeat for
:param heartbeat_callback: callback to run by the heartbeat
:param only_if_necessary: only heartbeat if it is necessary (i.e. if there are things to run for
triggerer for example)
"""
seconds_remaining: float = 0.0
if job.latest_heartbeat and job.heartrate: # <--- HERE THE HEARTRATE IS USED FROM JOB (and one line down)
seconds_remaining = job.heartrate - (timezone.utcnow() - job.latest_heartbeat).total_seconds()
if seconds_remaining > 0 and only_if_necessary:
return
job.heartbeat(heartbeat_callback=heartbeat_callback) I believe there were some fixes after 2.6.3 that fixed bad threshold calculations for some jobs (It was confusing whether the job used 2.1* multiplier or whether you had fixed value from the configuration) but I'd expect heartrate was not affected - maybe threshold is (or rather was) calculated wrongly. Also this scenario:
is expected. If heartbeat sec is greater than threshold, then yes you'd expect to see the warning (the warning is issued when the heartbeat does not arrive in the threshold time - which would be exactly what is happening if those values are set as described). Mybe the reproduction scenario is wrong? Could you please clarify ? |
In perform_heartbeat, job.heartrate is used. airflow/airflow/jobs/scheduler_job_runner.py Line 153 in 08efc32
so, now this heartrate: int = conf.getint("scheduler", "SCHEDULER_HEARTBEAT_SEC") from SchedulerJobRunner is not used anywhere |
About reproduction scenario: |
Indeed.
Ah 🤦 -> indeed missed that it is about the other config. Would you like to fix it maybe @kosteev ? That seems like a good contribution and simple fix to make ? |
I will try to prepare a fix, but I do not want to commit to this, so that if anyone wants to fix it. |
Sinc apache#34026 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
Actually not even there UPDATE: Nope - it was there in #34026 - it just missed the SchedulerJob if with |
The 2.6.3 version is:
So almost for sure it wouldn't have caused the problem. (BTW. I am trying to see if we can add the fix to 2.8.3 that we vote on currently - because otherwise it would affect the whole 2.8 line) |
Ahh.. those parameters are confusing and I am mixing them ... it is indeed way older sorry (threshold mixed with secs) |
Sinc apache#30255 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
Tracked it down (there were quite a few refactors) and indeed it was #30255 that introduced it |
So it's there since 2.6.0 |
(just wanted to make sure how far back it goes) |
I prepared a fix like this #37993 But I see you have already fix (and implemented differently). Should I close my PR? |
Sinc apache#30255 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
Yeah - I think the way where it is closer to heartrate property is a bit better and easier to reason about. |
We have a bit of a mess there - because the JOB (scheduler/job_heartrabeat_sec) has been used and abused by different jobs (not only scheduler one), so likely better to keep it in JOB. |
…7992) Sinc apache#30255 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
…7992) Sinc apache#30255 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
…7992) Sinc apache#30255 scheduler heartrate has not been properly calculated. We missed the check for SchedulerJob type and setting heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.6.3
What happened?
Scheduler uses [scheduler]job_heartbeat_sec property for heartbeats instead of [scheduler]scheduler_heartbeat_sec.
In Airflow 2.5.3 [scheduler]scheduler_heartbeat_sec is used correctly.
The bug was introduced after refactoring of scheduler_job -> scheduler_job_runner.
This field is never used in the code:
airflow/airflow/jobs/scheduler_job_runner.py
Line 153 in 1f5960a
What you think should happen instead?
No response
How to reproduce
Operating System
Ubuntu
Versions of Apache Airflow Providers
No response
Deployment
Google Cloud Composer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: