-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
Apache Airflow version:
2.3.3
Schedulers are racing for pod adoption and leads abrupt pod deletes when there is delay in schedulers heartbeats. However the schedulers are alive but not dead their heartbeat is delayed due to network timeout or heavy processing and etc.
What you think should happen instead
If schedulers heartbeat is delayed for genuine reason, then we shouldn't kill the running worker pods.
How to reproduce
Reduce the scheduler_health_check_threshold=5 and orphaned_tasks_check_interval=10 values in the airflow config file
Launch the airflow with two schedulers and try to schedule multiple DAGs with backfill for every 1/5 mins.
Operating System
NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct