Apache Airflow version
2.9.3
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We have some DAGs whose runs must not execute in parallel. To prevent parallel execution, we set max_active_runs=1. We also configured retries. Recently, we observed that Airflow nevertheless scheduled two DAG runs in parallel. We reconstructed what happened from the audit logs and can reliably reproduce it (a minimal DAG sketch follows the scenario):
GIVEN a DAG with max_active_runs=1 and a task with retries > 0
WHEN the task is running in the context of run A
AND the user manually marks run A as failed (or success)
AND the user clears multiple runs including run A shortly afterwards
AND the scheduler starts the task in the context of another run B
THEN the task of run A is marked as "UP_FOR_RETRY" and restarts after backoff (5 minutes by default) regardless of whether another run is already active
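For reference, a minimal DAG along these lines reproduces the scenario. The DAG id, task id, and exact sleep duration are illustrative placeholders, not our production values; we hit this with KubernetesPodOperator, but any long-sleeping task should do, so BashOperator is used here for brevity:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="max_active_runs_repro",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered manually for the repro
    max_active_runs=1,  # the guarantee that gets violated
    catchup=False,
):
    # Sleep longer than the default retry backoff (5 minutes) so that
    # run A's retried task overlaps with run B.
    BashOperator(
        task_id="long_sleep",
        bash_command="sleep 600",
        retries=1,  # retries > 0 is required to trigger the bug
        retry_delay=timedelta(minutes=5),  # default backoff, made explicit
    )
```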
What you think should happen instead?
- Airflow should not schedule two parallel runs when max_active_runs=1
- Airflow should not retry a task when the user marks its run as failed/success and clears it shortly afterwards
How to reproduce
See the scenario above; the sketch after it shows a minimal DAG, and the manual steps can be scripted as shown below. Using the Kubernetes executor (or similar) is likely necessary to reproduce this, as it lengthens the window between the user action (mark as failed/success) and the task instance receiving the SIGTERM. We also used a task that sleeps longer than the retry backoff (5 minutes by default) so that the two parallel runs can actually be observed.
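The two manual actions (mark run A as failed, then clear it) can also be scripted against the stable REST API; the two endpoints below exist in Airflow 2.x, while the base URL, credentials, and run id are placeholders:

```python
import requests

BASE = "http://localhost:8080/api/v1"  # placeholder webserver URL
AUTH = ("admin", "admin")  # placeholder credentials
DAG_ID = "max_active_runs_repro"
RUN_A = "manual__2024-01-01T00:00:00+00:00"  # placeholder run id

# Step 1: while run A's task is still running, mark run A as failed
# (same effect as "Mark Failed" in the UI).
requests.patch(
    f"{BASE}/dags/{DAG_ID}/dagRuns/{RUN_A}",
    json={"state": "failed"},
    auth=AUTH,
).raise_for_status()

# Step 2: shortly afterwards, clear run A (clearing multiple runs in the
# UI does this per run). The scheduler then starts run B, while run A's
# task later flips to UP_FOR_RETRY and restarts after the backoff.
requests.post(
    f"{BASE}/dags/{DAG_ID}/dagRuns/{RUN_A}/clear",
    json={"dry_run": False},
    auth=AUTH,
).raise_for_status()
```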
Operating System
Debian 12 (bookworm)
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==8.3.3
Deployment
Official Apache Airflow Helm Chart
Deployment details
Workloads run via the Kubernetes executor and the KubernetesPodOperator.
Anything else?
This occurs rarely, but when it does, it causes severe problems because the DAG/task is not safe to run in parallel.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct