-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Apache Airflow version
3.1.5
If "Other Airflow 3 version" selected, which one?
No response
What happened?
For my workload, I ingest video files and then create tasks to process each one. It is normal to create 30+ tasks that will get processed in the span of 40+min. On backfill, it's absolutely normal for my GPU workers to not be available and for a task to wait 1h+ to be executed.
This is why I changed AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT to 10 000 (see image below).
I also changed my task definition to allow a task to be in processing for 24h, so this is not the issue either.
However, even when setting that, my tasks fail and the retry mechanism kicks in. I've bandaged fixed it by increasing the number of retries, but it's not optimal.
What you think should happen instead?
AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT should be taken into account, and tasks should not timeout on around 15min.
All tasks eventually complete. With 20 runs × 2.5 min each and 1 worker, the last task should wait ~50 min in the queue - well within the 100,000s timeout.
How to reproduce
1. Install Airflow
uv pip install "apache-airflow[celery]==3.1.5" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.1.5/constraints-3.12.txt"2. Initialize and configure
airflow db migrate
sed -i 's/^executor = .*/executor = CeleryExecutor/' airflow.cfg
sed -i 's/^task_queued_timeout = .*/task_queued_timeout = 100000.0/' airflow.cfg3. Create the DAG
Save as dags/long_running_task_dag.py:
from datetime import datetime, timedelta
from time import sleep
from airflow.sdk import dag, task
@dag(
dag_id="long_running_task_bug_repro",
schedule=None,
start_date=datetime(2025, 1, 1),
catchup=False,
default_args={
"owner": "airflow",
"retries": 0,
},
)
def long_running_task_bug_repro():
@task(
queue="gpu_queue",
execution_timeout=timedelta(hours=24),
)
def gpu_simulation():
"""Simulate GPU processing - sleeps for 2.5 minutes."""
sleep(150)
gpu_simulation()
long_running_task_bug_repro()4. Start services
Terminal 1 - Scheduler:
airflow api-server --port 8080 &
airflow scheduler &
airflow dag-processor &
airflow triggerer &
airflow celery worker --queues defaultTerminal 2 - Single worker (simulates limited GPU resources):
airflow celery worker --queues gpu_queue --concurrency 15. Trigger multiple DAG runs
for i in $(seq 1 20); do
airflow dags trigger long_running_task_bug_repro
done6. Wait ~15 minutes
Operating System
Ubuntu LTS 22.04
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.14.0
apache-airflow-providers-common-compat==1.10.0
apache-airflow-providers-common-io==1.7.0
apache-airflow-providers-common-sql==1.30.0
apache-airflow-providers-smtp==2.4.0
apache-airflow-providers-standard==1.10.0
Deployment
Virtualenv installation
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct