Description
Apache Airflow version
3.0.0
If "Other Airflow 2 version" selected, which one?
No response
What happened?
At high concurrency, the task supervisor process sometimes fails to mark its sockets as closed after the task process has finished. As a result, the supervisor process never exits. In these cases the supervisor still believes one or two sockets are open, even though the task process has exited and no corresponding sockets remain open on the machine.
What you think should happen instead?
The task supervisor process should reliably detect every socket-close event and mark each socket as closed. Alternatively, as a defensive measure, it should close all remaining sockets once the task process has exited and a configurable timeout has elapsed.
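A minimal sketch of the proposed defensive behaviour, assuming the supervisor watches its sockets with the standard-library selectors module. The function `supervise` and its `grace_period` and `poll_interval` parameters are hypothetical names chosen for illustration; this is not Airflow's API or its actual supervisor code. Once the child has exited, any socket that has not reported EOF within the grace period is force-closed instead of keeping the loop alive.

```python
import selectors
import socket
import subprocess
import sys
import time


def supervise(proc, sockets, grace_period=5.0, poll_interval=0.1):
    """Wait for `proc` to exit and for every socket to reach EOF.

    Hypothetical sketch, not Airflow's implementation: once the child has
    exited, sockets still registered after `grace_period` seconds are
    force-closed so the supervisor cannot wait forever.
    Returns the number of sockets that had to be force-closed.
    """
    sel = selectors.DefaultSelector()
    for sock in sockets:
        sel.register(sock, selectors.EVENT_READ)

    exited_at = None
    while sel.get_map():
        for key, _ in sel.select(timeout=poll_interval):
            try:
                data = key.fileobj.recv(4096)  # any payload is discarded here
            except ConnectionResetError:
                data = b""
            if not data:  # empty read: the peer closed its end
                sel.unregister(key.fileobj)
                key.fileobj.close()
        if exited_at is None and proc.poll() is not None:
            exited_at = time.monotonic()  # start the grace-period clock
        if exited_at is not None and time.monotonic() - exited_at > grace_period:
            break  # child is gone but sockets still look open: stop waiting

    leftover = [key.fileobj for key in sel.get_map().values()]
    for sock in leftover:
        sel.unregister(sock)
        sock.close()  # defensive cleanup of sockets whose close was missed
    sel.close()
    proc.wait()
    return len(leftover)


if __name__ == "__main__":
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    seen_parent, seen_child = socket.socketpair()
    missed_parent, missed_child = socket.socketpair()
    seen_child.close()  # this close is observed as EOF
    # missed_child stays open: stands in for a close event that is never seen
    print(supervise(proc, [seen_parent, missed_parent], grace_period=0.5))  # 1
```

In the demo, one peer end is never closed, mimicking the missed close event described above; rather than hanging, the loop gives up after the grace period and reports one force-closed socket.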
How to reproduce
Run an Airflow worker using the Task SDK at high concurrency (more than 1000 tasks per minute). After it has been running for a while, ps -ef shows task supervisor processes that are still alive with no corresponding task process.
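A rough way to spot these leftover processes without scanning ps -ef by eye is to look for supervisor processes that have no children. The sketch below parses the output of `ps -eo pid,ppid,args`; the function name `find_childless` and the process titles in the sample are assumptions for illustration, so the pattern would need to match whatever titles the worker actually shows.

```python
import re


def find_childless(ps_output, pattern):
    """Return PIDs whose command line matches `pattern` and that have no
    child processes, given the text printed by `ps -eo pid,ppid,args`."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, args = line.split(None, 2)
        rows.append((int(pid), int(ppid), args))
    parents = {ppid for _, ppid, _ in rows}  # every PID that has a child
    regex = re.compile(pattern)
    return [pid for pid, _, args in rows if regex.search(args) and pid not in parents]


# Hypothetical process titles, for illustration only.
SAMPLE = """\
  PID  PPID COMMAND
  100     1 airflow worker
  200   100 airflow supervisor task-a
  201   200 airflow task runner task-a
  300   100 airflow supervisor task-b
"""

print(find_childless(SAMPLE, r"supervisor"))  # [300]
```

On a live worker, the input could come from `subprocess.run(["ps", "-eo", "pid,ppid,args"], capture_output=True, text=True).stdout`.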
Operating System
Debian GNU/Linux 12
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
Only a fraction of task supervisor processes are left running without a task process, because the supervisor believes sockets are still open, which the machine state contradicts. This may be linked to task process heartbeat failures, but the correlation is unconfirmed.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct