The task supervisor continues running indefinitely, even after the associated task process has completed #50500

@neel-astro

Apache Airflow version

3.0.0

If "Other Airflow 2 version" selected, which one?

No response

What happened?

At high concurrency, the task supervisor process does not reliably mark its sockets as closed once the task process has finished, so the supervisor runs forever. In these cases the supervisor believes that 1 or 2 sockets are still open, even though the task process has exited and no corresponding sockets are open on the machine.

What you think should happen instead?

The task supervisor process should either catch every socket-close event and mark the socket as closed, or defensively close all sockets once the task process has finished and a configurable amount of time has passed.
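
To make the second option concrete, here is a minimal sketch of a supervisor-style loop that force-closes leftover sockets after a grace period. It is not the Airflow Task SDK implementation: `drain_until_closed`, `child_exited`, `grace_seconds`, and `poll_interval` are hypothetical names chosen for illustration, and the only real APIs used are Python's standard `selectors`, `socket`, and `time` modules.

```python
import selectors
import socket
import time


def drain_until_closed(sel, child_exited, grace_seconds=1.0, poll_interval=0.05):
    """Service registered sockets until every one reaches EOF.

    If the child has exited and `grace_seconds` pass without all sockets
    reaching EOF, close the remaining ones defensively.
    Returns the number of sockets that had to be force-closed.
    """
    exited_at = None
    while sel.get_map():  # mapping is empty once nothing is registered
        for key, _events in sel.select(timeout=poll_interval):
            data = key.fileobj.recv(4096)
            if not data:  # empty read means the peer closed its end
                sel.unregister(key.fileobj)
                key.fileobj.close()
        # Remember when we first observed that the child was gone.
        if exited_at is None and child_exited():
            exited_at = time.monotonic()
        if exited_at is not None and time.monotonic() - exited_at >= grace_seconds:
            leftover = [key.fileobj for key in list(sel.get_map().values())]
            for sock in leftover:
                sel.unregister(sock)
                sock.close()
            return len(leftover)
    return 0


# Simulate the reported state: the child is gone, but one socket never sees EOF
# because its peer end is still held open somewhere.
sel = selectors.DefaultSelector()
ours, theirs = socket.socketpair()
sel.register(ours, selectors.EVENT_READ)
forced = drain_until_closed(sel, child_exited=lambda: True, grace_seconds=0.2)
theirs.close()
sel.close()
```

In this sketch the loop always terminates once the child is gone, at the cost of possibly dropping data that arrives after the grace period; whether that trade-off is acceptable is a design decision for the real fix.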

How to reproduce

Run an Airflow worker using the Task SDK at high concurrency (>1000 tasks per minute). After it has been running for a while, some task supervisor processes are left around with no task process attached to them (visible via ps -ef).
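
The manual ps -ef check can be scripted. The sketch below assumes the procps-style column layout (UID PID PPID C STIME TTY TIME CMD) and takes a `marker` substring that identifies supervisor processes. Both the function name `find_childless` and the marker value are hypothetical, so substitute whatever the supervisor's command line actually looks like on your workers.

```python
def find_childless(ps_output: str, marker: str) -> list[int]:
    """Return PIDs whose command contains `marker` and that have no children.

    `ps_output` is the text printed by `ps -ef`, including the header line.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue  # ignore lines that do not match the assumed layout
        rows.append((int(fields[1]), int(fields[2]), fields[7]))
    # Any PID that appears as a PPID still has at least one live child.
    parents = {ppid for _pid, ppid, _cmd in rows}
    return [pid for pid, _ppid, cmd in rows if marker in cmd and pid not in parents]
```

To run it against a live machine, pass in the output of `subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout`. A supervisor that has only just started may briefly appear childless, so it is worth sampling a few times before treating a PID as stuck.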

Operating System

Debian GNU/Linux 12

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else?

Only a fraction of the task supervisor processes are left running with no task process (the supervisor thinks there are open sockets, which does not match the machine state). This may be linked to task process heartbeat failures, but we have not confirmed a correlation.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
