EcsRunTaskOperator leaks ECS task on failure with partial IAM permissions

### Apache Airflow Provider(s)

amazon

### Versions of Apache Airflow Providers

`apache-airflow-providers-amazon==9.20.0`

### Apache Airflow version

main

### Operating System

Debian GNU/Linux 12 (bookworm)

### Deployment

Other

### Deployment details

_No response_

### What happened

When using `EcsRunTaskOperator`, an ECS task may be successfully started even when the task execution role has partial ECS permissions, for example lacking `ecs:DescribeTasks`.

In this scenario, the operator successfully calls `RunTask` and starts an ECS task. However, subsequent calls (such as waiting for task completion when `wait_for_completion=True`) fail due to insufficient permissions. The task then fails, but the ECS task continues running in AWS, resulting in leaked infrastructure.

This can occur, for example, when the execution role allows `ecs:RunTask` but denies `ecs:DescribeTasks`, which is required by the waiter used to monitor task completion.

### What you think should happen instead

If the operator fails after successfully starting an ECS task (for example due to missing `DescribeTasks` or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by stopping the ECS task.

Cleanup should be attempted opportunistically (i.e. only if the task ARN is known and the necessary permissions are available), and failure to clean up should not mask or replace the original exception.

### How to reproduce

1. Create an IAM role that **allows** `ecs:RunTask` but **denies** `ecs:DescribeTasks`.

2. Configure an AWS connection in Airflow using this role.
   (The connection ID `aws_test_conn` is used for this reproduction.)

3. Create an ECS cluster.
   (The cluster name `airflow-partial-auth-cluster` is used for this reproduction.)

4. Create an ECS task definition with at least one container.
   (The task definition `my-task-def` with container `my-task-def-container` is used for this reproduction.)

5. Use the following DAG:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator


with DAG(
    dag_id="ecs_partial_auth_leak_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_task = EcsRunTaskOperator(
        task_id="run_task",
        aws_conn_id="aws_test_conn",
        cluster="airflow-partial-auth-cluster",
        task_definition="my-task-def:1",
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {
                    "name": "my-task-def-container",
                }
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-xxxxxxxxxxxxxxxxx"],
                "assignPublicIp": "ENABLED",
            }
        },
        wait_for_completion=True,  # triggers DescribeTasks via waiter
    )
```

6. Trigger the DAG.

**Expected Result**

The task fails due to missing `ecs:DescribeTasks` permissions, but the ECS task continues running in AWS and is not stopped automatically.

### Anything else

This behavior can be surprising and potentially costly, as infrastructure is created even though the Airflow task fails. Other Airflow operators that manage external resources already attempt best-effort cleanup on failure to avoid leaking infrastructure.

In particular, this pattern has recently been introduced for both EC2 and EMR operators. The EC2 change has already been merged (PR #60904), and a corresponding EMR change has been proposed (PR #61010). Given this precedent, similar best-effort cleanup behavior would be expected for ECS operators as well.

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EcsRunTaskOperator leaks ECS task on failure with partial IAM permissions #61050

Apache Airflow Provider(s)

Versions of Apache Airflow Providers

Apache Airflow version

Operating System

Deployment

Deployment details

What happened

What you think should happen instead

How to reproduce

Anything else

Are you willing to submit PR?

Code of Conduct

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

EcsRunTaskOperator leaks ECS task on failure with partial IAM permissions #61050

Description

Apache Airflow Provider(s)

Versions of Apache Airflow Providers

Apache Airflow version

Operating System

Deployment

Deployment details

What happened

What you think should happen instead

How to reproduce

Anything else

Are you willing to submit PR?

Code of Conduct

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions