Celery worker pods are terminated and task logs are lost when scale-down happens during HPA autoscaling #41163

Open
2 tasks done
andycai-sift opened this issue Jul 31, 2024 · 2 comments
Assignees
Labels
area:helm-chart Airflow Helm Chart area:logging area:providers kind:bug This is a clearly a bug provider:cncf-kubernetes Kubernetes provider related issues

Comments

@andycai-sift

Official Helm Chart version

1.15.0 (latest released)

Apache Airflow version

2.7.1

Kubernetes Version

1.27.2

Helm Chart configuration

Since the official Helm chart now supports HPA, here are my HPA settings:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "airflow-worker"
  namespace: airflow
  labels:
    app: airflow
    component: worker
    release: airflow
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: "airflow-worker"
  minReplicas: 2
  maxReplicas: 16
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
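
For completeness, the autoscaling/v2 API also accepts a spec.behavior.scaleDown block (not set in my configuration above) that controls how quickly pods are removed. A minimal sketch with illustrative values:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes before acting on lower metrics
      policies:
        - type: Pods
          value: 1                      # remove at most one worker pod...
          periodSeconds: 300            # ...every 5 minutes

This only slows scale-down; it does not by itself protect tasks that run longer than the termination grace period.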

Docker Image customizations

Nothing too special: just added OpenJDK, the MongoDB client, and a few other Python dependencies.

What happened

After I enabled HPA in the Airflow cluster, some long-running tasks (which can run for over 15 hours) failed during scale-down and lost their logs. More details:

  • A typical long-running task calls a Python API to spin up a Google Dataproc cluster and then waits for a job that runs for more than 15 hours. When the HPA started scaling the pods down, these tasks failed, apparently because the pods were terminated once terminationGracePeriodSeconds was exceeded. The tasks were marked as failed even though they were still waiting for the final status of their Dataproc jobs.
  • Also, after the pod was terminated, the tasks' logs were not uploaded to the GCS bucket, even though remote logging to GCS is configured. The logs were still on the persistent disk, and I had to spin up the same pod again to get them uploaded to the GCS bucket.
  • I currently set terminationGracePeriodSeconds to 1 hour (roughly as in the sketch below), but given how long these tasks run, I am not sure how the HPA settings can accommodate them.
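
For reference, this is roughly how the grace period is set in my values file. This is a sketch: I'm assuming the chart exposes a workers.terminationGracePeriodSeconds value, which may differ between chart versions.

workers:
  terminationGracePeriodSeconds: 3600   # 1 hour, still far shorter than a 15h task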

What you think should happen instead

During an HPA scale-down, either the Celery worker pod should shut down gracefully and wait until all of its tasks are completed, or Airflow should provide a config option (or a similar mechanism) to stop the task processes before the termination deadline, finish cleanup, and upload the logs to the configured remote destination (e.g. a GCS bucket).

How to reproduce

  • Deploy Airflow to Kubernetes using the official Helm chart, with HPA enabled on CPU/memory metrics.
  • Set up remote logging (see also the environment-variable form after this list):
logging:
    delete_local_logs: 'False'
    remote_base_log_folder: "gs://xxx"
    remote_logging: 'True'
  • Run a long-running task (30 minutes to a few hours is enough), and make sure terminationGracePeriodSeconds is set to a small value so the pod is killed before the task finishes.
  • Trigger an HPA scale-down of the worker pods; in the UI you should then see the task marked as failed, with no logs available.
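
The same logging settings can also be expressed as environment variables on the worker pods. The connection ID below is an assumption; use whatever GCP connection is configured in your deployment.

AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
AIRFLOW__LOGGING__DELETE_LOCAL_LOGS: "False"
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://xxx"
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "google_cloud_default"   # assumption: default GCP connection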

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@andycai-sift andycai-sift added area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jul 31, 2024

boring-cyborg bot commented Jul 31, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added area:logging area:providers provider:cncf-kubernetes Kubernetes provider related issues labels Jul 31, 2024
@shahar1 shahar1 removed the needs-triage label for new issues that we didn't triage yet label Aug 2, 2024
@dimon222
Contributor

dimon222 commented Aug 10, 2024

Regarding the first proposed solution (HPA waiting for work to complete):
I'm afraid this is not fixable by the Airflow project, as it is a limitation of Kubernetes: during a scale-down you cannot ask it to "wait for completion of work" for a dynamic amount of time. The graceful-shutdown timing has to be configured when the container starts; you can't change it while the Celery worker container is already running (the resource spec is immutable, so any change requires a deploy-style operation that bounces the container).

I have been waiting for this for years, but the K8s development community isn't working on it currently.
kubernetes/enhancements#2255 (comment)

I'm not sure about the second proposed solution, though. I'm not certain whether Kubernetes sends an upfront signal to warn that the final shutdown signal will arrive once the grace period elapses.
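
The closest workaround I can think of is a preStop hook that blocks until the Celery worker has no active tasks, but it is still hard-capped by the terminationGracePeriodSeconds fixed at pod creation. A rough, untested sketch; the Celery app path, the JSON parsing, and the numbers are all assumptions on my side:

# worker pod spec (illustrative only)
terminationGracePeriodSeconds: 3600
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - |
          # Poll this worker's active task count and return once it reaches zero.
          # Assumes the Celery app lives at airflow.executors.celery_executor and
          # that `celery inspect active --json` returns {worker_name: [tasks...]}.
          while true; do
            active=$(celery --app airflow.executors.celery_executor inspect active \
                       --destination "celery@${HOSTNAME}" --json 2>/dev/null \
                     | python3 -c 'import sys, json; d = json.load(sys.stdin); print(sum(len(v) for v in d.values()))' \
                     || echo 1)
            [ "$active" = "0" ] && exit 0
            sleep 30
          done

Even with something like this, Kubernetes still sends SIGKILL when the grace period expires, so it only helps for tasks shorter than the grace period.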
