ERROR - Unknown error in KubernetesJobWatcher. Failing #12229

@gakhrejah

Hi Team,

We are seeing the error logs below while running Apache Airflow on AWS EKS.
All the pods (tasks) are in the Completed state but are not removed by Airflow. After a manual restart of the scheduler, everything works for 2-3 days, and then all the tasks get stuck again.

ERROR LOGS
[2020-11-10 07:00:07,752] {{kubernetes_executor.py:447}} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2020-11-10 07:00:07,765] {{kubernetes_executor.py:351}} INFO - Event: and now my watch begins starting at resource_version: 107544455
[2020-11-10 07:00:07,782] {{kubernetes_executor.py:342}} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 340, in run
    self.worker_uuid, self.kube_config)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 364, in _run
    **kwargs):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream
    status=obj['code'], reason=reason)
kubernetes.client.exceptions.ApiException: (410)
Reason: Gone: too old resource version: 107544455 (108550177)

Process KubernetesJobWatcher-135237:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 340, in run
    self.worker_uuid, self.kube_config)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 364, in _run
    **kwargs):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream
    status=obj['code'], reason=reason)
kubernetes.client.exceptions.ApiException: (410)
Reason: Gone: too old resource version: 107544455 (108550177)
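
For anyone reading the traceback above: the 410 Gone response means the resource_version the watcher tried to resume from (107544455) is no longer available on the API server, so the watch cannot continue from that point. Below is a minimal sketch of one common way to handle this, assuming it is acceptable for the watcher to restart from resource version "0" instead of failing. It is not Airflow's actual code. `watch_with_resync`, `fake_stream`, the flat event dictionaries, the pod names, the newer resource versions, and the local `ApiException` stand-in (which mirrors only the `status` attribute of `kubernetes.client.exceptions.ApiException`) are all hypothetical, introduced so the example runs without a cluster.

```python
from typing import Callable, Iterable, Iterator


class ApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException.

    Only the `status` and `reason` attributes are mirrored here.
    """

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"({status}) Reason: {reason}")
        self.status = status
        self.reason = reason


def watch_with_resync(
    stream: Callable[[str], Iterable[dict]],
    resource_version: str,
    max_restarts: int = 3,
) -> Iterator[dict]:
    """Yield events from stream(resource_version); on 410 restart from "0".

    Assumption: each event is a flat dict carrying its own "resource_version".
    Real watch events nest this under the object's metadata.
    """
    restarts = 0
    while True:
        try:
            for event in stream(resource_version):
                # Remember the newest version seen so a later restart can resume.
                resource_version = event["resource_version"]
                yield event
            return  # stream ended normally
        except ApiException as exc:
            # Anything other than 410, or too many restarts, is re-raised.
            if exc.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1
            resource_version = "0"  # stale version: start over


def fake_stream(resource_version: str) -> Iterator[dict]:
    """Hypothetical stream that rejects the stale version from the logs."""
    if resource_version == "107544455":
        raise ApiException(410, "Gone: too old resource version")
    # Made-up events for illustration only.
    yield {"type": "MODIFIED", "pod": "task-a", "resource_version": "108550178"}
    yield {"type": "MODIFIED", "pod": "task-b", "resource_version": "108550179"}


if __name__ == "__main__":
    for event in watch_with_resync(fake_stream, "107544455"):
        print(event["type"], event["pod"], event["resource_version"])
```

The restart cap is a design choice in this sketch: without it, a server that keeps returning 410 would make the loop spin forever, whereas re-raising lets the caller's health check notice the failure.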

AIRFLOW_VERSION=1.10.9
ENVIRONMENT: QA | PROD
Docker Image: python:3.7-slim-buster

Please let us know if you need any more information, and how we can resolve this issue. We also tried upgrading Airflow to 1.10.10, but the problem persists.
