Closed
Labels
area:MetaDB (Meta Database related issues.), area:Scheduler (including HA (high availability) scheduler), area:core, good first issue, kind:bug (This is a clearly a bug)
Description
Apache Airflow version
2.10.2
If "Other Airflow 2 version" selected, which one?
No response
What happened?
The scheduler was running and launching tasks normally.
Suddenly, database operations started failing with an authentication error:
psycopg2.OperationalError: connection to server at "<Host>" (<IP>), port 6432 failed: FATAL: server login has been failing, try again later (server_login_retry)
connection to server at "<HOST>" (<IP>), port 6432 failed: FATAL: server login has been failing, try again later (server_login_retry)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 984, in _execute
self._run_scheduler_loop()
After a few retries, the scheduler exited its loop, but the process was not terminated.
What you think should happen instead?
After shutting down all executors and the DAG processor, the scheduler process should exit.
How to reproduce
Run the scheduler with hybrid executors (Celery and Kubernetes).
Introduce database errors while the scheduler is running.
Operating System
Mac/Linux
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else?
The log lines below keep repeating, which indicates that some threads have not exited:
[2024-10-27T06:17:10.658+0000] {kubernetes_executor_utils.py:101} INFO - Kubernetes watch timed out waiting for events. Restarting watch.
[2024-10-27T06:17:11.658+0000] {kubernetes_executor_utils.py:140} INFO - Event: and now my watch begins starting at resource_version: 0
[2024-10-27T06:17:11.702+0000] {kubernetes_executor_utils.py:309} INFO - Event: 666aac59b268675b6b2590ff-bs-8ace-s4sjuxfo is Running, annotations: <omitted>
[2024-10-27T06:17:11.712+0000] {kubernetes_executor_utils.py:309} INFO - Event: 666aac59b268675b6b2590ff-bs-44fe-iwuzjfao is Running, annotations: <omitted>
[2024-10-27T06:17:41.715+0000] {kubernetes_executor_utils.py:101} INFO - Kubernetes watch timed out waiting for events. Restarting watch.
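To illustrate why a leftover watcher could keep the process alive, here is a minimal sketch in plain Python. It assumes the watcher behaves like a non-daemon thread; the names `start_watcher`, `non_daemon_threads_alive` and `hypothetical-watcher` are made up for this example and are not Airflow APIs. The actual Kubernetes executor may use separate processes rather than threads, so treat this only as a way to reason about the symptom.

```python
import threading


def non_daemon_threads_alive():
    """Return names of live non-daemon threads other than the main thread.

    The interpreter waits for these threads before it exits.
    """
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]


def start_watcher(stop_event, name="hypothetical-watcher"):
    """Start a stand-in watcher that loops until stop_event is set."""

    def run():
        while not stop_event.is_set():
            # Stand-in for "watch timed out, restarting watch".
            stop_event.wait(0.01)

    thread = threading.Thread(target=run, name=name)  # non-daemon by default
    thread.start()
    return thread


stop = threading.Event()
watcher = start_watcher(stop)

# While the watcher runs, it would block interpreter exit.
blocking_before = non_daemon_threads_alive()

# Signalling the watcher and joining it lets the process exit.
stop.set()
watcher.join(timeout=2)
blocking_after = non_daemon_threads_alive()
```

If the main loop exits without setting the stop signal, the watcher keeps looping and logging, which would match the repeated "Restarting watch" lines above.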
I see an old PR for a similar issue: #28685.
Should I change the catch block to catch all exceptions?
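A rough sketch of what I have in mind, under the assumption that the fix is to run executor shutdown no matter which exception escapes the loop. `FakeExecutor`, `run_scheduler` and `failing_loop` are hypothetical stand-ins written for this example; this is not the code in `scheduler_job_runner.py`.

```python
class FakeExecutor:
    """Hypothetical stand-in for an executor; not an Airflow class."""

    def __init__(self, name):
        self.name = name
        self.ended = False

    def end(self):
        self.ended = True


def run_scheduler(loop, executors):
    """Run loop() and always shut the executors down afterwards.

    Returns 0 on a clean exit and 1 if the loop raised.
    """
    try:
        loop()
        return 0
    except Exception:
        # Catch every ordinary exception, not only a specific subset,
        # so that a database error cannot skip the cleanup below.
        return 1
    finally:
        for executor in executors:
            try:
                executor.end()
            except Exception:
                # One failing executor must not block the others.
                pass


def failing_loop():
    # Stand-in for the database login failure seen in the traceback.
    raise ConnectionError("server login has been failing (server_login_retry)")


executors = [FakeExecutor("celery"), FakeExecutor("kubernetes")]
exit_code = run_scheduler(failing_loop, executors)
```

With this shape, both executors are ended even though the loop raised, and the caller gets a non-zero exit code it can pass on when terminating the process.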
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct