-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Kubernetes Executor & Scheduler Resilient to error during PMH execution #27611
Make Kubernetes Executor & Scheduler Resilient to error during PMH execution #27611
Conversation
f80ed32
to
e46a992
Compare
…ecution (apache#27611) Earlier there is no try-catch logic when pod_mutation_hook is called in the Kubernetes Executor. So if there is any error/exception during the execution of the pod_mutation_hook user created, the whole executor/scheduler will crash. This change makes Kubernetes Executor & Scheduler more resilient to such errors during PMH execution
try: | ||
pod_mutation_hook(pod) | ||
except Exception as e: | ||
raise PodMutationHookException(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be raise PodMutationHookException() from e
? Otherwise it will be logged as an exception happening while handling another exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hterik , thanks for the feedback!
I tested the two different ways below. I find the difference is minor, and I personally prefer the current way where the detailed exception reason appears in the same line as the PodMutationHookException
, so it's clearer to the log reader (this is also the original reason for what I wrote it this way). WDYT?
Case -1
from airflow.exceptions import PodMutationHookException
try:
int("a")
except Exception as e:
raise PodMutationHookException(e)
We get
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ValueError: invalid literal for int() with base 10: 'a'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
airflow.exceptions.PodMutationHookException: invalid literal for int() with base 10: 'a'
Case -2
If we use the way you suggested
from airflow.exceptions import PodMutationHookException
try:
int("a")
except Exception as e:
raise PodMutationHookException() from e
We get
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ValueError: invalid literal for int() with base 10: 'a'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
airflow.exceptions.PodMutationHookException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, maybe raise PodMutationHookException(e) from e
in that case ;) That will have best of both worlds. But yeah, i think the difference is minor and i don't think it has any logical difference other than the log message in between the two tracebacks.
Currently there is no try-catch logic when
pod_mutation_hook
is called in the Kubernetes Executor. So if there is any error/exception during the execution of thepod_mutation_hook
user created, the whole executor/scheduler will crash.Such error may happen:
pod_mutation_hook
made an error inside.pod_mutation_hook
encounters transient error (e.g. it queries DB and the DB gives a timeout error).This PR aims to make Kubernetes Executor & Scheduler more resilient to such error during PMH execution (current behavior, directly crashing, is a bit scary)
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.