-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Adjusted the EMRServerlessStartJobOperator to cancel failed jobs #51883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vincbeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is a very opinionated decision. I am wondering if this is not something the user should set by using on_failure_callback and not us to take this decision.
I'd like to hear more thoughts on that from others.
providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py
Outdated
Show resolved
Hide resolved
…rlessStartJobTrigger
Apologies if there is an obvious answer, but is there a use case where you would want the job to not be cancelled in EMR if a new one is created due to retries, now running / pending concurrently? |
Hard to know all the different user use cases but I think you're correct, I do not see any, so I am probably wrong in my perception :) |
That’s true, there’s definetly a point in letting the user decide. As you said lets wait on other opinions :) |
|
@o-nikolas What is your take on this? Having thought about it again, this would change the standard behavior that also comes with other AWS operators (e. g. EMR cancels the job on failure, Glue doesn't), speaking more for using |
It would be a change of standard behaviour and also be a breaking change (the behaviour that the user sees will be noticeably different), so if we went that route we'd need to do a deprecation process. We could argue that it's a bug fix (as you describe, no one would really want the default behaviour we have) and then that would allow us to not have to go through the deprecation process. Or as Vincent said, we could avoid all that and just document a way around this with callbacks. I personally don't feel too strongly about it and would be okay with either of the three above. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
Hey @dominikhei, I might have changed my mind and I think your changes make sense. I agree with you, there is more chances that a user wants their job to be cancelled if the timer times out than not. Are you still around and if so, would you be interested to continue working on this PR? |
|
@vincbeck Took some time off due to a new job but wanted to get back into contributing regularly so this might be a good start :) |
|
Please do :) I reopen this PR |
|
But please rebase your PR so that it uses the latest up to date code |
|
Are you still planning to work on it? |
Yes, however I can not start before this Sunday, should have mentioned that beforehand, sorry. |
|
No rush at all :) I was just checking :) |
|
@vincbeck @o-nikolas Would you still consider this a breaking change? |
|
I think we can consider it as bug fix |
…che#51883) * Adjusted the EMRServerlessStartJobOperator to cancel submited jobs on failure * Removed hook.cancel_job_run and adjusted the return value of EmrServerlessStartJobTrigger * Added additional tests for the job cancellation behavior * Fixed ruff formating errors
closes: #42401
I have introduced a
cancel_jobmethod to the EMRServerlessHook, which wraps the cancel_job_run method from boto3.In cases of a non deferrable job run, if an Exception that waiter_max_attempts has been reached is thrown,
cancel_jobis executed. If deferrable is set to True, the cancellation logic is placed insideexecute_complete, as this method evaluates the job state in this case.