Adjusted the EMRServerlessStartJobOperator to cancel failed jobs #51883

dominikhei · 2025-06-18T12:12:49Z

I have introduced a cancel_job method to the EMRServerlessHook, which wraps the cancel_job_run method from boto3.

For a non-deferrable job run, cancel_job is executed if an exception is raised because waiter_max_attempts has been reached. If deferrable is set to True, the cancellation logic is placed inside execute_complete, since that is the method that evaluates the job state in that case.
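
The non-deferrable path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `WaitTimedOut`, `wait_then_cancel_on_failure`, `wait_for_completion`, and `RecordingClient` are hypothetical names. The stub stands in for a boto3 `emr-serverless` client, whose `cancel_job_run(applicationId=..., jobRunId=...)` call is the one being wrapped.

```python
class WaitTimedOut(Exception):
    """Hypothetical stand-in for the error raised once waiter_max_attempts is reached."""


class RecordingClient:
    """Stub that mimics only the cancel_job_run call of a boto3 emr-serverless client."""

    def __init__(self):
        self.cancelled = []

    def cancel_job_run(self, applicationId, jobRunId):
        # Record the call instead of talking to AWS.
        self.cancelled.append((applicationId, jobRunId))
        return {"applicationId": applicationId, "jobRunId": jobRunId}


def wait_then_cancel_on_failure(client, application_id, job_run_id, wait_for_completion):
    """Wait for the job run; if waiting times out, cancel the run and re-raise."""
    try:
        wait_for_completion(application_id, job_run_id)
    except WaitTimedOut:
        # Cancel the run so that a task retry does not start a second job
        # while this one is still running or pending.
        client.cancel_job_run(applicationId=application_id, jobRunId=job_run_id)
        raise
```

Re-raising after the cancellation keeps the task marked as failed, so retries still happen as before.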

vincbeck

I feel like this is a very opinionated decision. I wonder whether this is something the user should configure via on_failure_callback, rather than a decision we make for them.

I'd like to hear more thoughts on that from others.

providers/amazon/src/airflow/providers/amazon/aws/hooks/emr.py

providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py

dominikhei · 2025-06-18T17:17:22Z

> I feel like this is a very opinionated decision. I wonder whether this is something the user should configure via on_failure_callback, rather than a decision we make for them.
>
> I'd like to hear more thoughts on that from others.

Apologies if there is an obvious answer, but is there a use case where you would not want the job to be cancelled in EMR when a retry creates a new one, so that both are running or pending concurrently?

vincbeck · 2025-06-18T17:21:51Z

> > I feel like this is a very opinionated decision. I wonder whether this is something the user should configure via on_failure_callback, rather than a decision we make for them.
> > I'd like to hear more thoughts on that from others.
>
> Apologies if this is an obvious question, but is there a use case where you would not want the job to be cancelled in EMR when a retry creates a new one, so that both are running or pending concurrently?

It's hard to know all the different use cases, but I think you're correct: I do not see any, so my perception is probably wrong :)

dominikhei · 2025-06-18T17:54:11Z

> > > I feel like this is a very opinionated decision. I wonder whether this is something the user should configure via on_failure_callback, rather than a decision we make for them.
> > > I'd like to hear more thoughts on that from others.
> >
> > Apologies if this is an obvious question, but is there a use case where you would not want the job to be cancelled in EMR when a retry creates a new one, so that both are running or pending concurrently?
>
> It's hard to know all the different use cases, but I think you're correct: I do not see any, so my perception is probably wrong :)

That's true, there's definitely a point in letting the user decide. As you said, let's wait for other opinions :)

dominikhei · 2025-07-02T11:12:16Z

@o-nikolas What is your take on this? Having thought about it again, this would change the standard behavior shared with other AWS operators (e.g. EMR would cancel the job on failure, Glue doesn't), which speaks more in favor of using on_failure_callback.

o-nikolas · 2025-07-04T17:17:52Z

> @o-nikolas What is your take on this? Having thought about it again, this would change the standard behavior shared with other AWS operators (e.g. EMR would cancel the job on failure, Glue doesn't), which speaks more in favor of using on_failure_callback.

It would be a change of standard behaviour and also a breaking change (the behaviour that the user sees will be noticeably different), so if we went that route we'd need to go through a deprecation process. We could argue that it's a bug fix (as you describe, no one would really want the default behaviour we have), which would let us skip the deprecation process. Or, as Vincent said, we could avoid all that and just document a way around this with callbacks.

I personally don't feel too strongly about it and would be okay with any of the three options above.
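
For reference, the callback-based workaround mentioned above might look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: that the failed task exposes `application_id` and `job_id` attributes through `context["task"]`, and that the client offers boto3's `cancel_job_run(applicationId=..., jobRunId=...)`. `make_cancel_on_failure_callback` and `RecordingClient` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class RecordingClient:
    """Stub that mimics only the cancel_job_run call of a boto3 emr-serverless client."""

    def __init__(self):
        self.cancelled = []

    def cancel_job_run(self, applicationId, jobRunId):
        self.cancelled.append((applicationId, jobRunId))
        return {"applicationId": applicationId, "jobRunId": jobRunId}


def make_cancel_on_failure_callback(client):
    """Build a callable suitable for on_failure_callback that cancels the job run."""

    def _callback(context):
        task = context["task"]
        # Assumption: the operator instance carries these attributes after submitting.
        application_id = getattr(task, "application_id", None)
        job_run_id = getattr(task, "job_id", None)
        if not application_id or not job_run_id:
            # Nothing was submitted (or the IDs are unavailable), so there is nothing to cancel.
            return None
        return client.cancel_job_run(applicationId=application_id, jobRunId=job_run_id)

    return _callback


# Demo with stand-in objects (no Airflow or AWS needed).
demo_client = RecordingClient()
callback = make_cancel_on_failure_callback(demo_client)
callback({"task": SimpleNamespace(application_id="app-1", job_id="run-1")})
print(demo_client.cancelled)
```

In a DAG, the factory's result would be passed as `on_failure_callback=` on the operator, with a real `emr-serverless` client in place of the stub.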

github-actions · 2025-08-19T00:17:59Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

vincbeck · 2025-11-27T17:51:51Z

Hey @dominikhei, I might have changed my mind and I think your changes make sense. I agree with you: a user is more likely to want their job cancelled when the timer times out than not. Are you still around and, if so, would you be interested in continuing to work on this PR?

dominikhei · 2025-11-28T20:04:29Z

@vincbeck I took some time off due to a new job, but I want to get back into contributing regularly, so this might be a good start :)

vincbeck · 2025-11-28T20:41:23Z

Please do :) I have reopened this PR

vincbeck · 2025-11-28T20:41:47Z

But please rebase your PR so that it uses the latest code

vincbeck · 2025-12-09T15:00:52Z

Are you still planning to work on it?

dominikhei · 2025-12-09T15:36:01Z

> Are you still planning to work on it?

Yes, however I cannot start before this Sunday. I should have mentioned that beforehand, sorry.
If this timeline is too slow, please feel free to reassign it.

vincbeck · 2025-12-09T15:57:54Z

No rush at all :) I was just checking :)

dominikhei · 2026-01-10T09:20:47Z

@vincbeck @o-nikolas Would you still consider this a breaking change?

vincbeck · 2026-01-12T15:24:00Z

I think we can consider it a bug fix

…che#51883)
* Adjusted the EMRServerlessStartJobOperator to cancel submited jobs on failure
* Removed hook.cancel_job_run and adjusted the return value of EmrServerlessStartJobTrigger
* Added additional tests for the job cancellation behavior
* Fixed ruff formating errors

Adjusted the EMRServerlessStartJobOperator to cancel submited jobs on failure

4a70c70

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 18, 2025

dominikhei marked this pull request as ready for review June 18, 2025 12:55

dominikhei requested review from eladkal and o-nikolas as code owners June 18, 2025 12:55

vincbeck requested changes Jun 18, 2025

providers/amazon/src/airflow/providers/amazon/aws/hooks/emr.py (outdated, resolved)

providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py (outdated, resolved)

Removed hook.cancel_job_run and adjusted the return value of EmrServerlessStartJobTrigger

7703c44

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 19, 2025

github-actions bot closed this Aug 25, 2025

vincbeck reopened this Nov 28, 2025

Merge branch 'main' into emr-cancel-job-run

b958433

Added additional tests for the job cancellation behavior

111c5e2

vincbeck approved these changes Jan 13, 2026

Fixed ruff formating errors

b6267be

vincbeck merged commit cbaa369 into apache:main Jan 14, 2026
89 checks passed

vincbeck mentioned this pull request Jan 14, 2026

Triggerer timeout exception is not handled properly #60517

Open

2 tasks

vincbeck mentioned this pull request Jan 28, 2026

Status of testing Providers that were prepared on January 28, 2026 #61165

Open

85 tasks
