-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Fix ecs/EcsRunTaskOperator retry condition #53080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,7 +19,11 @@ | |
|
|
||
| from typing import TYPE_CHECKING, Protocol, runtime_checkable | ||
|
|
||
| from airflow.providers.amazon.aws.exceptions import EcsOperatorError, EcsTaskFailToStart | ||
| from airflow.providers.amazon.aws.exceptions import ( | ||
| EcsCannotPullContainerError, | ||
| EcsOperatorError, | ||
| EcsTaskFailToStart, | ||
| ) | ||
| from airflow.providers.amazon.aws.hooks.base_aws import AwsGenericHook | ||
| from airflow.providers.amazon.aws.utils import _StringCompareEnum | ||
|
|
||
|
|
@@ -29,6 +33,9 @@ | |
|
|
||
| def should_retry(exception: Exception): | ||
| """Check if exception is related to ECS resource quota (CPU, MEM).""" | ||
| if isinstance(exception, EcsCannotPullContainerError): | ||
| return False | ||
|
|
||
| if isinstance(exception, EcsOperatorError): | ||
|
Comment on lines
34
to
39
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO, let the Based on the Documentation - CannotPullContainer task errors in Amazon ECS, it's more like configuration error from user instead of system instability. cc @o-nikolas , @eladkal
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think there're cases suitable for retry with a reasonable wait time. e.g., ERROR: toomanyrequests: Too Many Requests or You have reached your pull rate limit. |
||
| return any( | ||
| quota_reason in failure["reason"] | ||
|
|
@@ -40,6 +47,9 @@ def should_retry(exception: Exception): | |
|
|
||
| def should_retry_eni(exception: Exception): | ||
| """Check if exception is related to ENI (Elastic Network Interfaces).""" | ||
| if isinstance(exception, EcsCannotPullContainerError): | ||
| return False | ||
|
|
||
| if isinstance(exception, EcsTaskFailToStart): | ||
| return any( | ||
| eni_reason in exception.message | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a tiny nit, but maybe adjust the docstring to incorporate the new behavior?