check job_status before BatchOperator execute in deferrable mode #36523

Lee-W · 2024-01-02T08:11:53Z

While running a batch job in deferrable mode, the job may already have reached its final state before the task is deferred to the trigger. This PR checks the batch job status first, so the task is deferred only when the job is still in progress.
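For readers unfamiliar with the pattern, here is a minimal sketch of "check the status first, defer only if needed". It is not the code in this PR: `TaskDeferred`, `FakeBatchHook`, and `SketchBatchOperator` are hypothetical stand-ins so that the example runs without Airflow or AWS. The status strings `SUCCEEDED` and `FAILED` are the terminal AWS Batch job statuses.

```python
from dataclasses import dataclass


class TaskDeferred(Exception):
    """Hypothetical stand-in for the signal raised when a task defers."""


@dataclass
class FakeBatchHook:
    """Hypothetical hook that returns a canned status for each job id."""

    statuses: dict

    def get_job_status(self, job_id: str) -> str:
        return self.statuses[job_id]


@dataclass
class SketchBatchOperator:
    """Hypothetical operator that models only the deferrable code path."""

    hook: FakeBatchHook
    job_id: str

    def execute(self) -> str:
        # Look at the job once before handing it over to the trigger.
        status = self.hook.get_job_status(self.job_id)
        if status == "SUCCEEDED":
            # Already finished: return directly and skip the deferral round trip.
            return self.job_id
        if status == "FAILED":
            # Already failed: fail fast instead of deferring.
            raise RuntimeError(f"Batch job {self.job_id} failed")
        # Still in progress: this is where the real operator would defer.
        raise TaskDeferred(f"Batch job {self.job_id} is {status}; deferring")


done = SketchBatchOperator(FakeBatchHook({"job-1": "SUCCEEDED"}), "job-1")
print(done.execute())
```

In this sketch a job that is still `RUNNING` raises `TaskDeferred`, which models the point where the task would be handed to the triggerer.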

dirrao

LGTM.

vincbeck · 2024-01-02T19:06:59Z

If the condition is already met and the task is deferred, what happens? My understanding is the task gets executed successfully. So the gain here is just time? Instead of deferring the task for nothing (because the condition is already met), we return it directly. Unless there is a very good reason to do so I am not really in favor of this change because it basically duplicates what the trigger is already doing but in the operator itself.

If we go down that path, I guess this logic applies to any deferrable operator and, as such, should be copied over to all of them.

Lee-W · 2024-01-03T00:20:17Z

> If the condition is already met and the task is deferred, what happens? My understanding is the task gets executed successfully. So the gain here is just time? Instead of deferring the task for nothing (because the condition is already met), we return it directly. Unless there is a very good reason to do so I am not really in favor of this change because it basically duplicates what the trigger is already doing but in the operator itself.

Yes, I think the main gain here is time, as we avoid unnecessary serialization and deserialization. The other benefit is consistency: some operators and sensors already behave this way, for example:

airflow/airflow/providers/amazon/aws/sensors/emr.py

Lines 647 to 658 in 223a984

    
    elif not self.poke(context):
        self.defer(
            timeout=timedelta(seconds=self.max_attempts * self.poke_interval),
            trigger=EmrStepSensorTrigger(
                job_flow_id=self.job_flow_id,
                step_id=self.step_id,
                waiter_delay=int(self.poke_interval),
                waiter_max_attempts=self.max_attempts,
                aws_conn_id=self.aws_conn_id,
            ),
            method_name="execute_complete",
        )

airflow/airflow/providers/amazon/aws/operators/emr.py

Lines 578 to 597 in 223a984

    
    if self.deferrable:
        query_status = self.hook.check_query_status(job_id=self.job_id)
        self.check_failure(query_status)
        if query_status in EmrContainerHook.SUCCESS_STATES:
            return self.job_id
        timeout = (
            timedelta(seconds=self.max_polling_attempts * self.poll_interval)
            if self.max_polling_attempts
            else self.execution_timeout
        )
        self.defer(
            timeout=timeout,
            trigger=EmrContainerTrigger(
                virtual_cluster_id=self.virtual_cluster_id,
                job_id=self.job_id,
                aws_conn_id=self.aws_conn_id,
                waiter_delay=self.poll_interval,
            ),
            method_name="execute_complete",
        )

> If we go down that path, I guess this logic can apply to any deferrable operator, and as such, such logic should be copied over to all deferrable operators

Yes, I think that's something we should do eventually

vincbeck · 2024-01-03T14:35:56Z

I see. I don't necessarily agree, but I'll wait for others to comment. I might be the only grumpy one here :)

potiuk · 2024-01-03T14:42:19Z

> I see, I dont necessarily agree but I'll wait others to comment, I might be the only grumpy one here :)

Just a little grumpy, I think :)

> Yes, I think that's something we should do eventually

Yes, I agree we might want to do it eventually. It's an optimisation and we should continue doing it. We've had similar cases in the past, for example when we optimized EmptyOperator. It's hard to have an enforceable rule here though, so I'd say an ad-hoc optimisation like this one is the best approach.

o-nikolas

I also think this is a bit redundant. I like the time savings, but it complicates the code a bit to have result checking/handling in multiple places. I think if we move forward with this we should do our best to abstract that stuff into functions that are used in both the defer and non-defer paths.
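One way to read the suggestion above is a single status-handling function shared by both code paths. The sketch below is an assumption about what such an abstraction could look like, not code from this PR or from Airflow. `resolve_job_status`, `run_sync`, `run_deferrable`, and `JobFailed` are hypothetical names, and `poll_status` and `defer` are injected callables so that the example runs on its own.

```python
from typing import Callable

SUCCESS_STATE = "SUCCEEDED"
FAILURE_STATE = "FAILED"


class JobFailed(Exception):
    """Hypothetical error raised when the job ended in a failure state."""


def resolve_job_status(job_id: str, status: str) -> bool:
    """Shared check: True if the job succeeded, False if still in progress.

    Raises JobFailed if the job failed, so callers never repeat that logic.
    """
    if status == FAILURE_STATE:
        raise JobFailed(f"Job {job_id} failed")
    return status == SUCCESS_STATE


def run_sync(job_id: str, poll_status: Callable[[str], str], max_attempts: int = 10) -> str:
    """Non-deferrable path: poll until the shared check reports success."""
    for _ in range(max_attempts):
        if resolve_job_status(job_id, poll_status(job_id)):
            return job_id
    raise TimeoutError(f"Job {job_id} did not finish in {max_attempts} attempts")


def run_deferrable(job_id: str, poll_status: Callable[[str], str], defer: Callable[[str], str]) -> str:
    """Deferrable path: one shared check up front, then defer if still running."""
    if resolve_job_status(job_id, poll_status(job_id)):
        return job_id
    return defer(job_id)


# Usage: both paths go through the same status-handling function.
statuses = iter(["RUNNING", "SUCCEEDED"])
print(run_sync("job-1", lambda _: next(statuses)))
print(run_deferrable("job-2", lambda _: "RUNNING", defer=lambda j: f"deferred {j}"))
```

Keeping the success and failure decisions in one function means a change to how a status is interpreted only has to be made once.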

airflow/providers/amazon/aws/operators/batch.py

phanikumv · 2024-01-10T12:38:08Z

> I also think this is a bit redundant. I like the time savings, but it complicates the code a bit to have result checking/handling in multiple places. I think if we move forward with this we should do our best to abstract that stuff into functions that are used in both the defer and non-defer paths.

Merging this for now. @Lee-W, can you implement the abstraction part in a separate PR?

Lee-W requested review from eladkal and o-nikolas as code owners January 2, 2024 08:11

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jan 2, 2024

Lee-W mentioned this pull request Jan 2, 2024

Compare astronomer-providers and oss airflow operators/sensors and scope out deprecation plan astronomer/astronomer-providers#1377

Closed

Lee-W force-pushed the check-job-status-before-BatchOperator-execute-in-deferrable-mode branch from 59813da to 8713e75 on January 2, 2024 09:51

dirrao approved these changes Jan 2, 2024

Lee-W force-pushed the check-job-status-before-BatchOperator-execute-in-deferrable-mode branch 3 times, most recently from e8363cb to 72100ef on January 2, 2024 15:29

vincbeck mentioned this pull request Jan 2, 2024

Check redshift cluster state before deferring to triggerer #36416

Merged

phanikumv mentioned this pull request Jan 3, 2024

Deprecate amazon async provider astronomer/astronomer-providers#1410

Closed

potiuk approved these changes Jan 3, 2024

o-nikolas reviewed Jan 3, 2024

Lee-W force-pushed the check-job-status-before-BatchOperator-execute-in-deferrable-mode branch 3 times, most recently from 3d4e215 to b0288a1 on January 6, 2024 15:08

Lee-W added 2 commits January 8, 2024 17:55

feat(providers/amazon): check job_status before BatchOperator execute in deferrable mode

896c1c0

refactor(providers/amazon): improve log message consistency

308789a

Lee-W force-pushed the check-job-status-before-BatchOperator-execute-in-deferrable-mode branch from b0288a1 to 308789a on January 8, 2024 09:55

Lee-W mentioned this pull request Jan 9, 2024

check ProcessingJobStatus status before deferring SageMakerProcessingOperator #36658

Merged

Merge branch 'main' into check-job-status-before-BatchOperator-execute-in-deferrable-mode

85bce0c

phanikumv approved these changes Jan 10, 2024

phanikumv merged commit 88c9596 into apache:main Jan 10, 2024

phanikumv deleted the check-job-status-before-BatchOperator-execute-in-deferrable-mode branch January 10, 2024 12:38

eladkal mentioned this pull request Jan 22, 2024

Status of testing Providers that were prepared on January 26, 2024 #36948

Closed
