-
Notifications
You must be signed in to change notification settings - Fork 16.4k
check job_status before BatchOperator execute in deferrable mode #36523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check job_status before BatchOperator execute in deferrable mode #36523
Conversation
59813da to
8713e75
Compare
dirrao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
e8363cb to
72100ef
Compare
|
If the condition is already met and the task is deferred, what happens? My understanding is the task gets executed successfully. So the gain here is just time? Instead of deferring the task for nothing (because the condition is already met), we return it directly. Unless there is a very good reason to do so I am not really in favor of this change because it basically duplicates what the trigger is already doing but in the operator itself. If we go down that path, I guess this logic can apply to any deferrable operator, and as such, such logic should be copied over to all deferrable operators |
Yes, I think the main gain here is time as we avoid unnecessary serialization and deserialization. Another thing is for consistency. We already have this behavior for some of the operators airflow/airflow/providers/amazon/aws/sensors/emr.py Lines 647 to 658 in 223a984
airflow/airflow/providers/amazon/aws/operators/emr.py Lines 578 to 597 in 223a984
Yes, I think that's something we should do eventually |
I see, I dont necessarily agree but I'll wait others to comment, I might be the only grumpy one here :) |
Just a little grumpy, I think :)
Yes I agree we might want to do it. Eventually, It's an optimisation and we should continue doing it. We've had similar cases in the past - for example when we optimized EmptyOperator. It's hard to have an enforceable rule here though, so I'd say ad-hoc optimisation like this one is the best approach. |
o-nikolas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this is a bit redundant. I like the time savings, but it complicates the code a bit to have result checking/hanlding in multiple places. I think if we move forward with this we should do our best to abstract that stuff into functions that are used in both the defer and non-defer paths.
3d4e215 to
b0288a1
Compare
b0288a1 to
308789a
Compare
…e-in-deferrable-mode
Merging this for now. @Lee-W can you implement the abstraction part in a separate PR |
While running a batch job in deferrable mode, the condition might already be met before we defer the task into the trigger. This PR intends to check the batch status before deferring the task to trigger.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.