Skip to content

Conversation

@Taragolis
Copy link
Contributor

Continuation #24246

This base operators and sensors might help to resolve inconsistency in multiple operators and sensors:

  1. Some of current sensors/operators accept only aws_conn_id
  2. Some of current sensors/operators accept region instead of region_name

In additional this implementation propagate additional parameters into the hook:

  1. Operator/Sensor botocore_config -> Hook config
  2. Operator/Sensor verify -> Hook verify

Note

AwsBaseSensor created by "old methodic"

  1. Copy code from AwsBaseOperator
  2. Paste in new module

I've tried to create Mixin class however it not works well breaks something in BaseOperator
If someone know how to implements it by DRY, without use additional Metaclasses and/or class decorators feel free to suggest.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@Taragolis Taragolis requested review from ferruzzi and vincbeck October 5, 2023 19:16
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Oct 5, 2023
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also might be a good idea to make this classes internal only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And wondering about module naming because I a bit confused:

Suggestion in old PR use base_aws: #24246 (comment)
Naming for Base Trigger: airflow.providers.amazon.aws.triggers.base.AwsBaseWaiterTrigger

cc @ferruzzi @vincbeck @o-nikolas

There are only two hard things in Computer Science: cache invalidation and naming things

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, naming is hard xD. I am fine with base_aws.py. It could be also:

  • base_operator.py
  • base.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also might be a good idea to make this classes internal only.

+1

Comment on lines +88 to +86
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes also required to make changes in airflow.providers.amazon.aws.triggers.base.AwsBaseWaiterTrigger as follow-up.

That is also why I make accept botocore_config as dictionary and not botocore.config.Config object

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions. We've discussed an AwsBaseOperator a few times and kept putting it off, there may be some more suggestions coming once I check my notes on what I was going to put into it.

As always, thanks for your thought and effort on the provider package contribution. 👍

Comment on lines +64 to +65
Copy link
Contributor

@ferruzzi ferruzzi Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is repeating itself, but maybe I'm missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the default AWS Connection id aws_default (as long as it is created and remains untouched by the user) also uses the default boto3 behaviour. Might be worth mentioning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could try to rewrite it to something shorter with reference to AWS Connection, like we have in other providers, e.g. Slack

:param slack_webhook_conn_id: :ref:`Slack Incoming Webhook <howto/connection:slack>`
connection id that has Incoming Webhook token in the password field.

Example in Auto API:
https://airflow.apache.org/docs/apache-airflow-providers-slack/8.1.0/_api/airflow/providers/slack/operators/slack_webhook/index.html

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my comments have all been addressed. Thanks.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume followup PR would be to change AWS operators/hooks to inherit from the base classes?

We need also doc update to explain how to set the botocore parameters

Comment on lines 406 to 408
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to move these to the BASE_CLASSES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit blind 🤣

@Taragolis
Copy link
Contributor Author

I assume followup PR would be to change AWS operators/hooks to inherit from the base classes?

Yep

We need also doc update to explain how to set the botocore parameters

I think it would be a good idea to add generic information here: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/index.html as soon as we change existed Operators, so them will have same sets of generic parameters:

  • aws_conn_id: Operators which intends to use AwsBaseHook already have
  • region_name: Some of operators use this parameter
  • verify: Only one use right now
  • botocore_config: None of of the operators use it right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, naming is hard xD. I am fine with base_aws.py. It could be also:

  • base_operator.py
  • base.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity. Why from None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for skip previous traceback, e.g. if developer set aws_hook_class then it failed with TypeError on issubclass and traceback will contain information about error which raised from issubclass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to research that one too, I have never seen that in use. It's kinda handy.

@Taragolis Taragolis force-pushed the amazon-base-operator-sensors branch from d38cde3 to 63052fa Compare October 10, 2023 09:14
@Taragolis Taragolis force-pushed the amazon-base-operator-sensors branch from 63052fa to 126a1f5 Compare October 10, 2023 13:09
@Taragolis
Copy link
Contributor Author

In 126a1f5 split logic into two classes, for reduce repeatable code in AwsBaseOperator and AwsBaseSensor and in additional this might help to use in other places, e.g. Notifiers. Potentially it also could be use in AwsBaseWaiterTrigger however it use hook as a method, rather than property

Comment on lines +31 to +32
Only for internal usage, this class might be changed, renamed or removed in the future
without any further notice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this, should we change the class name to _AwsBaseOperator?

Same for the Sensor and mixin classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is kind of tradeoff between common Python's agreement about internal variables/classes/methods and airflow public models. _ shows that is internal stuff however IDE doesn't like this imports even with the same package (I guess static checks is fine) in the other hand we have some methods and class in core which do no prefixed by _ but have a description and :meta private:.

There is also maybe a good idea to make this stuff public after a while for create custom operators, but for now I would rather keep it internal only

In the end no one could stop end users to use this stuff directly

Comment on lines +64 to +65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the default AWS Connection id aws_default (as long as it is created and remains untouched by the user) also uses the default boto3 behaviour. Might be worth mentioning.

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
Copy link
Contributor

@syedahsn syedahsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I'd be interested to see how much this affects new and existing operators and sensors.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants