Replies: 4 comments 3 replies
-
|
This is a great breakdown. We've been putting a lot of effort into standardizing things in the AWS package and this will definitely help. This may be out of scope, or you may want to include it here as well: Another similar standardizing change we have in mind is to convert as many of the hooks as possible into "thin hooks". If a hook method just passes through to a boto call, then there's really no reason to have that method defined and it can/should be removed. For a good example of what we'd like to move towards, see how the |
Beta Was this translation helpful? Give feedback.
-
|
I agree with Dennis. This is a very good deep dive and research. Thanks for the effort. One first good step would also be to write a README about guidance/convention we want to follow in terms of style, naming, ... At least we would ask any new PR to follow these standards and for the legacy one, we can fix that iteratively |
Beta Was this translation helpful? Give feedback.
-
|
I converted it into a discussion. I think you can make some good set of consistency improvements for the amazon provider @ferruzzi @vincbeck @Taragolis. I propose to make a series of PRs from those ideas :) |
Beta Was this translation helpful? Give feedback.
-
|
Hello, this inconsistency is causing a problem for me and I think this is the right thread to post it? Please redirect me if I'm wrong. EmrBaseSensor can set "aws_conn_id" to change the region, but this is not supported as a templated field. This means if I do not know my region (aws_conn_id) at runtime, then there is no way for me to set it for any of the Sensors that inherit from that base class (I assume the same problem exists for the RDS, Sagemaker base operators, and possibly more). |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
main branch
Apache Airflow version
main (development)
Operating System
any
Deployment
Other
Deployment details
No response
What happened
I'm investigate amazon-provider and found that different operators/sensors and other components use different approach to do the same things
Operators/Sensors set hook during initialise (
__init__)At that moment Operators/Sensors uses 4 different approach to get hook:
__init__- which could cost use additional resources of scheduler/dag-processorget_hook)hookor similarexecute/pokemethodI think we should avoid 1 and 2
List of components:
airflow.airflow.providers.amazon.aws.operators.batch.BatchOperator- set during operator initialiseairflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator- set None during operator initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.operators.ecs.EcsOperator- set None during operator initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator- set during operator initialiseairflow.airflow.providers.amazon.aws.sensors.batch.BatchSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.glue_catalog_partition.GlueCatalogPartitionSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor- attributesquicksight_hookandsts_hookdoesn't useairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseSensor- set during sensor initialiseairflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.s3.S3KeySensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor- set None during sensor initialise, init hook byget_hookregionvsregion_nameattributeAWSBaseHook expected
region_namehowever some operator/sensors usesregion.For consistency better rename to
region_namewith markregionas deprecated fieldList of components:
airflow.airflow.providers.amazon.aws.operators.eks.EksCreateClusterOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksCreateNodegroupOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksCreateFargateProfileOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteClusterOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteNodegroupOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteFargateProfileOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksPodOperatorairflow.airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperatorairflow.airflow.providers.amazon.aws.operators.quicksight.QuickSightCreateIngestionOperatorairflow.airflow.providers.amazon.aws.sensors.eks.EksClusterStateSensorairflow.airflow.providers.amazon.aws.sensors.eks.EksFargateProfileStateSensorairflow.airflow.providers.amazon.aws.sensors.eks.EksNodegroupStateSensorNo explicit set
region_nameSome components use region_name from connection, and doesn't have parameter/argument
region_nameNote: At that moment only glacier component, and some S3 operations non-regional, however even for this components better set region_name
List of components:
airflow.airflow.providers.amazon.aws.operators.athena.AthenaOperatorairflow.airflow.providers.amazon.aws.operators.aws_lambda.AwsLambdaInvokeFunctionOperatorairflow.airflow.providers.amazon.aws.operators.athena.CloudFormationCreateStackOperatorairflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsCreateTaskOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsDescribeTasksOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsStartTaskOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsStopTaskOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrContainerOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperatorairflow.airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperatorairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator- and all dependenciesairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_sql.RedshiftSQLOperatorairflow.airflow.providers.amazon.aws.operators.s3.S3DeleteBucketOperatorairflow.airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator- and all dependenciesairflow.airflow.providers.amazon.aws.operators.sns.SnsPublishOperatorairflow.airflow.providers.amazon.aws.operators.sqs.SqsPublishOperatorairflow.airflow.providers.amazon.aws.operators.sqs.StepFunctionStartExecutionOperatorairflow.airflow.providers.amazon.aws.operators.step_function.StepFunctionStartExecutionOperatorairflow.airflow.providers.amazon.aws.sensors.athena.AthenaSensorairflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationCreateStackSensor- missing docstringairflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationDeleteStackSensor- missing docstringairflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensorairflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.glue.GlacierJobOperationSensorairflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensorairflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensorairflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensorairflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensorairflow.airflow.providers.amazon.aws.sensors.sqs.StepFunctionExecutionSensorairflow.airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operatorairflow.airflow.providers.amazon.aws.transfers.hive_to_dynamodb.HiveToDynamoDBOperatorairflow.airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator- redshift_region_name ???What you think should happen instead
Try to make some generic stuff by the same way It may help to changes/contributions in the futures.
How to reproduce
No response
Anything else
I do not create PR just because in single PR it will affect almost all sensors/operators
IMHO, It is better to implement in parts
Are you willing to submit PR?
Code of Conduct
Beta Was this translation helpful? Give feedback.
All reactions