-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
main branch
Apache Airflow version
main (development)
Operating System
any
Deployment
Other
Deployment details
No response
What happened
I'm investigate amazon-provider and found that different operators/sensors and other components use different approach to do the same things
Operators/Sensors set hook during initialise (__init__)
At that moment Operators/Sensors uses 4 different approach to get hook:
- Set in
__init__- which could cost use additional resources of scheduler/dag-processor - Set empty hook in during initialise and set by specific method (usual
get_hook) - Cached property
hookor similar - Define in
execute/pokemethod
I think we should avoid 1 and 2
List of components:
airflow.airflow.providers.amazon.aws.operators.batch.BatchOperator- set during operator initialiseairflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator- set None during operator initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.operators.ecs.EcsOperator- set None during operator initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator- set during operator initialiseairflow.airflow.providers.amazon.aws.sensors.batch.BatchSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.glue_catalog_partition.GlueCatalogPartitionSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor- attributesquicksight_hookandsts_hookdoesn't useairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseSensor- set during sensor initialiseairflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.s3.S3KeySensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor- set None during sensor initialise, init hook byget_hookairflow.airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor- set None during sensor initialise, init hook byget_hook
region vs region_name attribute
AWSBaseHook expected region_name however some operator/sensors uses region.
For consistency better rename to region_name with mark region as deprecated field
List of components:
airflow.airflow.providers.amazon.aws.operators.eks.EksCreateClusterOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksCreateNodegroupOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksCreateFargateProfileOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteClusterOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteNodegroupOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksDeleteFargateProfileOperatorairflow.airflow.providers.amazon.aws.operators.eks.EksPodOperatorairflow.airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperatorairflow.airflow.providers.amazon.aws.operators.quicksight.QuickSightCreateIngestionOperatorairflow.airflow.providers.amazon.aws.sensors.eks.EksClusterStateSensorairflow.airflow.providers.amazon.aws.sensors.eks.EksFargateProfileStateSensorairflow.airflow.providers.amazon.aws.sensors.eks.EksNodegroupStateSensor
No explicit set region_name
Some components use region_name from connection, and doesn't have parameter/argument region_name
Note: At that moment only glacier component, and some S3 operations non-regional, however even for this components better set region_name
List of components:
airflow.airflow.providers.amazon.aws.operators.athena.AthenaOperatorairflow.airflow.providers.amazon.aws.operators.aws_lambda.AwsLambdaInvokeFunctionOperatorairflow.airflow.providers.amazon.aws.operators.athena.CloudFormationCreateStackOperatorairflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsCreateTaskOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsDescribeTasksOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsStartTaskOperatorairflow.airflow.providers.amazon.aws.operators.dms.DmsStopTaskOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrContainerOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperatorairflow.airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperatorairflow.airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperatorairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator- and all dependenciesairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperatorairflow.airflow.providers.amazon.aws.operators.redshift_sql.RedshiftSQLOperatorairflow.airflow.providers.amazon.aws.operators.s3.S3DeleteBucketOperatorairflow.airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator- and all dependenciesairflow.airflow.providers.amazon.aws.operators.sns.SnsPublishOperatorairflow.airflow.providers.amazon.aws.operators.sqs.SqsPublishOperatorairflow.airflow.providers.amazon.aws.operators.sqs.StepFunctionStartExecutionOperatorairflow.airflow.providers.amazon.aws.operators.step_function.StepFunctionStartExecutionOperatorairflow.airflow.providers.amazon.aws.sensors.athena.AthenaSensorairflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationCreateStackSensor- missing docstringairflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationDeleteStackSensor- missing docstringairflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensorairflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.glue.GlacierJobOperationSensorairflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensorairflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensorairflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensorairflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensorairflow.airflow.providers.amazon.aws.sensors.sqs.StepFunctionExecutionSensorairflow.airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operatorairflow.airflow.providers.amazon.aws.transfers.hive_to_dynamodb.HiveToDynamoDBOperatorairflow.airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator- redshift_region_name ???
What you think should happen instead
Try to make some generic stuff by the same way It may help to changes/contributions in the futures.
How to reproduce
No response
Anything else
I do not create PR just because in single PR it will affect almost all sensors/operators
IMHO, It is better to implement in parts
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct