
amazon-provider: Make components more consistent #24030

@Taragolis

Description

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

main branch

Apache Airflow version

main (development)

Operating System

any

Deployment

Other

Deployment details

No response

What happened

I'm investigating amazon-provider and found that different operators, sensors and other components use different approaches to do the same things.


Operators/Sensors set the hook during initialisation (__init__)

At the moment, operators/sensors use 4 different approaches to get a hook:

  1. Set in __init__ - which could consume additional scheduler/DAG processor resources, since __init__ runs every time the DAG file is parsed
  2. Set to None during initialisation and created by a specific method (usually get_hook)
  3. A cached property hook (or similar)
  4. Created inside the execute/poke method

I think we should avoid approaches 1 and 2.
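To illustrate the difference between approach 1 and approach 3, here is a minimal self-contained sketch. `FakeHook`, `EagerOperator` and `LazyOperator` are hypothetical stand-ins (not real Airflow classes), and the `created` counter simulates the cost of building a hook. It uses `functools.cached_property` (Python 3.8+).

```python
from functools import cached_property
from typing import Optional


class FakeHook:
    """Hypothetical stand-in for an AWS hook; counts how many times it is created."""

    created = 0

    def __init__(self, aws_conn_id: str, region_name: Optional[str] = None) -> None:
        FakeHook.created += 1
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class EagerOperator:
    """Approach 1: the hook is built in __init__, i.e. every time the DAG is parsed."""

    def __init__(self, aws_conn_id: str = "aws_default") -> None:
        self.aws_conn_id = aws_conn_id
        self.hook = FakeHook(aws_conn_id)

    def execute(self) -> str:
        return self.hook.aws_conn_id


class LazyOperator:
    """Approach 3: the hook is built on first access and then cached on the instance."""

    def __init__(self, aws_conn_id: str = "aws_default") -> None:
        self.aws_conn_id = aws_conn_id

    @cached_property
    def hook(self) -> FakeHook:
        return FakeHook(self.aws_conn_id)

    def execute(self) -> str:
        # The first access creates the hook; later accesses reuse the same object.
        return self.hook.aws_conn_id
```

Constructing an `EagerOperator` increments `FakeHook.created` immediately, while a `LazyOperator` only does so on its first `execute()` call, and only once.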

List of components:

  • airflow.providers.amazon.aws.operators.batch.BatchOperator - set during operator initialisation
  • airflow.providers.amazon.aws.operators.datasync.DataSyncOperator - set to None during operator initialisation, created by get_hook
  • airflow.providers.amazon.aws.operators.ecs.EcsOperator - set to None during operator initialisation, created by get_hook
  • airflow.providers.amazon.aws.operators.rds.RdsBaseOperator - set during operator initialisation
  • airflow.providers.amazon.aws.sensors.batch.BatchSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.glue_catalog_partition.GlueCatalogPartitionSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor - the quicksight_hook and sts_hook attributes are not used
  • airflow.providers.amazon.aws.sensors.rds.RdsBaseSensor - set during sensor initialisation
  • airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.s3.S3KeySensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.sqs.SqsSensor - set to None during sensor initialisation, created by get_hook
  • airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor - set to None during sensor initialisation, created by get_hook

region vs region_name attribute

AwsBaseHook expects region_name; however, some operators/sensors use region.

For consistency, it would be better to rename it to region_name and mark region as a deprecated argument.
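A minimal sketch of such a rename, assuming a hypothetical `ExampleEksOperator` (not the real EKS operator signature): `region_name` becomes the preferred argument, while the old `region` argument and attribute keep working but emit a `DeprecationWarning`. Raising on conflicting values is a design choice of this sketch.

```python
import warnings
from typing import Optional


class ExampleEksOperator:
    """Hypothetical operator that renames `region` to `region_name`."""

    def __init__(
        self,
        cluster_name: str,
        aws_conn_id: str = "aws_default",
        region_name: Optional[str] = None,
        region: Optional[str] = None,  # deprecated alias, kept for backward compatibility
    ) -> None:
        if region is not None:
            warnings.warn(
                "The 'region' argument is deprecated, use 'region_name' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Refuse ambiguous calls instead of silently picking one value.
            if region_name is not None and region_name != region:
                raise ValueError("Conflicting values for 'region' and 'region_name'.")
            region_name = region
        self.cluster_name = cluster_name
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    @property
    def region(self) -> Optional[str]:
        """Deprecated read access through the old attribute name."""
        warnings.warn(
            "The 'region' attribute is deprecated, use 'region_name' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.region_name
```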

List of components:

  • airflow.providers.amazon.aws.operators.eks.EksCreateClusterOperator
  • airflow.providers.amazon.aws.operators.eks.EksCreateNodegroupOperator
  • airflow.providers.amazon.aws.operators.eks.EksCreateFargateProfileOperator
  • airflow.providers.amazon.aws.operators.eks.EksDeleteClusterOperator
  • airflow.providers.amazon.aws.operators.eks.EksDeleteNodegroupOperator
  • airflow.providers.amazon.aws.operators.eks.EksDeleteFargateProfileOperator
  • airflow.providers.amazon.aws.operators.eks.EksPodOperator
  • airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperator
  • airflow.providers.amazon.aws.operators.quicksight.QuickSightCreateIngestionOperator
  • airflow.providers.amazon.aws.sensors.eks.EksClusterStateSensor
  • airflow.providers.amazon.aws.sensors.eks.EksFargateProfileStateSensor
  • airflow.providers.amazon.aws.sensors.eks.EksNodegroupStateSensor

No explicit region_name argument

Some components only take region_name from the connection and do not have a region_name parameter/argument.

Note: At the moment only the Glacier component and some S3 operations are non-regional; however, even for these components it is better to set region_name.
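A minimal sketch of what an explicit, optional region_name argument allows. `resolve_region_name` and `ExampleQueueOperator` are hypothetical names, and `connection_extra` stands in for the connection's extra fields: an explicit argument overrides the connection value, and the connection value is used as a fallback.

```python
from typing import Mapping, Optional


def resolve_region_name(
    region_name: Optional[str],
    connection_extra: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper: an explicit argument wins over the connection.

    Returning None leaves the choice to the AWS SDK's own configuration
    (for example the AWS_DEFAULT_REGION environment variable).
    """
    if region_name:
        return region_name
    return connection_extra.get("region_name")


class ExampleQueueOperator:
    """Hypothetical operator: region_name is an explicit, optional argument."""

    def __init__(self, aws_conn_id: str = "aws_default", region_name: Optional[str] = None) -> None:
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    def effective_region(self, connection_extra: Mapping[str, str]) -> Optional[str]:
        # Only falls back to the connection when no region_name was passed.
        return resolve_region_name(self.region_name, connection_extra)
```

With this, one connection can serve tasks in several regions, since each task may override region_name without defining a new connection.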

List of components:

  • airflow.providers.amazon.aws.operators.athena.AthenaOperator
  • airflow.providers.amazon.aws.operators.aws_lambda.AwsLambdaInvokeFunctionOperator
  • airflow.providers.amazon.aws.operators.cloud_formation.CloudFormationCreateStackOperator
  • airflow.providers.amazon.aws.operators.datasync.DataSyncOperator
  • airflow.providers.amazon.aws.operators.dms.DmsCreateTaskOperator
  • airflow.providers.amazon.aws.operators.dms.DmsDescribeTasksOperator
  • airflow.providers.amazon.aws.operators.dms.DmsStartTaskOperator
  • airflow.providers.amazon.aws.operators.dms.DmsStopTaskOperator
  • airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator
  • airflow.providers.amazon.aws.operators.emr.EmrContainerOperator
  • airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator
  • airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator
  • airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperator
  • airflow.providers.amazon.aws.operators.rds.RdsBaseOperator - and all subclasses
  • airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperator
  • airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperator
  • airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperator
  • airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperator
  • airflow.providers.amazon.aws.operators.redshift_sql.RedshiftSQLOperator
  • airflow.providers.amazon.aws.operators.s3.S3DeleteBucketOperator
  • airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator - and all subclasses
  • airflow.providers.amazon.aws.operators.sns.SnsPublishOperator
  • airflow.providers.amazon.aws.operators.sqs.SqsPublishOperator
  • airflow.providers.amazon.aws.operators.step_function.StepFunctionStartExecutionOperator
  • airflow.providers.amazon.aws.sensors.athena.AthenaSensor
  • airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationCreateStackSensor - missing docstring
  • airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationDeleteStackSensor - missing docstring
  • airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor
  • airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor - and all subclasses
  • airflow.providers.amazon.aws.sensors.glacier.GlacierJobOperationSensor
  • airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor
  • airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor
  • airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor
  • airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor - and all subclasses
  • airflow.providers.amazon.aws.sensors.sqs.SqsSensor
  • airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor
  • airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator
  • airflow.providers.amazon.aws.transfers.hive_to_dynamodb.HiveToDynamoDBOperator
  • airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator - redshift_region_name?

What you think should happen instead

Implement the common parts the same way in every component. This may make future changes and contributions easier.
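One possible shape for such shared code, as a minimal sketch only: a hypothetical base class (`ExampleAwsBaseOperator`, not an existing Airflow class) that gives every component the same `aws_conn_id`/`region_name` arguments and the same lazily created hook, so a concrete component only declares its hook class and its own arguments. `ExampleAwsHook`, `ExampleSqsHook` and `ExampleSqsPublishOperator` are also hypothetical.

```python
from functools import cached_property
from typing import Any, ClassVar, Optional, Type


class ExampleAwsHook:
    """Hypothetical stand-in for an AWS hook class."""

    def __init__(self, aws_conn_id: str, region_name: Optional[str] = None) -> None:
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class ExampleSqsHook(ExampleAwsHook):
    """Hypothetical service-specific hook."""


class ExampleAwsBaseOperator:
    """Hypothetical shared base: same arguments and same lazy hook everywhere."""

    hook_class: ClassVar[Type[ExampleAwsHook]] = ExampleAwsHook

    def __init__(self, *, aws_conn_id: str = "aws_default", region_name: Optional[str] = None) -> None:
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    @cached_property
    def hook(self) -> ExampleAwsHook:
        # Created on first use only, never while the DAG file is parsed.
        return self.hook_class(aws_conn_id=self.aws_conn_id, region_name=self.region_name)


class ExampleSqsPublishOperator(ExampleAwsBaseOperator):
    """A concrete component only declares its hook class and its own arguments."""

    hook_class = ExampleSqsHook

    def __init__(self, *, sqs_queue: str, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.sqs_queue = sqs_queue
```

Keeping the connection and region handling in one place means a future fix or deprecation only has to be made once instead of in every operator/sensor.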

How to reproduce

No response

Anything else

I have not created a PR because a single PR would affect almost all sensors/operators.

IMHO, it is better to implement this in parts.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    kind:task, provider:amazon
