
Hook parameters are not managed properly in AthenaSQLHook with SQLValueCheckOperator #55678

@FrancisLfg

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.11.0
apache-airflow-providers-common-sql==1.27.0

Apache Airflow version

2.10.5

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Other 3rd-party Helm chart

Deployment details

No response

What happened

The SQLValueCheckOperator does not work as expected with the AthenaSQLHook and raises an error:

TypeError: AwsGenericHook.__init__() got an unexpected keyword argument 's3_staging_dir'

Either s3_staging_dir or work_group is a mandatory parameter for an Athena connection. However, since the change from this PR that adds extra_dejson to hook_params, the hook constructor fails.
It cannot handle these mandatory hook_params.
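A minimal, self-contained sketch of the failure mode described above. `FakeAwsGenericHook` and `FakeAthenaSQLHook` are hypothetical stand-ins, not the real provider classes; the sketch only assumes that the Athena hook forwards unknown keyword arguments to its AWS base class, which is what the traceback suggests.

```python
# Hypothetical stand-ins that only mimic the constructor chain from this issue;
# they are NOT the real Airflow / Amazon provider classes.


class FakeAwsGenericHook:
    """Base hook whose constructor knows nothing about Athena-specific keys."""

    def __init__(self, aws_conn_id="aws_default", region_name=None):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class FakeAthenaSQLHook(FakeAwsGenericHook):
    """Subclass that forwards every extra keyword argument to the base class."""

    def __init__(self, athena_conn_id="athena_default", **kwargs):
        super().__init__(**kwargs)
        self.athena_conn_id = athena_conn_id


# Connection extras, as they would end up in hook_params via extra_dejson.
extra_dejson = {"s3_staging_dir": "s3://mybucket/athena/", "region_name": "eu-west-1"}

error_message = None
try:
    FakeAthenaSQLHook(athena_conn_id="athena_conn", **extra_dejson)
except TypeError as err:
    # s3_staging_dir reaches FakeAwsGenericHook.__init__, which does not accept it.
    error_message = str(err)

print(error_message)
```

The printed message has the same shape as the TypeError reported above: region_name is accepted by the base class, but s3_staging_dir is not.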

What you think should happen instead

The AthenaSQLHook should handle hook_params properly, since they are passed to it automatically through this constructor.
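One possible direction, sketched with hypothetical stand-in classes (not the real provider code): the Athena hook declares the Athena-specific keys explicitly in its signature, so they are consumed there and never forwarded to the AWS base class. Whether the real AthenaSQLHook should do this is exactly the open question in this issue.

```python
# Hypothetical stand-ins: NOT the real Airflow / Amazon provider classes.


class FakeAwsGenericHook:
    """Base hook that only understands generic AWS parameters."""

    def __init__(self, aws_conn_id="aws_default", region_name=None):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class FakeAthenaSQLHook(FakeAwsGenericHook):
    """Consumes the Athena-specific keys itself instead of forwarding them."""

    def __init__(
        self,
        athena_conn_id="athena_default",
        s3_staging_dir=None,
        work_group=None,
        **kwargs,
    ):
        # Only the remaining (generic AWS) kwargs are passed up the chain.
        super().__init__(**kwargs)
        self.athena_conn_id = athena_conn_id
        self.s3_staging_dir = s3_staging_dir
        self.work_group = work_group


extra_dejson = {"s3_staging_dir": "s3://mybucket/athena/", "region_name": "eu-west-1"}
hook = FakeAthenaSQLHook(athena_conn_id="athena_conn", **extra_dejson)
print(hook.s3_staging_dir, hook.region_name)
```

This only covers the keys that are named explicitly; any other extra that the base class does not accept would still raise the same TypeError.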

How to reproduce

from unittest.mock import patch

from airflow.models.connection import Connection
from airflow.providers.common.sql.operators.sql import SQLValueCheckOperator


def test_athena_hook_fail():
    """Test to reproduce the Athena hook issue with the s3_staging_dir parameter."""

    # Mock Athena connection with s3_staging_dir in extra
    athena_conn = Connection(
        conn_id="athena_conn",
        conn_type="athena",
        description="Connection to an Athena API",
        schema="athena_sql_schema1",
        extra={"s3_staging_dir": "s3://mybucket/athena/", "region_name": "eu-west-1"},
    )

    # Make connection lookups return the mocked Athena connection
    with patch("airflow.hooks.base.BaseHook.get_connection", return_value=athena_conn):
        # This should reproduce the TypeError: AwsGenericHook.__init__() got an unexpected keyword argument 's3_staging_dir'
        operator = SQLValueCheckOperator(
            task_id="value_check", sql="SELECT TRUE", pass_value=True, conn_id="athena_conn"
        )

        context = {"ds": "2024-01-01", "execution_date": None}
        operator.execute(context)

Anything else

hook_params lacks documentation, so I am unsure how to use it. Should it be handled in the hook constructor? If so, do we need to change the signature?

Also, is it really necessary to populate hook_params with extra_dejson, as is done here?
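If extra_dejson keeps being merged into hook_params, a defensive alternative is to drop keys the hook's constructor chain cannot accept. The sketch below is a heuristic under stated assumptions: the classes and the helpers `accepted_init_params` / `filter_hook_params` are hypothetical, and it assumes each `__init__` that takes `**kwargs` forwards them to the next class in the MRO.

```python
import inspect

# Hypothetical stand-ins mimicking the failing constructor chain from this issue.


class FakeAwsGenericHook:
    def __init__(self, aws_conn_id="aws_default", region_name=None):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class FakeAthenaSQLHook(FakeAwsGenericHook):
    def __init__(self, athena_conn_id="athena_default", **kwargs):
        super().__init__(**kwargs)
        self.athena_conn_id = athena_conn_id


def accepted_init_params(hook_cls):
    """Collect keyword names accepted along the __init__ chain (heuristic)."""
    names = set()
    for klass in hook_cls.__mro__:
        init = klass.__dict__.get("__init__")
        if init is None:
            continue  # this class inherits __init__; look further up the MRO
        params = inspect.signature(init).parameters.values()
        names.update(
            p.name
            for p in params
            if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY) and p.name != "self"
        )
        # Stop once a constructor no longer forwards **kwargs upwards.
        if not any(p.kind is p.VAR_KEYWORD for p in params):
            break
    return names


def filter_hook_params(hook_cls, params):
    """Keep only the entries that hook_cls can accept as keyword arguments."""
    allowed = accepted_init_params(hook_cls)
    return {key: value for key, value in params.items() if key in allowed}


extra_dejson = {"s3_staging_dir": "s3://mybucket/athena/", "region_name": "eu-west-1"}
safe_params = filter_hook_params(FakeAthenaSQLHook, extra_dejson)
hook = FakeAthenaSQLHook(athena_conn_id="athena_conn", **safe_params)
print(safe_params)
```

The trade-off is that s3_staging_dir is silently dropped here, so the hook would still have to read it from its own connection extras rather than from hook_params.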

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
