Skip to content

DatabricksSQLOperator struggles with parsing the data #36838

@aru-trackunit

Description

@aru-trackunit

Apache Airflow Provider(s)

databricks

Versions of Apache Airflow Providers

The error has not been present in version apache-airflow-providers-databricks==4.7.0
I upgraded to the latest and it is presentapache-airflow-providers-databricks==6.0.0

Apache Airflow version

2.8.0

Operating System

Debian GNU/Linux 11 (bullseye)

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

[2024-01-16, 18:54:30 CET] {client.py:200} INFO - Successfully opened session XXXXXX-4c73-1765-b68b-b96c52b08745
[2024-01-16, 18:54:30 CET] {sql.py:450} INFO - Running statement: Select count(*) FROM catalog.schema.table_test parameters: None
[2024-01-16, 18:54:30 CET] {taskinstance.py:2699} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task
    result = execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/common/sql/operators/sql.py", line 282, in execute
    output = hook.run(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 256, in run
    result = self._make_common_data_structure(raw_result)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 286, in _make_common_data_structure
    rows_object = namedtuple("Row", rows_fields)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/collections/__init__.py", line 373, in namedtuple
    raise ValueError('Type names and field names must be valid '
ValueError: Type names and field names must be valid identifiers: 'count(1)'

For the investigation reasons I edited the ./hooks/databricks_sql.py file and added some prints:
in the method _make_common_data_structure
result var is a type list and value [Row(count(1)=9714)]
row_fields value: <Row('count(1)')>

What you think should happen instead

No response

How to reproduce

test = DatabricksSqlOperator(
        task_id="count_query",
        databricks_conn_id="databricks-sp",
        sql_endpoint_name="endpoint_name",
        sql="SELECT count(*) FROM catalog.schema.table_test;"
    )

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions