
BigQueryGetDataOperator's query job is bugged in deferrable mode #31432

@shahar1

Description


Apache Airflow version

main (development)

What happened

  1. When project_id is not provided to BigQueryGetDataOperator in deferrable mode (project_id=None), the query built by the generate_query method is malformed, e.g.:
from `None.DATASET.TABLE_ID` limit ...
  2. The as_dict parameter of BigQueryGetDataOperator has no effect in deferrable mode.
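Both failure modes can be reproduced in isolation with the minimal sketch below. It is a hypothetical reconstruction, not the provider's actual source. `build_query_buggy`, `ToyGetDataTrigger`, the `select *` prefix, `job_123` and `limit 10` are illustrative assumptions. It shows two things: an f-string renders `None` as the literal text `None`, and a trigger whose `serialize()` omits a constructor argument loses that argument when it is rebuilt from its serialized kwargs. The real trigger returns a full dotted classpath rather than a bare class name.

```python
from typing import Any, Dict, Optional, Tuple


def build_query_buggy(
    project_id: Optional[str], dataset_id: str, table_id: str, max_results: int
) -> str:
    # f-string interpolation renders None as the literal text "None".
    return f"select * from `{project_id}.{dataset_id}.{table_id}` limit {max_results}"


class ToyGetDataTrigger:
    """Hypothetical stand-in for a deferrable trigger such as BigQueryGetDataTrigger."""

    def __init__(self, job_id: str, as_dict: bool = False) -> None:
        self.job_id = job_id
        self.as_dict = as_dict

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # The bug being illustrated: as_dict is missing from the kwargs.
        return ("ToyGetDataTrigger", {"job_id": self.job_id})


# A deferred trigger is rebuilt from the (classpath, kwargs) pair returned by
# serialize(), so any argument left out of kwargs falls back to its default.
_, kwargs = ToyGetDataTrigger(job_id="job_123", as_dict=True).serialize()
restored = ToyGetDataTrigger(**kwargs)

print(build_query_buggy(None, "DATASET", "TABLE", 10))  # select * from `None.DATASET.TABLE` limit 10
print(restored.as_dict)  # False
```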

What you think should happen instead

  1. When project_id is None, it should be omitted from the query together with its trailing dot, e.g.:
from `DATASET.TABLE_ID` limit ...
  2. as_dict should be added to the kwargs returned by BigQueryGetDataTrigger's serialize method, so that the trigger actually receives it.
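Both expected behaviours are shown in the minimal sketch below, under the same assumptions as before: `build_query` and `ToyGetDataTrigger` are hypothetical stand-ins, and the `select *` prefix is assumed. The project prefix and its dot are added only when a project is given. An unqualified `DATASET.TABLE` is then left for BigQuery to resolve against the project running the query job. Every constructor argument, including `as_dict`, is also round-tripped through `serialize()`.

```python
from typing import Any, Dict, Optional, Tuple


def build_query(
    project_id: Optional[str], dataset_id: str, table_id: str, max_results: int
) -> str:
    # Prefix the project (and its dot) only when one was actually given.
    # A truthiness check also skips an empty string, not just None.
    table_ref = f"{dataset_id}.{table_id}"
    if project_id:
        table_ref = f"{project_id}.{table_ref}"
    return f"select * from `{table_ref}` limit {max_results}"


class ToyGetDataTrigger:
    """Hypothetical stand-in for BigQueryGetDataTrigger."""

    def __init__(self, job_id: str, as_dict: bool = False) -> None:
        self.job_id = job_id
        self.as_dict = as_dict

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # Every constructor argument must round-trip, including as_dict.
        return ("ToyGetDataTrigger", {"job_id": self.job_id, "as_dict": self.as_dict})


_, kwargs = ToyGetDataTrigger(job_id="job_123", as_dict=True).serialize()
restored = ToyGetDataTrigger(**kwargs)

print(build_query(None, "DATASET", "TABLE", 10))          # select * from `DATASET.TABLE` limit 10
print(build_query("PROJECT_ID", "DATASET", "TABLE", 10))  # select * from `PROJECT_ID.DATASET.TABLE` limit 10
print(restored.as_dict)                                   # True
```

Building the table reference separately from the rest of the query keeps the optional-prefix logic in one place. A unit test can then target it directly.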

How to reproduce

  1. Create a DAG file with BigQueryGetDataOperator defined as follows (project_id not provided):
from airflow.providers.google.cloud.operators.bigquery import BigQueryGetDataOperator

BigQueryGetDataOperator(
    task_id="bq_get_data_op",
    # project_id="PROJECT_ID",  <-- Not provided
    dataset_id="DATASET",
    table_id="TABLE",
    use_legacy_sql=False,
    deferrable=True,
)
  2. Create a DAG file with BigQueryGetDataOperator defined as follows (as_dict=True):
from airflow.providers.google.cloud.operators.bigquery import BigQueryGetDataOperator

BigQueryGetDataOperator(
    task_id="bq_get_data_op",
    project_id="PROJECT_ID",
    dataset_id="DATASET",
    table_id="TABLE",
    use_legacy_sql=False,
    deferrable=True,
    as_dict=True,
)

Operating System

Debian

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

The generate_query method has no unit test. One would likely have caught this bug in the first place, so it would be worth adding one.
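A regression test for the missing-project case could look like the sketch below. It assumes pytest-style test functions. `generate_query` here is a hypothetical module-level stand-in with an assumed `select *` prefix, used so the sketch runs on its own. In the provider's test suite the test would instead call the real method on a BigQueryGetDataOperator instance.

```python
from typing import Optional


def generate_query(
    project_id: Optional[str], dataset_id: str, table_id: str, max_results: int
) -> str:
    """Hypothetical stand-in for BigQueryGetDataOperator.generate_query."""
    table_ref = f"{dataset_id}.{table_id}"
    if project_id:
        table_ref = f"{project_id}.{table_ref}"
    return f"select * from `{table_ref}` limit {max_results}"


def test_generate_query_with_project_id() -> None:
    query = generate_query("PROJECT_ID", "DATASET", "TABLE", 10)
    assert query == "select * from `PROJECT_ID.DATASET.TABLE` limit 10"


def test_generate_query_without_project_id() -> None:
    # Regression check: no literal "None." prefix and no leading dot.
    query = generate_query(None, "DATASET", "TABLE", 10)
    assert "None" not in query
    assert query == "select * from `DATASET.TABLE` limit 10"


if __name__ == "__main__":
    # Also collectable by pytest via the test_* names.
    test_generate_query_with_project_id()
    test_generate_query_without_project_id()
    print("ok")
```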

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
