
BigQueryInsertJobOperator fails for task IDs with 64 characters #39567

@kisssam

Description


Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==10.17.0

Apache Airflow version

airflow-2.7.3

Operating System

Running on Google Cloud Composer

Deployment

Google Cloud Composer

Deployment details

apache-airflow-providers-google==10.17.0

What happened

When a task using BigQueryInsertJobOperator has exactly 64 characters in its task_id, the task fails with the following error:

[2024-05-10TXX:XX:XX.XXX+0000] {standard_task_runner.py:104} ERROR - Failed to execute job XXXXXXXX for task task_id_with_exactly_64_characters_00000000000000000000000000000 (400 POST https://bigquery.googleapis.com/bigquery/v2/projects/<PROJECT_ID>/jobs?prettyPrint=false: Label value "task_id_with_exactly_64_characters_00000000000000000000000000000" has invalid characters.

This occurs with version 10.17.0 of the apache-airflow-providers-google provider package.

What you think should happen instead

If the task_id does not satisfy the constraints for BigQuery label values (values can be empty, have a maximum length of 63 characters, and may contain only lowercase letters, numeric characters, underscores, and dashes), then the BigQuery job should still be created successfully, simply without the default labels, instead of failing as it currently does for a task_id of exactly 64 characters.

How to reproduce

  • Create an Airflow environment with apache-airflow-providers-google==10.17.0.

  • Create a task with the task_id "task_id_with_exactly_64_characters_00000000000000000000000000000" using the BigQueryInsertJobOperator to run any BigQuery query job (see the minimal DAG sketch after these steps).

  • Observe that the job fails with the error Label value "task_id_with_exactly_64_characters_00000000000000000000000000000" has invalid characters.
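
A minimal DAG along these lines should reproduce the failure. The DAG ID, schedule, and query are illustrative placeholders; the task_id is the 64-character value from the error above, and <PROJECT_ID> stands in for a real project:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="bq_label_length_repro",  # hypothetical DAG ID
    start_date=datetime(2024, 5, 1),
    schedule=None,
    catchup=False,
):
    # Exactly 64 characters: accepted by the provider's LABEL_REGEX, but one
    # character longer than BigQuery allows for a label value.
    BigQueryInsertJobOperator(
        task_id="task_id_with_exactly_64_characters_00000000000000000000000000000",
        project_id="<PROJECT_ID>",  # placeholder
        configuration={
            "query": {
                "query": "SELECT 1",
                "useLegacySql": False,
            }
        },
    )
```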

Anything else

This is occurring as a result of the validation introduced in #37736.

#37736 automatically sets airflow-dag and airflow-task as job labels on the created BigQuery job, as long as those identifiers match the regex pattern LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,63}$"). This pattern accepts values that start with a lowercase letter, are at most 64 characters long, and otherwise contain only alphanumeric characters, underscores, or hyphens.
Otherwise, the BigQueryInsertJobOperator creates the job without adding any default labels (for example, when the task_id is longer than 64 characters).
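
The mismatch can be seen by checking the regex directly; this is just an illustrative snippet, not the provider's own code path:

```python
import re

# Pattern used by the provider (introduced in #37736) to decide whether the
# default airflow-dag / airflow-task labels are attached to the BigQuery job.
LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,63}$")

task_id = "task_id_with_exactly_64_characters_00000000000000000000000000000"

print(len(task_id))                      # 64 characters
print(bool(LABEL_REGEX.match(task_id)))  # True  -> the label gets added...
print(len(task_id) <= 63)                # False -> ...but BigQuery rejects the value
```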

However, per the BigQuery documentation for labels, values can be empty, have a maximum length of 63 characters, and can contain only lowercase letters, numeric characters, underscores, and dashes.

Hence, the current validation regex LABEL_REGEX does not satisfy the conditions for BigQuery label values.

For the edge case of a task_id with exactly 64 characters, the value passes the LABEL_REGEX validation, but because BigQuery label values support at most 63 characters, the BigQuery job creation fails.
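
One possible direction, sketched here purely as an assumption about how the check could be tightened rather than a committed fix, is to cap the matched length at 63 characters so it agrees with the BigQuery rule; a 64-character task_id would then fall through to the existing "skip default labels" branch instead of failing:

```python
import re

# Assumed tightened pattern: one leading lowercase letter plus at most 62 more
# characters, i.e. 63 characters total, matching BigQuery's value-length limit.
# (\w still permits uppercase letters, which BigQuery label values disallow, so
# the character class may need further tightening as well.)
LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,62}$")

task_id = "task_id_with_exactly_64_characters_00000000000000000000000000000"
print(bool(LABEL_REGEX.match(task_id)))  # False -> default labels would simply be skipped
```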

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct


Labels

area:providers, kind:bug, needs-triage
