Description
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
apache-airflow-providers-google==10.17.0
Apache Airflow version
airflow-2.7.3
Operating System
Running on Google Cloud Composer
Deployment
Google Cloud Composer
Deployment details
apache-airflow-providers-google==10.17.0
What happened
With apache-airflow-providers-google version 10.17.0, a task that uses BigQueryInsertJobOperator and has a task_id of exactly 64 characters fails with the following error:
[2024-05-10TXX:XX:XX.XXX+0000] {standard_task_runner.py:104} ERROR - Failed to execute job XXXXXXXX for task task_id_with_exactly_64_characters_00000000000000000000000000000 (400 POST https://bigquery.googleapis.com/bigquery/v2/projects/<PROJECT_ID>/jobs?prettyPrint=false: Label value "task_id_with_exactly_64_characters_00000000000000000000000000000" has invalid characters.
What you think should happen instead
If the task_id does not meet the conditions for BigQuery label values (values can be empty, have a maximum length of 63 characters, and can contain only lowercase letters, numeric characters, underscores, and dashes), the BigQuery job should still be created successfully, just without the default labels, instead of failing as currently observed for a task_id with 64 characters.
How to reproduce
- Create an Airflow environment with apache-airflow-providers-google==10.17.0.
- Create a task with the task_id "task_id_with_exactly_64_characters_00000000000000000000000000000" using the BigQueryInsertJobOperator to run any BigQuery query job.
- Observe that the job fails with the error:
Label value "task_id_with_exactly_64_characters_00000000000000000000000000000" has invalid characters.
Anything else
This is occurring as a result of the validation introduced in #37736.
#37736 automatically sets airflow-dag and airflow-task as job labels on the BigQuery job, as long as these identifiers match the pattern LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,63}$"). This pattern accepts an identifier that starts with a lowercase letter, is at most 64 characters long (one leading letter plus up to 63 further characters), and contains only alphanumeric characters, underscores, or hyphens.
Otherwise, the BigQueryInsertJobOperator creates the job without adding any default labels (for example, when the task_id is longer than 64 characters).
However, as per the BigQuery documentation for labels, Values can be empty, and have a maximum length of 63 characters and can contain only lowercase letters, numeric characters, underscores, and dashes.
Hence, the current validation regex LABEL_REGEX does not satisfy the conditions for BigQuery label values.
In the edge case of a task_id with exactly 64 characters, the task_id passes the LABEL_REGEX validation, but because BigQuery label values support only up to 63 characters, the BigQuery job creation fails.
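The off-by-one can be checked in isolation with a short sketch. LABEL_REGEX is copied from the pattern quoted above. STRICT_VALUE_REGEX is a hypothetical pattern written only to mirror the documented rule for label values (0 to 63 characters; lowercase letters, digits, underscores, dashes); it is an assumption for illustration, not the fix adopted by the provider. The task_id is padded programmatically so its length is guaranteed.

```python
import re

# Pattern quoted in this issue (introduced by #37736).
LABEL_REGEX = re.compile(r"^[a-z][\w-]{0,63}$")

# Hypothetical stricter pattern (an assumption, not provider code):
# mirrors the documented rule for BigQuery label values.
STRICT_VALUE_REGEX = re.compile(r"^[a-z0-9_-]{0,63}$")

# Build task_ids of exactly 64 and 63 characters.
task_id_64 = "task_id_with_exactly_64_characters_".ljust(64, "0")
task_id_63 = task_id_64[:-1]

for task_id in (task_id_63, task_id_64):
    print(
        len(task_id),
        bool(LABEL_REGEX.match(task_id)),
        bool(STRICT_VALUE_REGEX.match(task_id)),
    )
# prints:
# 63 True True
# 64 True False
```

The 64-character task_id is accepted by LABEL_REGEX but rejected by the 63-character limit, which matches the failure described above.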
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct