Provide option to force_delete for GCSToBigQueryOperator
#43785
Conversation
Adding a parameter to provide the option to force delete the destination table if it already exists.
**shahar1** left a comment:
I'm not strongly against, but why not use the existing operator for that? (I'm questioning the atomicity of transfer operators in general.)
providers/src/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py
It's a fair question. The thought process came about because of a scenario I encountered involving BigQuery dataset expiration policies, which automatically drop tables after a specified amount of time, e.g. 7 days; we use these for temporary/staging areas. Now, suppose I use the `GCSToBigQueryOperator` on a schedule. On that seventh day, if everything runs at the same time, the table will not have expired yet, so the GCSToBQ task will succeed but not recreate the table. However, in the few seconds between this task ending and the downstream task starting, the table will be deleted, resulting in a task failure because the table no longer exists. The current solution is to add a prior task using `BigQueryDeleteTableOperator`.
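The workaround described above can be sketched as a pair of tasks. This is an illustrative DAG fragment only, assuming the Google provider is installed; the bucket, project, dataset, and table names are invented for the example:

```python
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryDeleteTableOperator,
)
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

# Drop the (possibly still-existing) staging table first, so the load
# always recreates it with a fresh expiration time.
drop_staging = BigQueryDeleteTableOperator(
    task_id="drop_staging_table",
    deletion_dataset_table="my-project.staging.events",
    ignore_if_missing=True,  # don't fail if expiration already removed it
)

load_staging = GCSToBigQueryOperator(
    task_id="load_staging_table",
    bucket="my-bucket",
    source_objects=["events/*.json"],
    destination_project_dataset_table="my-project.staging.events",
    source_format="NEWLINE_DELIMITED_JSON",
    write_disposition="WRITE_TRUNCATE",
)

# Deleting first closes the race window in which the load succeeds
# against a table that expires moments later.
drop_staging >> load_staging
```

The proposed `force_delete` parameter would fold the first task into the second.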
Update providers/src/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py (Co-authored-by: Shahar Epstein <60007259+shahar1@users.noreply.github.com>)
Sounds fine by me; I'd be happy for additional feedback before merging.

LGTM
Provide option to force_delete for GCSToBigQueryOperator (#43785)

* Adding a parameter to provide the option to force delete the destination table if it already exists.
* Adding a test for force_delete
* Update providers/src/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py

Co-authored-by: Shahar Epstein <60007259+shahar1@users.noreply.github.com>
When loading data into a BigQuery table, although there are options to create or truncate the destination table using `CREATE/WRITE_DISPOSITION`, it might also be desirable to recreate the table as part of the task. This is not currently possible and requires a separate task using `BigQueryDeleteTableOperator`. Adding a `force_delete` parameter that simply calls the BigQuery hook's `delete_table` function would enable this.
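To make the proposed behavior concrete, here is a self-contained sketch using stub classes; `StubBigQueryHook` and `SketchGCSToBigQueryOperator` are invented for illustration and are not the real google-provider API. It shows the intended ordering: when `force_delete` is set, the destination table is dropped before the load runs.

```python
class StubBigQueryHook:
    """Stands in for the BigQuery hook; records calls instead of touching BigQuery."""

    def __init__(self):
        self.calls = []

    def delete_table(self, table_id, not_found_ok=True):
        self.calls.append(("delete_table", table_id))

    def run_load(self, destination, source_uris):
        self.calls.append(("run_load", destination))


class SketchGCSToBigQueryOperator:
    """Trimmed-down operator sketch with the proposed force_delete flag."""

    def __init__(self, destination_project_dataset_table, source_uris,
                 force_delete=False):
        self.destination = destination_project_dataset_table
        self.source_uris = source_uris
        self.force_delete = force_delete

    def execute(self, hook):
        # Proposed behavior: drop the destination first so the load always
        # recreates it, avoiding a stale table with an imminent expiration.
        if self.force_delete:
            hook.delete_table(self.destination, not_found_ok=True)
        hook.run_load(self.destination, self.source_uris)


hook = StubBigQueryHook()
op = SketchGCSToBigQueryOperator(
    destination_project_dataset_table="project.staging.events",
    source_uris=["gs://bucket/events/*.json"],
    force_delete=True,
)
op.execute(hook)
print([name for name, _ in hook.calls])  # → ['delete_table', 'run_load']
```

With `force_delete=False` the delete call is skipped entirely, preserving today's behavior by default.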