Skip to content

Conversation

@chikamura
Copy link
Contributor

Summary

  • Fixed an issue where retaining policy_tags with access controls caused errors in replace mode
  • Refactored retention logic to update table after creation instead of during schema building
  • Ensured column_options descriptions take precedence over retained descriptions

Background

When retain_column_policy_tags was enabled in replace mode and the existing table had policy_tags with access
controls, the operation would fail. This was because trying to set policy_tags with access controls during table
creation is not allowed by BigQuery.

Changes

  • Moved the retention logic from buildSchema to a post-creation update process
  • Added updateTableIfNeed method to update table schema after creation
  • Added storeCachedSrcFieldsIfNeed to temporarily store source fields
  • Added comprehensive unit tests for the new retention logic

…lace mode

Refactored the retention logic for descriptions and policy_tags to handle
the case where policy_tags have access controls, which was causing errors
when retain_column_policy_tags was enabled.

- Moved retention logic from buildSchema to post-table-creation update
- buildSchema now only defines field structure, temporarily storing retained info
- Added updateTableIfNeed to update table after creation when needed
- column_options description takes precedence over retained description

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@chikamura chikamura self-assigned this Nov 4, 2025
@chikamura chikamura requested a review from NamedPython November 4, 2025 05:00
Comment on lines +238 to +242
bigquery.update(
TableInfo.newBuilder(
table.getTableId(),
StandardTableDefinition.newBuilder().setSchema(patchSchema).build())
.build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have an error handling for bigquery.update ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NamedPython
Thank you for the review.
Could you specify the type of exception handling you’re considering?
I believe it would fall under one of the following cases — could you please confirm?

  • Output an error log and throw a BigqueryException.

    logger.error(
    String.format("embulk-out_bigquery: insert_table(%s:%s.%s)", project, dataset, table));
    throw new BigqueryException(
    String.format(
    "failed to create table %s:%s.%s, response: %s", project, dataset, table, e));
    }

  • Some API errors do not affect the overall process, so we will ignore those errors in such cases.

  • Output a warning log and continue the process.

  • No change needed — just let the error thrown by bigquery.update propagate as is.
    For reference, in the cases of getDataset, createDataset, and deleteTableOrPartition, neither the caller nor the callee performs any specific exception handling.

Copy link
Contributor

@NamedPython NamedPython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one comment 🙏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants