AWS: Add parameter of excluding non-current fields in Glue #12664

Open
wants to merge 1 commit into
base: main

Conversation

duoxoud

@duoxoud duoxoud commented Mar 27, 2025

(Reopen #11334)
Closes #7584

This PR addresses a feature request to improve Glue schema generation. It adds a configuration option that lets users exclude non-current fields from the Glue schema, which reduces confusion for Athena users who primarily query current data.

In PR #3888, the Glue schema generation was modified to include all historical fields. This was intended to help users recognize previously used columns and avoid duplicating column names. In practice, however, it has confused users (for example, the issue described in #7584).

The default behaviour is unchanged: this PR introduces GLUE_NON_CURRENT_FIELDS_DISABLED_DEFAULT = false, so non-current fields are still included unless the new option is enabled.
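
To illustrate the intended effect of the flag, here is a minimal, self-contained sketch. It is not the code in this PR: the class `GlueColumnsSketch`, the helper `glueColumns`, and the representation of schemas as lists of column names are all hypothetical. Only the constant name and its `false` default come from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GlueColumnsSketch {
  // Name and default taken from the PR description; false keeps the existing behaviour.
  static final boolean GLUE_NON_CURRENT_FIELDS_DISABLED_DEFAULT = false;

  /**
   * Hypothetical helper: returns the column names to publish to Glue.
   * Current fields always come first. Fields that appear only in older
   * schemas are appended unless nonCurrentFieldsDisabled is true.
   */
  static List<String> glueColumns(
      List<String> currentFields,
      List<List<String>> historicalSchemas,
      boolean nonCurrentFieldsDisabled) {
    // LinkedHashSet de-duplicates names while preserving insertion order.
    Set<String> columns = new LinkedHashSet<>(currentFields);
    if (!nonCurrentFieldsDisabled) {
      for (List<String> schema : historicalSchemas) {
        columns.addAll(schema);
      }
    }
    return new ArrayList<>(columns);
  }

  public static void main(String[] args) {
    List<String> current = List.of("id", "name");
    List<List<String>> history =
        List.of(List.of("id", "legacy_code"), List.of("id", "name"));

    // Default: non-current fields are kept, as today.
    System.out.println(
        glueColumns(current, history, GLUE_NON_CURRENT_FIELDS_DISABLED_DEFAULT));
    // Option enabled: only current fields are published.
    System.out.println(glueColumns(current, history, true));
  }
}
```

With the default, the dropped column `legacy_code` still appears after the current columns. With the option enabled, only `id` and `name` are returned.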

@github-actions github-actions bot added the AWS label Mar 27, 2025
@duoxoud duoxoud force-pushed the option-to-disable-non-current-fields-in-glue branch from e216cf9 to c65ca7f Compare March 27, 2025 11:00
@duoxoud duoxoud force-pushed the option-to-disable-non-current-fields-in-glue branch from c65ca7f to 1b83550 Compare March 27, 2025 11:11
@duoxoud duoxoud marked this pull request as ready for review March 27, 2025 11:14
@nastra nastra requested a review from jackye1995 March 27, 2025 11:49
@duoxoud duoxoud changed the title from "[AWS] Add parameter of excluding non-current fields in Glue" to "AWS: Add parameter of excluding non-current fields in Glue" Mar 27, 2025
@borjagonzal

This change fixes an issue we have faced for some time in our stack.
Thank you for pushing this one @duoxoud!

@xiaoxuandev
Contributor

Displaying non-current columns is intentional in Glue, as users may use LakeFormation and need to access dropped columns. Users should not rely on Glue for the latest table status; Iceberg metadata should always be considered the source of truth.

Development

Successfully merging this pull request may close these issues.

AWS: provide option to hide old fields in Glue table
3 participants