Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow enabling column mapping if invalid column mapping metadata is already present #4167

Conversation

juliuszsompolski
Copy link
Contributor

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

As effect of earlier bugs (e.g. fixed in #3487) there can exists tables where column mapping is disabled, but there is column mapping metadata on the table. Enabling column mapping metadata on such a table could lead to unexpected corruption. Simply stripping such metadata could also lead to curruptions, as the invalid metadata can be already used in other places (e.g. column statistics) via DeltaColumnMapping.getPhysicalName, which returns the name from the metadata even when column mapping is disabled.

After #3688 it should no longer be possible to end up with tables having such invalid metadata, so the issue only concerns existing tables created before that fix.

To avoid corruption, we want to disallow enabling column mapping on such tables.

How was this patch tested?

Added tests to DeltaColumnMappingSuite.

Does this PR introduce any user-facing changes?

No.
We are disallowing an operation on tables that would lead to Delta table corruption on tables that are already in an invalid state entering which is fixed already, so it can only concern old tables in the wild.

@juliuszsompolski
Copy link
Contributor Author

@cstavr

Copy link
Contributor

@cstavr cstavr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! LGTM

@scottsand-db scottsand-db merged commit 50a99fd into delta-io:master Feb 19, 2025
18 of 21 checks passed
anoopj pushed a commit to anoopj/delta that referenced this pull request Feb 24, 2025
…s already present (delta-io#4167)

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

As effect of earlier bugs (e.g. fixed in
delta-io#3487) there can exists tables
where column mapping is disabled, but there is column mapping metadata
on the table. Enabling column mapping metadata on such a table could
lead to unexpected corruption. Simply stripping such metadata could also
lead to curruptions, as the invalid metadata can be already used in
other places (e.g. column statistics) via
DeltaColumnMapping.getPhysicalName, which returns the name from the
metadata even when column mapping is disabled.

After delta-io#3688 it should no longer be
possible to end up with tables having such invalid metadata, so the
issue only concerns existing tables created before that fix.

To avoid corruption, we want to disallow enabling column mapping on such
tables.

## How was this patch tested?

Added tests to DeltaColumnMappingSuite.

## Does this PR introduce _any_ user-facing changes?

No.
We are disallowing an operation on tables that would lead to Delta table
corruption on tables that are already in an invalid state entering which
is fixed already, so it can only concern old tables in the wild.

---------

Co-authored-by: Julek Sompolski <Juliusz Sompolski>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants