Disallow enabling column mapping if invalid column mapping metadata is already present #4167
Which Delta project/connector is this regarding?
Description
As a result of earlier bugs (e.g. fixed in #3487), there can exist tables where column mapping is disabled but column mapping metadata is present on the table. Enabling column mapping on such a table could lead to unexpected corruption. Simply stripping such metadata could also lead to corruption, because the invalid metadata may already be used in other places (e.g. column statistics) via DeltaColumnMapping.getPhysicalName, which returns the name from the metadata even when column mapping is disabled.
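To illustrate why stripping the metadata is unsafe, here is a minimal Python model of the lookup described above. It is not Delta's code: `get_physical_name` is a hypothetical stand-in for DeltaColumnMapping.getPhysicalName, fields are plain dicts, and the physical name `col-1234` and the statistics values are made up. Only the metadata key `delta.columnMapping.physicalName` is taken from the Delta protocol.

```python
# Field metadata key defined by the Delta protocol for column mapping.
PHYSICAL_NAME_KEY = "delta.columnMapping.physicalName"


def get_physical_name(field: dict) -> str:
    """Hypothetical model: return the physical name from the field metadata
    if one is present, regardless of the column mapping mode; otherwise fall
    back to the logical name."""
    return field.get("metadata", {}).get(PHYSICAL_NAME_KEY, field["name"])


# A table with column mapping disabled but stale metadata on the field.
field = {"name": "id", "metadata": {PHYSICAL_NAME_KEY: "col-1234"}}

# Statistics written while the stale metadata was present are keyed by the
# physical name taken from that metadata (values are illustrative).
stats = {get_physical_name(field): {"min": 1, "max": 9}}

# After stripping the metadata, the lookup resolves to the logical name and
# no longer matches the statistics that were already written.
stripped = {"name": "id", "metadata": {}}
stats_found = get_physical_name(stripped) in stats
```

In this model `stats_found` ends up false: the existing statistics still refer to the name from the invalid metadata, so removing that metadata silently disconnects them from the column.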
After #3688 it should no longer be possible to end up with tables having such invalid metadata, so the issue only concerns existing tables created before that fix.
To avoid corruption, we want to disallow enabling column mapping on such tables.
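The intended check can be sketched as follows. This is a simplified Python model under stated assumptions, not the code in this PR: the function names and the error type are hypothetical, the schema is a flat list of dicts, and only top-level fields are inspected. The property `delta.columnMapping.mode` and the field metadata keys come from the Delta protocol.

```python
MODE_KEY = "delta.columnMapping.mode"
FIELD_METADATA_KEYS = (
    "delta.columnMapping.id",
    "delta.columnMapping.physicalName",
)


class InvalidColumnMappingMetadataError(Exception):
    """Hypothetical error type for this sketch; not Delta's actual exception."""


def fields_with_column_mapping_metadata(fields):
    """Return the names of top-level fields carrying column mapping metadata."""
    return [
        f["name"]
        for f in fields
        if any(key in f.get("metadata", {}) for key in FIELD_METADATA_KEYS)
    ]


def check_can_enable_column_mapping(old_props, new_props, fields):
    """Reject the transition from mode 'none' to any other mode when the
    existing schema already carries column mapping metadata."""
    old_mode = old_props.get(MODE_KEY, "none")
    new_mode = new_props.get(MODE_KEY, "none")
    if old_mode == "none" and new_mode != "none":
        offending = fields_with_column_mapping_metadata(fields)
        if offending:
            raise InvalidColumnMappingMetadataError(
                f"Cannot enable column mapping; stale metadata on: {offending}"
            )


clean = [{"name": "id", "metadata": {}}]
dirty = [{"name": "id", "metadata": {"delta.columnMapping.physicalName": "col-1234"}}]
enable = {MODE_KEY: "name"}

check_can_enable_column_mapping({}, enable, clean)  # allowed: no stale metadata

try:
    check_can_enable_column_mapping({}, enable, dirty)
    rejected = False
except InvalidColumnMappingMetadataError:
    rejected = True
```

The check only fires on the disabled-to-enabled transition, so tables that already have column mapping enabled, where the metadata is expected, are unaffected.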
How was this patch tested?
Added tests to DeltaColumnMappingSuite.
Does this PR introduce any user-facing changes?
No.
We are disallowing an operation that would corrupt Delta tables that are already in an invalid state. The bug that allowed tables to enter that state has already been fixed, so this can only affect old tables in the wild.