Core, Spark: Handle unknown type during deletes #14356
Conversation
Force-pushed 5d717a1 to 5ab73bf
```java
classes[i] = result.typeId().javaClass();
if (null == sourceType) {
  // When the source field has been dropped we cannot determine the type
  classes[i] = Types.UnknownType.get().typeId().javaClass();
```
Handling it similarly to how it's done in iceberg/api/src/main/java/org/apache/iceberg/PartitionSpec.java, lines 136 to 139 in 5ab73bf:

```java
// When the source field has been dropped we cannot determine the type
if (sourceType == null) {
  resultType = Types.UnknownType.get();
}
```

which was added by 8134815.
I think this should default sourceType to UnknownType.get() and still call getResultType. The reason is that for many transforms, the result type is known; for example, bucket always produces IntegerType.get(). That will result in fewer cases where we end up with an unknown type and fall back to Void.class here.
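A minimal sketch of that suggestion (the helper shape and variable names are assumptions, not the PR's code):

```java
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// Default the source type to UnknownType when the source field was dropped,
// then still ask the transform for its result type; e.g. bucket(...) yields
// IntegerType regardless of the (now unknown) source type.
static Class<?> resultJavaClass(Schema schema, PartitionField field) {
  Type sourceType = schema.findType(field.sourceId());
  if (sourceType == null) {
    sourceType = Types.UnknownType.get();
  }

  return field.transform().getResultType(sourceType).typeId().javaClass();
}
```

Only transforms whose result genuinely depends on the source type (e.g. identity) would still surface UnknownType this way.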
```java
case DECIMAL:
  return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
case UNKNOWN:
  // underlying type not known
```
The lower/upper bound is converted in:

```java
this.lowerBound = Conversions.fromByteBuffer(primitive, summary.lowerBound());
this.upperBound = Conversions.fromByteBuffer(primitive, summary.upperBound());
```
```java
  return (ByteBuffer) value;
case DECIMAL:
  return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
case UNKNOWN:
```
The lower/upper bound is converted to a byte buffer in iceberg/core/src/main/java/org/apache/iceberg/PartitionSummary.java, lines 80 to 81 in c07f2aa:

```java
min != null ? Conversions.toByteBuffer(type, min) : null,
max != null ? Conversions.toByteBuffer(type, max) : null);
```
Seems reasonable to me.
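For context, a simplified sketch of what the UNKNOWN case in the serialization path presumably looks like (assumed behavior; only the relevant cases are shown, and the method name is hypothetical):

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import org.apache.iceberg.types.Type;

// An UNKNOWN partition type cannot be serialized, so the conversion yields
// null instead of throwing when PartitionSummary builds the bound buffers.
static ByteBuffer toByteBufferSketch(Type type, Object value) {
  switch (type.typeId()) {
    case DECIMAL:
      return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
    case UNKNOWN:
      return null; // underlying type not known; no bound can be produced
    default:
      throw new UnsupportedOperationException("Cannot serialize type: " + type);
  }
}
```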
```java
byte[] unscaledBytes = new byte[buffer.remaining()];
tmp.get(unscaledBytes);
return new BigDecimal(new BigInteger(unscaledBytes), decimal.scale());
case UNKNOWN:
```
The lower/upper bound is converted from a byte buffer in:

```java
this.lowerBound = Conversions.fromByteBuffer(primitive, summary.lowerBound());
this.upperBound = Conversions.fromByteBuffer(primitive, summary.upperBound());
```
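Mirroring the write side, a hedged sketch of how the deserialization path might treat UNKNOWN (assumed, not the PR's verbatim code):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// When a stored bound belongs to an UNKNOWN type there is nothing meaningful
// to deserialize, so null is returned rather than failing the scan.
static Object fromByteBufferSketch(Type type, ByteBuffer buffer) {
  switch (type.typeId()) {
    case DECIMAL:
      Types.DecimalType decimal = (Types.DecimalType) type;
      byte[] unscaledBytes = new byte[buffer.remaining()];
      buffer.duplicate().get(unscaledBytes);
      return new BigDecimal(new BigInteger(unscaledBytes), decimal.scale());
    case UNKNOWN:
      return null; // the source field was dropped; bounds carry no usable type
    default:
      throw new UnsupportedOperationException("Cannot deserialize type: " + type);
  }
}
```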
```java
Object datum = get(row, i);
ValueWriter<Object> writer = writers[i];

if (NullWriter.INSTANCE.getClass().equals(writer.getClass()) && null != datum) {
```
Otherwise this fails with a ClassCastException, since datum can be a String, which is then cast to the NullWriter's Void.
Instead, what about just changing the type parameter of NullWriter to Object? The argument is ignored anyway so that avoids the CCE and we don't have to special-case it here.
That would be ideal, but it breaks the API, because ValueWriters.nulls() is public. We could deprecate that method and mention that the return type will be changed to ValueWriter<Object> in the next release, wdyt?
You can get around this by casting to a raw unparameterized type:

```java
private static class NullWriter implements ValueWriter<Object> {
  @SuppressWarnings({"unchecked", "rawtypes"})
  private static final ValueWriter<Void> INSTANCE = (ValueWriter) new NullWriter();

  private NullWriter() {}

  @Override
  public void write(Object ignored, Encoder encoder) throws IOException {
    encoder.writeNull();
  }
}
```

The resulting ValueWriter<Void> is more restrictive than ValueWriter<Object>, so it is safe and you don't have to change the API.
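A quick check that callers are unaffected (assuming ValueWriters.nulls() keeps returning the shared INSTANCE):

```java
// Existing call sites keep compiling against the narrower Void signature.
ValueWriter<Void> nullWriter = ValueWriters.nulls();
```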
nice, that would also work, thanks
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java (outdated; resolved)
Force-pushed 5ab73bf to 3fe4ee2
```java
}

boolean canContain(Object value) {
  if (Types.UnknownType.get().equals(type)) {
```
Should this be moved after the null and NaN value checks? The null and NaN counts are still valid, right?

This should also have a test update.
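A hedged sketch of that reordering; the class shape and field names here are assumptions, not Iceberg's actual stats holder:

```java
import java.util.Comparator;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;
import org.apache.iceberg.util.NaNUtil;

// Null/NaN answers stay valid even for a dropped field, so only the bound
// comparison needs the UnknownType early-out.
class FieldStatsSketch {
  private Type type;
  private boolean containsNull;
  private boolean containsNaN;
  private Object lowerBound;
  private Object upperBound;
  private Comparator<Object> comparator;

  boolean canContain(Object value) {
    if (value == null) {
      return containsNull; // null counts are tracked independently of the type
    }

    if (NaNUtil.isNaN(value)) {
      return containsNaN; // NaN counts are likewise still valid
    }

    if (Types.UnknownType.get().equals(type)) {
      return true; // no usable bounds for a dropped source field; assume a match
    }

    return (lowerBound == null || comparator.compare(lowerBound, value) <= 0)
        && (upperBound == null || comparator.compare(value, upperBound) <= 0);
  }
}
```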
Force-pushed 3fe4ee2 to d2441c5
rdblue left a comment:
Looks good once the null writer issue is fixed.
thanks for reviewing @rdblue
Historical background / Problem
There was an issue a while ago where one couldn't drop a partition field and then remove it from a table's schema, as that would always result in an error. This was fixed by #11868, where the field's type was replaced with UnknownType and thus ignored during reads. However, if a user performs a DELETE operation after dropping a partition field and then removing it from the table's schema, the DELETE fails in a few places, because those places don't deal with the UnknownType.

Approach

This PR updates all of the places that were failing during the DELETE operation to properly deal with the UnknownType. I'm currently unsure whether that's the right approach to this issue, so I'm looking for feedback/ideas. I've also added some tests that reproduce the original issue.

fixes #14343