Skip to content

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Oct 17, 2025

Historical background / Problem

There was an issue a while ago where one couldn't drop a partition field and then remove it from a table's schema, as that would always result in an error. This was fixed by #11868, where the field's type was replaced with UnknownType and thus ignored during reads. However, if a user performs a DELETE operation after dropping a partition field and then removing it from a table's schema, the DELETE operation fails in a few places, because those places don't deal with the UnknownType.

Approach

This PR updates all of the places that were failing during the DELETE operation to properly deal with the UnknownType. I'm currently unsure if that's the right approach to deal with this issue, so looking for some feedback/ideas on that. I've also added some tests that reproduce the original issue.

fixes #14343

@nastra nastra marked this pull request as draft October 17, 2025 06:57
@nastra nastra requested a review from Fokko October 17, 2025 06:57
@nastra nastra force-pushed the unknown-type-partition branch from 5d717a1 to 5ab73bf Compare October 17, 2025 07:30
classes[i] = result.typeId().javaClass();
if (null == sourceType) {
// When the source field has been dropped we cannot determine the type
classes[i] = Types.UnknownType.get().typeId().javaClass();
Copy link
Contributor Author

@nastra nastra Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handling it similar to how it's done in

// When the source field has been dropped we cannot determine the type
if (sourceType == null) {
resultType = Types.UnknownType.get();
}

which was added by 8134815

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should default sourceType to UnknownType.get() and still call getResultType. The reason is that for many transforms, the result type is known. For example, bucket always produces IntegerType.get(). That will result in fewer cases where we have an unknown and use Void.class from this.

case DECIMAL:
return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
case UNKNOWN:
// underlying type not known
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lower/upper bound is converted in

this.lowerBound = Conversions.fromByteBuffer(primitive, summary.lowerBound());
this.upperBound = Conversions.fromByteBuffer(primitive, summary.upperBound());
but we can't convert it to its underlying type, since the type isn't known

return (ByteBuffer) value;
case DECIMAL:
return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
case UNKNOWN:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lower/upper bound is converted to a byte buffer in

min != null ? Conversions.toByteBuffer(type, min) : null,
max != null ? Conversions.toByteBuffer(type, max) : null);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

byte[] unscaledBytes = new byte[buffer.remaining()];
tmp.get(unscaledBytes);
return new BigDecimal(new BigInteger(unscaledBytes), decimal.scale());
case UNKNOWN:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lower/upper bound is converted from a byte buffer in

this.lowerBound = Conversions.fromByteBuffer(primitive, summary.lowerBound());
this.upperBound = Conversions.fromByteBuffer(primitive, summary.upperBound());
but we can't convert it to its underlying type, since the type isn't known

Object datum = get(row, i);
ValueWriter<Object> writer = writers[i];

if (NullWriter.INSTANCE.getClass().equals(writer.getClass()) && null != datum) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise this fails with a CCE since datum can be a String, which is then casted to the NullWriter Void

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, what about just changing the type parameter of NullWriter to Object? The argument is ignored anyway so that avoids the CCE and we don't have to special-case it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be ideal, but that breaks the API, because ValueWriters.nulls() is public. We could add a deprecation to that method and mention that the return type will be changed to ValueWriters<Object> in the next release, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get around this by casting to a raw unparameterized type:

  private static class NullWriter implements ValueWriter<Object> {
    @SuppressWarnings({"unchecked", "rawtypes"})
    private static final ValueWriter<Void> INSTANCE = (ValueWriter) new NullWriter();

    private NullWriter() {}

    @Override
    public void write(Object ignored, Encoder encoder) throws IOException {
      encoder.writeNull();
    }
  }

The resulting ValueWriter<Void> is more restrictive than ValueWriter<Object> so it is safe and you don't have to change the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, that would also work, thanks

@nastra nastra added this to the Iceberg 1.10.1 milestone Oct 29, 2025
@nastra nastra force-pushed the unknown-type-partition branch from 5ab73bf to 3fe4ee2 Compare October 30, 2025 10:58
@nastra nastra marked this pull request as ready for review October 30, 2025 10:58
}

boolean canContain(Object value) {
if (Types.UnknownType.get().equals(type)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved after the null and nan value checks? The null and nan counts are still valid, right?

This should also have a test update.

@nastra nastra force-pushed the unknown-type-partition branch from 3fe4ee2 to d2441c5 Compare November 6, 2025 08:54
Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once the null writer issue is fixed.

@nastra
Copy link
Contributor Author

nastra commented Nov 10, 2025

thanks for reviewing @rdblue

@nastra nastra merged commit c22bac8 into apache:main Nov 10, 2025
43 checks passed
@nastra nastra deleted the unknown-type-partition branch November 10, 2025 08:57
nastra added a commit to nastra/iceberg that referenced this pull request Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Removing partitionField and dropping partition columns will cause operations with delete files to fail.

2 participants