-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Make PositionDeleteIndex serializable #11463
Core: Make PositionDeleteIndex serializable #11463
Conversation
* magic bytes and bitmap for compatibility with Delta. | ||
*/ | ||
@Override | ||
public ByteBuffer serialize() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Puffin writers currently assume we will create and pass ByteBuffer
. We also compute a checksum on serialized magic bytes and bitmap, so I am allocating a byte array upfront.
@@ -43,6 +53,11 @@ class BitmapPositionDeleteIndex implements PositionDeleteIndex { | |||
this.deleteFiles = deleteFile != null ? Lists.newArrayList(deleteFile) : Lists.newArrayList(); | |||
} | |||
|
|||
BitmapPositionDeleteIndex(RoaringPositionBitmap bitmap, DeleteFile deleteFile) { | |||
this.bitmap = bitmap; | |||
this.deleteFiles = deleteFile != null ? Lists.newArrayList(deleteFile) : Lists.newArrayList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case where we can have a deserialized bitmap, but no delete file? Just wondering if we should have a separate constructor or if it makes sense to allow passing null for those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. I have this check here to match the constructor above. There are some legacy code paths that create an index and pass null as DeleteFile
. Given that it is a new constructor, we can probably require the file to be non-null, but it won't be consistent with the constructor above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
*/ | ||
@Override | ||
public ByteBuffer serialize() { | ||
bitmap.runLengthEncode(); // run-length encode the bitmap before serializing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we more eagerly run length encode the bitmap in the constructor instead of at serialization time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM, I don't think it'll make any difference and probably more readable the way it's written at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually want to accumulate all values prior to applying run-length encoding, so doing this right before the serialization seems like a good place. We don't anticipate the bitmap to change after serialization.
long cardinality = bitmap.cardinality(); | ||
long expectedCardinality = deleteFile.recordCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIt: I feel like we could inline bitMap.cardinality() == deleteFile.recordCount()
in the check below and the intent of the check would still be clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this for consistency with the method above. It shouldn't be expensive to call recordCount()
twice, so I am OK changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me actually keep it to avoid one more round of CI.
Thanks for reviewing, @danielcweeks @amogh-jahagirdar! |
This PR makes
PositionDeleteIndex
serializable as per the V3 spec.This work is part of #11122.