-
Notifications
You must be signed in to change notification settings - Fork 69
GH-79: Move splitAndTransferValidityBuffer
to BaseValueVector
#777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
* @param target target vector | ||
*/ | ||
protected void splitAndTransferValidityBuffer( | ||
int startIndex, int length, BaseValueVector target) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepting BaseValueVector
and not ValueVector
here to keep the API (int, int, vector)
. If I accept ValueVector
, then there there needs to be some lambda wrangling to pass in the vector-specific allocateValidityBuffer
somehow, or that function needs to be added to ValueVector
, both of which don't seem that great. From what I can tell everything except StructVector
seems to be descend from BaseValueVector
so chose this as part of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)
* @param length number of elements to be copied | ||
* @param target target vector | ||
*/ | ||
protected void sliceAndTransferValidityBuffer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is the one being used by the vectors inside the complex/
directory. BaseValueVector#validityBuffer
is protected, so overridden implementations there can't call target.validityBuffer
to set a new value. Vectors in the top level org/apache/arrow/vector
directory can add their overridden implementations. If at some point some other vector in complex/
wants to have a different version of this function, then I am afraid that vector will need to override splitAndTransferValidityBuffer
entirely. Presumably don't want to expose a public setValidityBuffer
on BaseValueVector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* @param target target vector | ||
*/ | ||
protected void splitAndTransferValidityBuffer( | ||
int startIndex, int length, BaseValueVector target) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)
* | ||
* @param byteSizeTarget desired size of the buffer | ||
*/ | ||
protected void allocateValidityBuffer(final long byteSizeTarget) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an abstract class - mark this abstract?
final ArrowBuf slicedValidityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget); | ||
target.validityBuffer = transferBuffer(slicedValidityBuffer, target.allocator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this only slightly differs from the 'base' implementation. However I think this one is more correct...I didn't dig too deep but it's possible the 'simple' one that just retains the buffer was a later/incorrect reimplementation or because they were all copy-pasted at some point only some types were adjusted to use transferBuffer
@@ -342,7 +340,8 @@ private void allocateBytes(int valueCount) { | |||
* slice the source buffer so we have to explicitly allocate the validityBuffer of the target | |||
* vector. This is unlike the databuffer which we can always slice for the target vector. | |||
*/ | |||
private void allocateValidityBuffer(final int validityBufferSize) { | |||
@Override | |||
protected void allocateValidityBuffer(final long validityBufferSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given many of these are very similar maybe they should delegate to a base implementation with super.allocateValidityBuffer
then call any vector-specific logic at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, there is a slight behavior change though -- the validity buffer in BaseFixedWidthVector
was not zeroing out its elements post allocation, it is now. Should be fine though I am thinking.
Resolved all conflicts. |
What's Changed
Move
splitAndTransferValidityBuffer
up toBaseValueVector
.This PR is not touching the implementation of this function in
StructVector
-- that is not being derived fromBaseValueVector
so some amount of duplication is probably fine.Closes #79