Skip to content

GH-79: Move splitAndTransferValidityBuffer to BaseValueVector #777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 5, 2025

Conversation

rtadepalli
Copy link
Contributor

What's Changed

Move splitAndTransferValidityBuffer up to BaseValueVector.

This PR is not touching the implementation of this function in StructVector -- that is not being derived from BaseValueVector so some amount of duplication is probably fine.

Closes #79

This comment has been minimized.

* @param target target vector
*/
protected void splitAndTransferValidityBuffer(
int startIndex, int length, BaseValueVector target) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting BaseValueVector and not ValueVector here to keep the API (int, int, vector). If I accept ValueVector, then there there needs to be some lambda wrangling to pass in the vector-specific allocateValidityBuffer somehow, or that function needs to be added to ValueVector, both of which don't seem that great. From what I can tell everything except StructVector seems to be descend from BaseValueVector so chose this as part of the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)

* @param length number of elements to be copied
* @param target target vector
*/
protected void sliceAndTransferValidityBuffer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is the one being used by the vectors inside the complex/ directory. BaseValueVector#validityBuffer is protected, so overridden implementations there can't call target.validityBuffer to set a new value. Vectors in the top level org/apache/arrow/vector directory can add their overridden implementations. If at some point some other vector in complex/ wants to have a different version of this function, then I am afraid that vector will need to override splitAndTransferValidityBuffer entirely. Presumably don't want to expose a public setValidityBuffer on BaseValueVector.

@lidavidm lidavidm added the enhancement PRs that add or improve features. label Jun 2, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone Jun 2, 2025
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

* @param target target vector
*/
protected void splitAndTransferValidityBuffer(
int startIndex, int length, BaseValueVector target) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)

*
* @param byteSizeTarget desired size of the buffer
*/
protected void allocateValidityBuffer(final long byteSizeTarget) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an abstract class - mark this abstract?

Comment on lines +867 to +868
final ArrowBuf slicedValidityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget);
target.validityBuffer = transferBuffer(slicedValidityBuffer, target.allocator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this only slightly differs from the 'base' implementation. However I think this one is more correct...I didn't dig too deep but it's possible the 'simple' one that just retains the buffer was a later/incorrect reimplementation or because they were all copy-pasted at some point only some types were adjusted to use transferBuffer

@@ -342,7 +340,8 @@ private void allocateBytes(int valueCount) {
* slice the source buffer so we have to explicitly allocate the validityBuffer of the target
* vector. This is unlike the databuffer which we can always slice for the target vector.
*/
private void allocateValidityBuffer(final int validityBufferSize) {
@Override
protected void allocateValidityBuffer(final long validityBufferSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given many of these are very similar maybe they should delegate to a base implementation with super.allocateValidityBuffer then call any vector-specific logic at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, there is a slight behavior change though -- the validity buffer in BaseFixedWidthVector was not zeroing out its elements post allocation, it is now. Should be fine though I am thinking.

@rtadepalli rtadepalli requested a review from lidavidm June 3, 2025 00:11
@rtadepalli
Copy link
Contributor Author

Resolved all conflicts.

@lidavidm lidavidm merged commit acbc138 into apache:main Jun 5, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement PRs that add or improve features.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Create a utility function for validity buffer based split and transfer usage in Vector module
2 participants