Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface #40043
GH-40038: [Java] Export non empty offset buffer for variable-size layout through C Data Interface #40043
Changes from all commits
5eb34e1
c9c0522
5d0475f
47ddac2
0d004c4
aae9bcd
7f4343c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent of this change to avoid changing the existing offset buffer, and only change it for the exported version? Then I must ask
(1) why? If we're allocating the buffer, we may as well use it for the vector as well.
(2) is this safe in terms of leaking memory? since apparently nothing will hold on to the buffer after allocating it above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2, this is only for exporting the vectors through C Data Interface. By its means, it is intended to be released by the importer side. At Java Arrow, I remember ArrayExporter only increases the reference count for the buffers to be exported to prevent them to be released automatically.
And the release callback of exported structure will reduce the reference count to release it eventually. But it is triggered at imported side. I think this is what the C Data Interface defines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1, they are allocated for using only at imported side. I think as per discussed with @pitrou above, the idea is these empty buffers exist internally and it is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Antoine's idea was that you could statically allocate one such buffer to use in such situations, if you're talking about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, it seems ArrayExporter will increase the reference count by 1. But since it starts with a reference count of 1, that means these buffers will never be freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vibhatha I've added some more checks (buffer ref count, allocated memory) into the existing roundtrip tests (
org.apache.arrow.c.RoundtripTest
) of C data interface. The tests are passed with these additional checks. Please take a look. Thanks.As this issue blocks some important feature in apache/datafusion-comet#95, we really like to move this forward. Thanks a lot. 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look at this as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vibhatha, sorry for pinging you. Do you have a chance to look at this recently? One important feature is blocked by the issue and we are looking forward to move this into Java Arrow library. Thank you. 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya I couldn't take a look. But let me see if I can allocate some time to do this probably earlier next week?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vibhatha It will be great! Thank you so much!