[DISCUSSION] The memory tracking approach that better handles shared buffers #6439
Comments
#3960 is likely related. Tagging @jhorstmann @waynexia @alamb |
I personally prefer the allocator approach which gives the most accurate stats in theory. Given we in some aspects encourage sharing underlying buffer and arrays, only changing the behavior of |
I agree with @waynexia -- if the usecase is "accurately track the total memory used across some number of Arrays which (potentially) share underlying |
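To make the shared-buffer accounting problem concrete, here is a minimal sketch of one way to avoid double counting: key each allocation by its pointer and count it once. `ToyBuffer` and `deduplicated_total` are hypothetical names used only for illustration; they are not arrow-rs APIs, and this is not necessarily the approach proposed in the comments above.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Toy stand-in for a reference-counted buffer (hypothetical, NOT arrow-rs `Buffer`).
#[derive(Clone)]
struct ToyBuffer {
    data: Arc<Vec<u8>>, // allocation that may be shared by many views
}

impl ToyBuffer {
    fn new(bytes: Vec<u8>) -> Self {
        Self { data: Arc::new(bytes) }
    }
}

/// Sum the capacity of each distinct allocation exactly once,
/// using the allocation's address as its identity.
fn deduplicated_total(buffers: &[ToyBuffer]) -> usize {
    let mut seen = HashSet::new();
    buffers
        .iter()
        // `insert` returns false if this allocation was already counted.
        .filter(|b| seen.insert(Arc::as_ptr(&b.data) as usize))
        .map(|b| b.data.capacity())
        .sum()
}

fn main() {
    let a = ToyBuffer::new(vec![0u8; 1024]);
    let b = ToyBuffer::new(vec![0u8; 512]);
    // Two views of `a` plus one of `b`: `a` must only be counted once.
    let all = vec![a.clone(), a.clone(), b.clone()];
    println!("deduplicated total: {} bytes", deduplicated_total(&all));
}
```

The trade-off is that the caller has to walk every buffer and maintain the `seen` set, which is the kind of bookkeeping an allocator-level counter would avoid.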
I don't know if this is the right place to post this, but if this is being reworked, it would also be very useful to have a total memory usage method available within builders, especially for nested stuff. I'm writing a parser which outputs record batches at a static number of rows, but that doesn't work very well when the goal is to keep memory usage roughly constant. It would be nice to be able to get a roughly accurate measure of memory usage in builders in addition to arrays/buffers. |
Builders are an interesting point, but come with two potential challenges to be aware of:
This means any "eager" memory computation is likely to work poorly. However, the lazy approach as currently implemented for arrays could actually work quite well. I could see a world where we remove lazy memory tracking for arrays in favour of eager allocation tracking, and add lazy memory tracking for builders |
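As a rough illustration of what lazy memory tracking for a builder could look like, the sketch below computes the size on demand from the capacities of the builder's internal vectors and uses it to flush batches on a byte budget rather than a fixed row count. `ToyStringBuilder`, `memory_size`, and `batch_row_counts` are hypothetical names for this example only; no such method is claimed to exist on arrow-rs builders.

```rust
/// Toy variable-length string builder (hypothetical, NOT an arrow-rs type).
struct ToyStringBuilder {
    offsets: Vec<i32>, // end offset of each value, with a leading 0
    values: Vec<u8>,   // concatenated string bytes
}

impl ToyStringBuilder {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }

    fn append(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Lazy: nothing is tracked on append; the size is computed when asked.
    /// Uses capacity, not length, so over-allocation from growth is included.
    fn memory_size(&self) -> usize {
        self.offsets.capacity() * std::mem::size_of::<i32>() + self.values.capacity()
    }

    /// Return the number of buffered rows and reset the builder.
    fn finish(&mut self) -> usize {
        let rows = self.len();
        self.offsets = vec![0];
        self.values = Vec::new();
        rows
    }
}

/// Emit a batch whenever the builder reaches `budget` bytes.
fn batch_row_counts<'a>(rows: impl IntoIterator<Item = &'a str>, budget: usize) -> Vec<usize> {
    let mut builder = ToyStringBuilder::new();
    let mut batches = Vec::new();
    for row in rows {
        builder.append(row);
        if builder.memory_size() >= budget {
            batches.push(builder.finish());
        }
    }
    if builder.len() > 0 {
        batches.push(builder.finish());
    }
    batches
}

fn main() {
    let rows = vec!["abcdefgh"; 100];
    let batches = batch_row_counts(rows.iter().copied(), 256);
    println!("rows per batch: {:?}", batches);
}
```

Because the size is only computed when requested, appends stay cheap, and the caller decides how often to pay for the check.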
That would be super useful for my use case |
As this appears to have stalled slightly I have created a POC in #6590 for what I think eager memory tracking for arrays could look like. I don't really have the capacity to drive this initiative, but I think the approach in the PR should be relatively straightforward for someone to pick up and run with. |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
See this comment: #6363 (comment).
When a `Buffer`'s data is shared between many `Buffer`s, the `capacity()` method will always return the total buffer memory usage, which causes issue #6363. We need to discuss a better way to track memory for shared buffers.
arrow-rs/arrow-buffer/src/buffer/immutable.rs
Lines 166 to 168 in d05cf6d
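The over-counting described above can be reproduced with a small self-contained model. This is only a sketch: `ToyBuffer` and `naive_total` are hypothetical stand-ins built on `Arc<Vec<u8>>`, not the arrow-rs implementation, and they only mimic the behaviour of every view reporting the whole allocation.

```rust
use std::sync::Arc;

/// Toy stand-in for a sliceable, reference-counted buffer
/// (hypothetical, NOT the arrow-rs `Buffer` type).
#[derive(Clone)]
struct ToyBuffer {
    data: Arc<Vec<u8>>, // shared allocation
    offset: usize,      // start of this view within the allocation
    len: usize,         // length of this view
}

impl ToyBuffer {
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        Self { data: Arc::new(bytes), offset: 0, len }
    }

    /// Zero-copy view into the same allocation.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    /// Mimics the behaviour in the issue: reports the whole allocation,
    /// no matter how small the view is.
    fn capacity(&self) -> usize {
        self.data.capacity()
    }
}

/// Naive accounting: sum `capacity()` over every view.
fn naive_total(buffers: &[ToyBuffer]) -> usize {
    buffers.iter().map(ToyBuffer::capacity).sum()
}

fn main() {
    let base = ToyBuffer::new(vec![0u8; 1024]);
    // Four non-overlapping views that all share one allocation.
    let views: Vec<ToyBuffer> = (0..4).map(|i| base.slice(i * 256, 256)).collect();
    println!(
        "allocation: {} bytes, naive total over {} views: {} bytes",
        base.capacity(),
        views.len(),
        naive_total(&views)
    );
}
```

With four views the naive total is four times the real allocation, which is the kind of inflated figure reported in #6363.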
Describe the solution you'd like
The easy way to do this is to return the length and the unused capacity of the `Buffer`, as in PR #6438.
Originally posted by @tustvold in #6438 (comment)
But as @tustvold said, we need more discussion.
Describe alternatives you've considered
Additional context