-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-4104: [Java] fix a race condition in AllocationManager #3246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3246 +/- ##
==========================================
+ Coverage 88.5% 89.27% +0.76%
==========================================
Files 539 384 -155
Lines 72968 48981 -23987
==========================================
- Hits 64583 43728 -20855
+ Misses 8278 5253 -3025
+ Partials 107 0 -107
Continue to review full report at Codecov.
|
|
@siddharthteotia @jacques-n can you please review ?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems more generic than necessary. Why not make it size aware and do the allocation based on a provided value count? Also, if int[] is always length two in usage, let's make it focus on that case. For example, we shouldn't be allocating the variable data chunk in the same vector since it will have a different lifecycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to use this for both fixed-len (data+validity) and var-len vectors (offsets+validity).
- For the fixed-len vectors, the data buffers and validity buffers are sized for the same number-of-elements.
- For the var-len vectors, the offset buffer needs to be sized for one extra element than the bitmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this can't be generic just like the the original alloc with just valuecount *2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- even if the initial allocation was not a power-of-two allocation, I'm making the realloc switch to a power-of-2. most of the code deals with that.
- during realloc, we need to memcpy from original buffer, and bzero the rest.
other than the above, the code is common (allocAndSlice).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking further, I think this can be simplified significantly if I make both the initial allocation and the reAlloc target a multiple-of-2 allocation.
I'll need to fix a lot of tests (the allocation may be larger than what was asked for), but it's probably worth the simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this method at all? Shouldn't this be shared with the value buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by an api called setNull(int index) - it just sets the validity bit to false without actually adding a data element. If this optimization isn't useful, I can switch to expand both validity/offset arrays (I preserved the old behavior).
Also, used by another api setIndexDefined(int index). I think this one isn't meaningful unless we expand both the validity and the offset arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should always only resize together. If you resized the validity buffer, you'll need to resize the data buffer ultimately anyway (when you call setValueCount). We shouldn't be ever doing a partial resize.
|
@jacques-n - I've addressed the review comments. There is one change in behavior - both allocateNew and reAlloc can allocate more than the requested size, to ensure that the allocated buffer size is a power-of-2. Most of the test related changes are related to this. can you please review now ? |
jacques-n
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're commit message doesn't match your changes. Why are you making the other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're here, we should update this message. The problem here is that we're trying to put more than X values in a record batch, which isn't allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the rounded up value count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, the value count isn't rounded up. The buffer size is rounded up to a power-of-2 when the allocation happens, and the buffer is then used to it's full capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about exception message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, are you asking more detail in the error message, like
B bytes of memory required for capacity of C values, which exceeds the max limit of L bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this really allocFixedDataAndValidityBufs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's return an object instead of an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add the pre-checks to this method rather than have in other places.
valueCount between 0 and the value count allowed for a particular type.
type width 0 to 16 (right?)
Also, shouldn't we be returning the calculated valuecount? e.g. if you size for 50 but it turns out we can fit 60 in the same memory, shouldn't we be using that as the available size/value count (for initial alloc purposes, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the pre-checks.
Also, shouldn't we be returning the calculated valuecount? e.g. if you size for 50 but it turns > out we can fit 60 in the same memory, shouldn't we be using that as the available size/value > count (for initial alloc purposes, etc)?
That's done as part of allocDataAndValidityBufs (both for initial alloc and realloc) - in this case, it will return buffers with capacity of 60. And, the vector implementations will utilize the total capacity of 60.
The api setInitialCapacity() sets the variables based on whatever the user requests. Later, during alloc/realloc, the buffers allocated can have a capacity greater than this (power-of-2 roundup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: space between cast and expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
My initial change reduced the heap usage only for fixed-width vectors. This change extends the same fix for variable-width vectors, and the restructuring for code re-use. I'll open a separate lira for the variable-width vectors. |
I've opened ARROW-4147 for the rest of the change. Will use this to fix the race issue only. |
|
@siddharthteotia @jacques-n - this PR now only has the fix for the race condition. Can you please merge this change ? |
siddharthteotia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
No description provided.