Try and improve GC performance #1
```rust
    }
    let (views_buf, data_blocks) = if total_large < i32::MAX as usize {
        // fast path, the entire data fits in a single buffer
        // 3) Allocate exactly capacity for all non-inline data
```
This restores the original code and then only drops into the slow path when needed.
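For context, a minimal self-contained sketch of the pattern this describes, with hypothetical names (`copy_non_inline`, `copy_in_groups`); it is not the actual arrow-rs implementation:

```rust
// Hypothetical sketch (not the arrow-rs code): keep the original
// single-buffer fast path, and fall back to a slow path that splits the
// copy into multiple buffers only when the data cannot be indexed by i32.
fn copy_non_inline(data: &[&[u8]]) -> Vec<Vec<u8>> {
    let total_large: usize = data.iter().map(|d| d.len()).sum();
    if total_large < i32::MAX as usize {
        // Fast path: allocate exactly the needed capacity and copy once.
        let mut buf = Vec::with_capacity(total_large);
        for d in data {
            buf.extend_from_slice(d);
        }
        vec![buf]
    } else {
        // Slow path: split into multiple buffers, each below i32::MAX bytes.
        copy_in_groups(data)
    }
}

fn copy_in_groups(data: &[&[u8]]) -> Vec<Vec<u8>> {
    let max = i32::MAX as usize;
    let mut out: Vec<Vec<u8>> = vec![Vec::new()];
    for d in data {
        // Start a new buffer when the next slice would overflow this one.
        if !out.last().unwrap().is_empty() && out.last().unwrap().len() + d.len() > max {
            out.push(Vec::new());
        }
        out.last_mut().unwrap().extend_from_slice(d);
    }
    out
}
```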
```rust
            }
        },
    );
    views_buf.extend(new_views);
```
I also tried to make this faster by using `extend` rather than `push`, which might help.
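A small sketch of the `push`-vs-`extend` point, under the assumption that `new_views` is an iterator with an accurate size hint; the `u128` element type is a stand-in for the packed view representation:

```rust
// Sketch: extending from an iterator lets Vec use the size hint to reserve
// once, instead of paying a capacity check per push. `new_views` and the
// u128 element type are stand-ins, not the PR's actual types.
fn main() {
    let new_views = (0..1024u128).map(|v| v * 2);
    let mut views_buf: Vec<u128> = Vec::new();

    // Per-element alternative, with a capacity check on every iteration:
    //     for v in new_views { views_buf.push(v); }

    // Single extend: one reservation up front, then a tight loop.
    views_buf.extend(new_views);
    assert_eq!(views_buf.len(), 1024);
}
```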
🤖: Benchmark completed
```rust
    }];
    let gc_copy_groups = if total_large > i32::MAX as usize {
        // Slow-path: need to split into multiple copy groups
        let mut groups = Vec::with_capacity(total_large / (i32::MAX as usize) + 1);
```
@alamb assume a bad case where every string view is 1.5 GiB; this capacity would be smaller than expected. (This is why I didn't use an estimate previously; I think re-allocation here would not be the bottleneck, and the actual size might be more than `total_large / (i32::MAX as usize) + 1`.)
> @alamb assume a bad case where every string view is 1.5 GiB; this capacity would be smaller than expected

I don't quite follow this logic. For example:

- if we had 3 string views, each of 1.5G
- `total_large` would be 4.5G
- then the capacity would be (4.5G / 2G + 1) = 3

Which is how many buffers are needed 🤔

> (This is why I didn't use an estimate previously; I think re-allocation here would not be the bottleneck, and the actual size might be more than `total_large / (i32::MAX as usize) + 1`.)

I agree reallocation is likely not to be the bottleneck.
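The 3 × 1.5 GiB arithmetic above, written out as a checkable sketch (constant and variable names are illustrative):

```rust
// Illustrative check of the arithmetic above (names are hypothetical):
fn main() {
    const MAX: usize = i32::MAX as usize; // one buffer indexes ~2 GiB
    let view = 3usize << 29;              // 1.5 GiB per string view
    let total_large = 3 * view;           // 4.5 GiB in total
    let capacity = total_large / MAX + 1; // 4.5G / 2G + 1
    // Matches the 3 buffers needed, since a 2 GiB buffer holds only one
    // 1.5 GiB view.
    assert_eq!(capacity, 3);
}
```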
I mean, assume 100 bytes per BinaryView versus 1.5G per BinaryView; the cases are different. Assume the destination size is 15 GiB:

- at 100 bytes each, the estimate (15G / 2G + 1) = 8 is enough
- at 1.5G each, it would require 10 buffers, since a 2 GiB buffer can hold at most one such binary view
(A better example: every BinaryView.len() is 1 GiB plus 1 byte, or the strings interleave as 1 GiB, 1 byte, 1 GiB, 1 byte, ...)
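A sketch simulating that worst case, assuming greedy packing of views into `i32`-indexed buffers; names are illustrative, not the PR's code:

```rust
// Simulation of the "1 GiB plus 1 byte" case (hypothetical names):
fn main() {
    const MAX: usize = i32::MAX as usize; // ~2 GiB per buffer
    let view = (1usize << 30) + 1;        // each view is 1 GiB + 1 byte
    let n = 10;
    let total_large = n * view;

    // The capacity estimate from the diff:
    let estimate = total_large / MAX + 1;

    // Greedy packing: start a new buffer when the next view would overflow.
    let mut buffers = 1;
    let mut used = 0;
    for _ in 0..n {
        if used + view > MAX {
            buffers += 1;
            used = 0;
        }
        used += view;
    }

    assert_eq!(estimate, 6); // the formula predicts 6 groups...
    assert_eq!(buffers, 10); // ...but no buffer fits two views, so 10 are
                             // needed and `groups` would have to reallocate.
}
```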
Would it be better if I extract a new function for the slow path?
Yes, I think that would be good. I didn't do it in this PR to keep the diff smaller. Would you like me to do so? Or would you like to?
I would like to do this when I'm back home, maybe at 7 PM UTC-8; I may not have time for this while I'm in the office :-(
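For reference, one possible shape for the extracted slow-path helper, assuming a `GcCopyGroup` that records the bytes and view count per destination buffer; this is a sketch, not the actual arrow-rs code:

```rust
// Hypothetical shape for an extracted slow-path helper; `GcCopyGroup` and
// its fields are guesses for illustration, not the actual arrow-rs types.
struct GcCopyGroup {
    /// Total bytes of non-inline data this group copies into one buffer.
    bytes: usize,
    /// Number of views covered by this group.
    views: usize,
}

/// Split views into groups so each group's data fits one i32-indexed buffer.
fn build_gc_copy_groups(view_lens: &[usize]) -> Vec<GcCopyGroup> {
    let max = i32::MAX as usize;
    let mut groups: Vec<GcCopyGroup> = Vec::new();
    let (mut bytes, mut views) = (0usize, 0usize);
    for &len in view_lens {
        // Close the current group when the next view would overflow it.
        if views > 0 && bytes + len > max {
            groups.push(GcCopyGroup { bytes, views });
            bytes = 0;
            views = 0;
        }
        bytes += len;
        views += 1;
    }
    if views > 0 {
        groups.push(GcCopyGroup { bytes, views });
    }
    groups
}
```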
It basically restores the original code path when all views fit in a single buffer, and only computes the `GcCopyGroup`s when needed. I hope this will restore performance.