Feature: add listview rebuild + cleanup #4819

Conversation
CodSpeed Performance Report: merging #4819 will degrade performance by 12.49%. (Benchmarks breakdown collapsed.)

Codecov Report ❌ — patch coverage flagged; view the full report in Codecov by Sentry.
joseph-isaacs left a comment:
Nice approach
    );

    // Validate the `offsets` and `sizes` arrays.
    match_each_integer_ptype!(offset_ptype, |O| {
Can we add a TODO to validate this without a canonicalize?
I think it is basically always more efficient to validate after canonicalizing? Or put another way, if we want to add offsets + sizes arrays then we basically always need to canonicalize the bitpacked arrays
You could look at the specific encoding used (e.g. Delta) and do less work?
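For concreteness, here is a minimal sketch of what validating canonicalized `offsets` and `sizes` amounts to: every view must stay inside the `elements` buffer. The function and error handling are illustrative, not the actual Vortex API.

```rust
// Hypothetical sketch: once `offsets` and `sizes` are canonicalized to
// primitive slices, validation is a bounds check per view. These names
// are illustrative, not the actual Vortex API.
fn validate_views(offsets: &[u64], sizes: &[u64], elements_len: u64) -> Result<(), String> {
    for (i, (&offset, &size)) in offsets.iter().zip(sizes).enumerate() {
        // Each view [offset, offset + size) must lie inside the elements buffer.
        let end = offset
            .checked_add(size)
            .ok_or_else(|| format!("view {i}: offset + size overflows u64"))?;
        if end > elements_len {
            return Err(format!(
                "view {i}: range {offset}..{end} exceeds elements length {elements_len}"
            ));
        }
    }
    Ok(())
}
```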
    if referenced_elements
        .iter()
        .all(|&is_referenced| is_referenced)
    {
        return self.clone();
    }
Can you merge this with the previous loop?
I don't think you can do that efficiently, unless you used some sort of heap data structure...
Yep, you are right
Is it worth even checking this? I would assume it's non-trivial to compute, and you are being asked to compact anyway?
    fn rebuild_remove_gaps(&self) -> ListViewArray {
        if self.is_empty() {
            return self.clone();
        }

        let offset_ptype = self.offsets().dtype().as_ptype();
        let size_ptype = self.sizes().dtype().as_ptype();

        match_each_integer_ptype!(offset_ptype, |O| {
            match_each_integer_ptype!(size_ptype, |S| { self.rebuild_remove_gaps_inner::<O, S>() })
        })
    }
Did you benchmark this? It seems very slow.
It probably is slow, but the issue is that the alternatives are generally data-dependent or much more complex (a sketch of the simple approach follows this list):
- We could track ranges in a segment-tree-like data structure, but that is only good when the gaps are large.
- We could do a union-find over ranges, but that has the same problem as the previous option.
- We could use a bitset, which would definitely be more efficient if we implement a complicated prefix summation over a bitvector, though I'm not actually sure it would be much faster (and it is definitely several orders of magnitude more complicated).
- Roaring bitmaps? Even more complicated...
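For contrast, the approach the diff below actually takes boils down to a referenced-elements mask plus an exclusive prefix sum. A minimal standalone sketch, with plain `Vec`s standing in for the Vortex array and builder types:

```rust
// Sketch of gap removal via a referenced-elements mask plus an exclusive
// prefix sum; plain `Vec`s stand in for the Vortex array and builder types.
fn remove_gaps(offsets: &[usize], sizes: &[usize], elements: &[i64]) -> (Vec<usize>, Vec<i64>) {
    // Pass 1: mark every element referenced by some view.
    let mut referenced = vec![false; elements.len()];
    for (&offset, &size) in offsets.iter().zip(sizes) {
        referenced[offset..offset + size].fill(true);
    }

    // Pass 2: exclusive prefix sum maps old element indices to compacted ones.
    let mut prefix_sum = Vec::with_capacity(elements.len() + 1);
    let mut count = 0;
    for &is_referenced in &referenced {
        prefix_sum.push(count);
        count += usize::from(is_referenced);
    }
    prefix_sum.push(count); // allows empty views that point one past the end

    // Pass 3: copy referenced elements and remap each view's offset.
    let compacted: Vec<i64> = elements
        .iter()
        .zip(&referenced)
        .filter_map(|(&element, &keep)| keep.then_some(element))
        .collect();
    let new_offsets: Vec<usize> = offsets.iter().map(|&o| prefix_sum[o]).collect();
    (new_offsets, compacted)
}
```

For example, with `offsets = [4, 0]`, `sizes = [2, 2]`, and six elements, elements 2 and 3 are dropped and the two views are remapped to offsets 2 and 0.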
    // TODO(connor)[ListView]: Figure out a better heuristic.
    if selection_mask.density() < REBUILD_DENSITY_THRESHOLD {
        new_array = new_array.rebuild(ListViewRebuildMode::RemoveGaps);
    }
Do you think just adding everything to a builder (the other rebuild mode) might be faster?
Yeah, that is the MakeZeroCopyToList mode. You're probably right (though for this PR I'm removing this).
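For context, a minimal sketch of the density heuristic being discussed. The diff uses `selection_mask.density()` and `REBUILD_DENSITY_THRESHOLD`; the plain-slice helper and the threshold value here are made up for illustration.

```rust
// Illustrative density check: when a filter keeps only a small fraction of
// views, the surviving views likely reference few elements, so a RemoveGaps
// rebuild pays for itself. The threshold value here is made up.
const REBUILD_DENSITY_THRESHOLD: f64 = 0.25;

fn should_remove_gaps(selection_mask: &[bool]) -> bool {
    let kept = selection_mask.iter().filter(|&&keep| keep).count();
    let density = kept as f64 / selection_mask.len().max(1) as f64;
    density < REBUILD_DENSITY_THRESHOLD
}
```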
    // For example, if elements [0, 2, 3] are used, the prefix sum maps: 0->0, 1->1, 2->1, 3->2.
    let mut prefix_sum = vec![0; self.elements().len()];
    let mut cumulative_sum = 0;
    for i in 0..referenced_elements.len() {
        prefix_sum[i] = cumulative_sum;
        if referenced_elements[i] {
            cumulative_sum += 1;
        }
    }

    // Copy only the referenced elements into a new compacted elements array.
    let mut elements_builder = builder_with_capacity(self.elements().dtype(), cumulative_sum);
    for (i, &is_referenced) in referenced_elements.iter().enumerate() {
        if is_referenced {
            elements_builder
                .append_scalar(&self.elements().scalar_at(i))
                .vortex_expect("append scalar");
        }
    }
Can we merge these loops? Do you think the compiler would in fact fuse them?
It wouldn't fuse, because the second loop depends on cumulative_sum becoming the capacity of the builder. Maybe we could make the capacity of the builder 0, since it can keep growing? Personally I would just let the compiler figure this out.
You are correct
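For what it's worth, a sketch of the single-pass variant under discussion, with a growable `Vec` standing in for the zero-capacity builder:

```rust
// Single-pass variant: build prefix_sum and copy elements in the same loop,
// letting the output Vec grow instead of pre-sizing it from cumulative_sum.
fn compact_single_pass(referenced: &[bool], elements: &[i64]) -> (Vec<usize>, Vec<i64>) {
    let mut prefix_sum = Vec::with_capacity(referenced.len());
    let mut compacted = Vec::new(); // capacity unknown up front; let it grow
    for (&is_referenced, &element) in referenced.iter().zip(elements) {
        prefix_sum.push(compacted.len()); // exclusive prefix sum so far
        if is_referenced {
            compacted.push(element);
        }
    }
    (prefix_sum, compacted)
}
```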
Automated benchmark reports (summaries and detailed results tables collapsed) were posted for: Random Access; TPC-H SF=1 on NVMe; TPC-H SF=1 on S3; TPC-H SF=10 on NVMe; TPC-H SF=10 on S3; Compression; Statistical and Population Genetics; ClickBench on NVMe; and TPC-DS SF=1 on NVMe.
(force-pushed 8b7e9f9 to 55bd04c)
Can we add benchmarks for these?
I think I'm going to leave it to the next PR.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
(force-pushed 55bd04c to d0cd099)
Tracking Issue: #4699

Adds a `rebuild` method on `ListViewArray` that allows us to rebuild the array into specific shapes that we want (via a `ListViewRebuildMode` enum). This will make it easier to implement an optimization (that we are tentatively calling `IsZeroCopyToList`) that tells us the `ListViewArray` has no gaps nor overlaps from the views into the `elements` buffer, and that the views are all sorted and in order.

I also added an unimplemented `OverlapCompression` rebuild mode in case we ever want to do that.

Also does some cleanup of other `ListView` functionality.
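Putting the pieces together, here is a hypothetical sketch of the API shape the description implies. The variant names (RemoveGaps, MakeZeroCopyToList, OverlapCompression) appear in the PR text and diff; the stub type and signatures are illustrative only.

```rust
// Hypothetical sketch of the rebuild entry point described above.
struct ListViewArray; // stub standing in for the real Vortex array type

enum ListViewRebuildMode {
    // Compact `elements` so every element is referenced by at least one view.
    RemoveGaps,
    // Rewrite views to be sorted, contiguous, and non-overlapping, so that
    // conversion to a plain List array is zero-copy.
    MakeZeroCopyToList,
    // Deduplicate overlapping views (left unimplemented in this PR).
    OverlapCompression,
}

impl ListViewArray {
    fn rebuild(&self, mode: ListViewRebuildMode) -> ListViewArray {
        match mode {
            ListViewRebuildMode::RemoveGaps => self.rebuild_remove_gaps(),
            ListViewRebuildMode::MakeZeroCopyToList => unimplemented!(),
            ListViewRebuildMode::OverlapCompression => unimplemented!(),
        }
    }

    fn rebuild_remove_gaps(&self) -> ListViewArray {
        ListViewArray // stub; the real logic is quoted in the diff above
    }
}
```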