Slice(ChunkedArray) canonicalizes all chunks instead of only relevant ones#6648
Slice(ChunkedArray) canonicalizes all chunks instead of only relevant ones#6648
Conversation
Merging this PR will improve performance by 12.36%
Performance Changes
Comparing Footnotes
|
…ilder SliceVTable::execute bypasses execute_parent, so SliceExecuteAdaptor never fires on Slice(ChunkedArray) — canonicalizing all chunks instead of just the relevant ones. Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
|
this seems reasonable, running benchmarks to make sure there are no surprises |
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Random AccessSummary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Benchmarks: CompressionSummary
Detailed Results Table
|
AdamGS
left a comment
There was a problem hiding this comment.
I think this makes perfect sense, I'm not sure there's a way to test it though
|
I noticed this was also missing at Filter as well (#6650). Holding off until we decide whether we want to do this by default (#6655) . It looks like it degrades perf a little but not sure how important that is. A wasted execute_parent call should be just a couple of vtable dispatches and a type check, no? |
SliceVTable was using the default append_to_builder, which calls SliceVTable::execute directly. This bypasses the execution loop, so execute_parent optimizations like SliceExecuteAdaptor(ChunkedVTable) never fire.
For Slice(ChunkedArray), this means the entire chunked array gets canonicalized before slicing — instead of using find_chunk_idx to select only the relevant chunks first.