Conversation
Merging this PR will degrade performance by 40.14%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | u16_FoR[10M] | 6.5 µs | 10.8 µs | -40.14% |
Comparing aduffy/like-pushdown (e35f51d) with develop (a095ca7)
Footnotes

1. 1290 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Force-pushed from 8f4900b to 51f3fa7
Force-pushed from 51f3fa7 to 194c964
Codecov Report: ❌ Patch coverage is
☔ View full report in Codecov by Sentry.
Force-pushed from e631ebd to fbff6c0
FWIW the stats truncation logic has the increment-utf8 behaviour you want. It's implemented on a scalar (https://github.com/vortex-data/vortex/blob/develop/vortex-scalar/src/binary.rs#L101) so it can mutate it in place. I wonder if they just have to stay separate, but maybe the in-place upper bound for the binary scalar is just a cute optimisation.
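For readers following along: the in-place byte-level increment being discussed can be sketched like this (a minimal illustration, not the linked `binary.rs` implementation; the function name is made up). Walk backwards from the end, drop any `0xFF` bytes since they cannot be incremented, and bump the first incrementable byte.

```rust
// Hypothetical sketch of an in-place binary upper bound: mutate the
// byte buffer so it becomes the smallest value strictly greater than
// every value sharing the original prefix. Returns false when every
// byte is 0xFF, i.e. no finite upper bound exists.
fn increment_in_place(bytes: &mut Vec<u8>) -> bool {
    while let Some(&last) = bytes.last() {
        if last == u8::MAX {
            // 0xFF would overflow: shorten and try the previous byte.
            bytes.pop();
        } else {
            *bytes.last_mut().unwrap() = last + 1;
            return true;
        }
    }
    false
}

fn main() {
    let mut b = b"prefix".to_vec();
    assert!(increment_in_place(&mut b));
    assert_eq!(b, b"prefiy"); // 'x' -> 'y'

    let mut c = vec![0x61, 0xFF, 0xFF];
    assert!(increment_in_place(&mut c));
    assert_eq!(c, vec![0x62]); // trailing 0xFF bytes dropped, 'a' -> 'b'

    let mut d = vec![0xFF];
    assert!(!increment_in_place(&mut d)); // no finite upper bound
}
```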
Benchmark suites run (detailed results tables collapsed):

- TPC-H SF=1 on NVMe
- FineWeb on NVMe
- TPC-H SF=1 on S3
- TPC-DS SF=1 on NVMe
- TPC-H SF=10 on NVMe
- FineWeb on S3
- Statistical and Population Genetics
- TPC-H SF=10 on S3
- ClickBench on NVMe
That function is missing some of the UTF-8 special character handling in the DF implementation, so now I'm worried that one of us is doing the wrong thing...
I gave you the binary version of this function; the utf8 version is here, apologies: https://github.com/vortex-data/vortex/blob/develop/vortex-scalar/src/utf8.rs#L98
I was looking at the utf8 version. I guess I don't know enough about UTF-8 to know whether it covers the same edge cases as the DataFusion implementation.
I derived it from the same code, so we should make sure the two stay identical so that stats behave the same as regular filters.
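The UTF-8 edge case in question can be sketched as follows (an illustrative, hedged sketch, not the linked `utf8.rs` or DataFusion code; the function name is invented). Unlike the byte-level version, a char-aware successor has to skip the surrogate range `0xD800..=0xDFFF`, which `char::from_u32` rejects, and fall back to truncating when the last char is `char::MAX`.

```rust
// Hypothetical char-aware successor: returns the smallest string
// strictly greater than every string with `s` as a prefix, or None
// when no such string exists (e.g. s is all char::MAX).
fn utf8_successor(s: &str) -> Option<String> {
    let mut chars: Vec<char> = s.chars().collect();
    while let Some(last) = chars.pop() {
        let mut code = last as u32 + 1;
        // char::from_u32 returns None for the UTF-16 surrogate range
        // (0xD800..=0xDFFF); step over it to the next valid scalar value.
        while code <= char::MAX as u32 {
            if let Some(next) = char::from_u32(code) {
                chars.push(next);
                return Some(chars.into_iter().collect());
            }
            code += 1;
        }
        // Last char was char::MAX: drop it and increment the previous one.
    }
    None
}

fn main() {
    assert_eq!(utf8_successor("prefix"), Some("prefiy".to_string()));
    // Incrementing U+D7FF must skip the surrogate gap to U+E000.
    assert_eq!(utf8_successor("a\u{D7FF}"), Some("a\u{E000}".to_string()));
    // A string of only char::MAX has no successor.
    assert_eq!(utf8_successor("\u{10FFFF}"), None);
}
```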
Force-pushed from 3927c66 to c24c047
```rust
// Attempt to do min/max pruning for LIKE 'exact' or LIKE 'prefix%'

// Don't attempt to handle ilike or negated like
if like_opts.negated || like_opts.case_insensitive {
```
In theory we could handle negated like if we know that the block ONLY contains things which match the prefix, but we'd need to add a starts_with expression and I didn't want to do that here
actually, we can do this without a starts_with, if we see that min >= prefix AND min < succ AND max >= prefix AND max < succ AND null_count = 0
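The condition above can be sketched as a small predicate (a hypothetical illustration, not the PR's code; `succ` stands for the prefix's successor string). If every value in the zone lies in `[prefix, succ)` it necessarily starts with the prefix, so with no nulls a `NOT LIKE 'prefix%'` filter can match nothing there and the zone can be pruned.

```rust
// Sketch only: can a zone be pruned for `col NOT LIKE 'prefix%'`?
// True when the zone's min/max prove every row starts with the prefix
// and there are no nulls, so the negated filter matches no row.
fn can_prune_negated_prefix(
    min: &str,
    max: &str,
    null_count: u64,
    prefix: &str,
    succ: &str, // successor of `prefix`, e.g. succ("prefix") == "prefiy"
) -> bool {
    null_count == 0 && min >= prefix && min < succ && max >= prefix && max < succ
}

fn main() {
    // Zone ["prefix_a", "prefix_z"] lies entirely within ["prefix", "prefiy").
    assert!(can_prune_negated_prefix("prefix_a", "prefix_z", 0, "prefix", "prefiy"));
    // A min outside the prefix range means some row may survive the filter.
    assert!(!can_prune_negated_prefix("apple", "prefix_z", 0, "prefix", "prefiy"));
    // Per the comment above, nulls disable this pruning path.
    assert!(!can_prune_negated_prefix("prefix_a", "prefix_z", 3, "prefix", "prefiy"));
}
```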
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Force-pushed from 2e47faf to e35f51d
Fixes #6128

There are two types of `LIKE` filters that we can prune using our zone maps:

| Filter | Zone Map Predicate |
|--------|--------------------|
| `col LIKE 'exact'` | `col.min > 'exact' OR col.max < 'exact'` |
| `col LIKE 'prefix%'` | `col.min >= 'prefiy' OR col.max < 'prefix'` |

(Here `'prefiy'` is `'prefix'` with its last character incremented, i.e. the successor string produced by the upper-bound logic.)

I extracted the existing logic from `Utf8Scalar::upper_bound` into a trait so we can make sure the logic is consistent across stats generation and pruning.

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
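The two rules in the table can be sketched as plain string comparisons (a minimal illustration under assumed names, not the PR's actual implementation; `succ` is the prefix's successor string, e.g. `succ("prefix") == "prefiy"`):

```rust
// Sketch: prune a zone for `col LIKE 'exact'` when the zone's
// [min, max] range cannot contain the exact value.
fn prune_exact(min: &str, max: &str, exact: &str) -> bool {
    min > exact || max < exact
}

// Sketch: prune a zone for `col LIKE 'prefix%'` when the whole zone
// sorts before the prefix, or at/after the prefix's successor.
fn prune_prefix(min: &str, max: &str, prefix: &str, succ: &str) -> bool {
    min >= succ || max < prefix
}

fn main() {
    // Zone ["a", "m"] cannot contain "z": prunable.
    assert!(prune_exact("a", "m", "z"));
    // Zone ["a", "m"] may contain "b": not prunable.
    assert!(!prune_exact("a", "m", "b"));
    // Zone ["q", "z"] sorts entirely at/after "prefiy": prunable.
    assert!(prune_prefix("q", "z", "prefix", "prefiy"));
    // Zone ["prefix_a", "prefix_b"] may contain prefix matches.
    assert!(!prune_prefix("prefix_a", "prefix_b", "prefix", "prefiy"));
}
```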