feat[cuda]: filter for Decimal/VarBinView #6196
CodSpeed Performance Report
Merging this PR will degrade performance by 31.56%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | 10M_10pct[1000000] | 131.9 µs | 165.7 µs | -20.41% |
| ❌ | WallTime | 1M_90pct[1000000] | 57.1 µs | 83.4 µs | -31.56% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(10, 1000)] | 1.6 ms | 2.1 ms | -23.07% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(10, 1000)] | 1.6 ms | 2.1 ms | -22.96% |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(1000, 10000)] | 632.7 µs | 156.5 µs | ×4 |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(10000, 10000)] | 876.9 µs | 491.2 µs | +78.5% |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(2000, 10000)] | 741.7 µs | 218 µs | ×3.4 |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(20000, 10000)] | 1,212.3 µs | 687 µs | +76.47% |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(3333, 10000)] | 887.3 µs | 299 µs | ×3 |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(5000, 10000)] | 1,069.7 µs | 398.7 µs | ×2.7 |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(2500, 10000)] | 796.4 µs | 248.3 µs | ×3.2 |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(7500, 10000)] | 856 µs | 470.7 µs | +81.87% |
| ⚡ | Simulation | bench_compare_varbin[(10000, 2048)] | 457.7 µs | 335.4 µs | +36.48% |
| ⚡ | Simulation | bench_compare_sliced_dict_varbinview[(9999, 10000)] | 877.1 µs | 491.2 µs | +78.54% |
| ⚡ | Simulation | bench_compare_varbin[(10000, 512)] | 258.4 µs | 228.8 µs | +12.92% |
| ⚡ | Simulation | bench_compare_varbinview[(10000, 512)] | 258.2 µs | 228.1 µs | +13.19% |
| ⚡ | Simulation | bench_compare_varbinview[(10000, 2048)] | 457.7 µs | 335.6 µs | +36.38% |
| ⚡ | Simulation | decode_varbin[(1000, 128)] | 82.9 µs | 28.9 µs | ×2.9 |
| ⚡ | Simulation | decode_varbin[(10000, 2)] | 677.2 µs | 142.2 µs | ×4.8 |
| ⚡ | Simulation | decode_varbin[(1000, 32)] | 82.4 µs | 28.4 µs | ×2.9 |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Footnotes
- 1323 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
c13887e to 7da02cc
e973f5c to cc6afa5
cc6afa5 to 45510bf
0ax1 left a comment
Looks great! Couple of questions / remarks.
vortex-cuda/src/kernel/filter/mod.rs (Outdated)

    }

    // Wait for completion
    await_stream_callback(stream).await?;

I forgot to remove this in my PR. We don't want to do a blocking wait here; just return the handle. The caller can opt into waiting for previous operations on the stream to complete, either by registering its own compiler driver callback or by calling await_stream_callback(stream).await?.
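As a rough illustration of the "return the handle, let the caller decide when to wait" shape described in this comment, here is a minimal CPU-only sketch. It uses a plain thread and channel as a stand-in for a CUDA stream; `PendingFilter`, `launch_filter`, and `wait` are hypothetical names invented for this example and are not part of vortex-cuda.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical handle to work that has been enqueued but may not have
// finished yet (a stand-in for a buffer tied to a CUDA stream).
struct PendingFilter {
    done: Receiver<Vec<i32>>,
}

impl PendingFilter {
    // Opt-in blocking wait: only callers that need the result right now
    // pay for the synchronization.
    fn wait(self) -> Vec<i32> {
        self.done
            .recv()
            .expect("worker dropped without sending a result")
    }
}

// Enqueue the filter and return immediately with a handle, instead of
// blocking inside the function until the work completes.
fn launch_filter(values: Vec<i32>, mask: Vec<bool>) -> PendingFilter {
    let (tx, rx) = channel();
    thread::spawn(move || {
        let kept: Vec<i32> = values
            .into_iter()
            .zip(mask)
            .filter_map(|(v, keep)| keep.then_some(v))
            .collect();
        // The receiver may already be gone if the caller never waits.
        let _ = tx.send(kept);
    });
    PendingFilter { done: rx }
}
```

A caller that wants the result synchronously would write something like `launch_filter(values, mask).wait()`, while a caller that chains further work can hold on to the handle and defer the wait.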
vortex-cuda/src/canonical.rs (Outdated)

    } = struct_array.into_parts();

    // TODO(aduffy): try_join_all
    let mut host_fields = vec![];

Could collect directly into host_fields, then it doesn't need to be mut.
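A minimal sketch of what this suggestion could look like, assuming a fallible per-field conversion. `to_host`, `convert_with_push`, and `convert_with_collect` are hypothetical stand-ins; the real code in canonical.rs is async and operates on Vortex arrays, which this synchronous example does not attempt to reproduce.

```rust
// Hypothetical stand-in for copying one field to the host.
fn to_host(field: &str) -> Result<String, String> {
    if field.is_empty() {
        Err("empty field".to_string())
    } else {
        Ok(format!("host:{field}"))
    }
}

// Before: a mutable Vec that is filled by a loop.
fn convert_with_push(fields: &[&str]) -> Result<Vec<String>, String> {
    let mut host_fields = vec![];
    for field in fields {
        host_fields.push(to_host(field)?);
    }
    Ok(host_fields)
}

// After: collect directly into the Vec, so the binding need not be `mut`.
// Collecting into `Result<Vec<_>, _>` stops at the first error.
fn convert_with_collect(fields: &[&str]) -> Result<Vec<String>, String> {
    let host_fields = fields
        .iter()
        .map(|f| to_host(f))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(host_fields)
}
```

If each conversion is a future, as the `try_join_all` TODO in the snippet suggests, the same idea would presumably go through something like `futures::future::try_join_all` rather than a plain `collect`.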
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
afcc118 to 4f4fccd
* Wire in BufferHandles for VarBinView
* Wire up cub filtering logic for decimals/strings

Signed-off-by: Andrew Duffy <andrew@a10y.dev>