feat[cuda]: filter for Decimal/VarBinView by a10y · Pull Request #6196 · vortex-data/vortex

a10y · 2026-01-28T21:10:44Z

* Wire in BufferHandles for VarBinView
* Wire up cub filtering logic for decimals/strings
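As a rough CPU reference for what the CUB-based filter computes, both decimals and VarBinView strings reduce to a stable "flagged select" (stream compaction) over fixed-width elements, which CUB exposes on the GPU as `cub::DeviceSelect::Flagged`. This is a sketch of the semantics only, not the vortex-cuda kernel. `select_flagged`, `filter_decimals`, `filter_views`, the `i128` decimal storage and the `[u8; 16]` view type are all hypothetical, assuming Arrow-style 16-byte string views that reference their data buffers by index and offset. Under that assumption, filtering strings only compacts the views, and the data buffers can be shared unchanged.

```rust
/// CPU reference for a stable "flagged select": keep every element whose flag
/// is set, in the original order. Mirrors the semantics of CUB's
/// `DeviceSelect::Flagged`, not its GPU implementation.
pub fn select_flagged<T: Copy>(values: &[T], flags: &[bool]) -> Vec<T> {
    assert_eq!(values.len(), flags.len(), "expected one flag per value");
    values
        .iter()
        .zip(flags)
        .filter(|&(_, &keep)| keep)
        .map(|(&v, _)| v)
        .collect()
}

/// Decimals are fixed-width (assumed here to be stored as i128),
/// so filtering them is a direct flagged select.
pub fn filter_decimals(values: &[i128], flags: &[bool]) -> Vec<i128> {
    select_flagged(values, flags)
}

/// Hypothetical 16-byte string view, as in Arrow's string-view layout.
pub type View = [u8; 16];

/// Filtering string views compacts only the views; the data buffers they
/// point into are handed back as-is, shared by reference.
pub fn filter_views<'a>(
    views: &[View],
    buffers: &'a [Vec<u8>],
    flags: &[bool],
) -> (Vec<View>, &'a [Vec<u8>]) {
    (select_flagged(views, flags), buffers)
}
```

Because the views are fixed-width, the same compaction primitive serves both types; only the element width differs.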

codspeed-hq · 2026-01-28T21:21:48Z

CodSpeed Performance Report

Merging this PR will degrade performance by 31.56%

Comparing aduffy/filter-decimals (08acb19) with develop (f568de5)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 62 improved benchmarks
❌ 12 regressed benchmarks
✅ 1105 untouched benchmarks
⏩ 1323 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

|   | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|------|-----------|--------|--------|------------|
| ❌ | WallTime | `10M_10pct[1000000]` | 131.9 µs | 165.7 µs | -20.41% |
| ❌ | WallTime | `1M_90pct[1000000]` | 57.1 µs | 83.4 µs | -31.56% |
| ❌ | Simulation | `chunked_varbinview_canonical_into[(10, 1000)]` | 1.6 ms | 2.1 ms | -23.07% |
| ❌ | Simulation | `chunked_varbinview_into_canonical[(10, 1000)]` | 1.6 ms | 2.1 ms | -22.96% |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(1000, 10000)]` | 632.7 µs | 156.5 µs | ×4 |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(10000, 10000)]` | 876.9 µs | 491.2 µs | +78.5% |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(2000, 10000)]` | 741.7 µs | 218 µs | ×3.4 |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(20000, 10000)]` | 1,212.3 µs | 687 µs | +76.47% |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(3333, 10000)]` | 887.3 µs | 299 µs | ×3 |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(5000, 10000)]` | 1,069.7 µs | 398.7 µs | ×2.7 |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(2500, 10000)]` | 796.4 µs | 248.3 µs | ×3.2 |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(7500, 10000)]` | 856 µs | 470.7 µs | +81.87% |
| ⚡ | Simulation | `bench_compare_varbin[(10000, 2048)]` | 457.7 µs | 335.4 µs | +36.48% |
| ⚡ | Simulation | `bench_compare_sliced_dict_varbinview[(9999, 10000)]` | 877.1 µs | 491.2 µs | +78.54% |
| ⚡ | Simulation | `bench_compare_varbin[(10000, 512)]` | 258.4 µs | 228.8 µs | +12.92% |
| ⚡ | Simulation | `bench_compare_varbinview[(10000, 512)]` | 258.2 µs | 228.1 µs | +13.19% |
| ⚡ | Simulation | `bench_compare_varbinview[(10000, 2048)]` | 457.7 µs | 335.6 µs | +36.38% |
| ⚡ | Simulation | `decode_varbin[(1000, 128)]` | 82.9 µs | 28.9 µs | ×2.9 |
| ⚡ | Simulation | `decode_varbin[(10000, 2)]` | 677.2 µs | 142.2 µs | ×4.8 |
| ⚡ | Simulation | `decode_varbin[(1000, 32)]` | 82.4 µs | 28.4 µs | ×2.9 |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

¹ 1323 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.
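Reading the rows above, the Efficiency column appears to be derived from the ratio of the two times, BASE / HEAD - 1, rather than from the relative change (HEAD - BASE) / BASE. For example, 131.9 / 165.7 - 1 ≈ -20.4%, whereas (165.7 - 131.9) / 131.9 ≈ +25.6%. Large improvements are shown as a multiplier instead (632.7 / 156.5 ≈ 4.04, displayed as ×4). This is inferred from the displayed numbers, not taken from CodSpeed's documentation. The small mismatches (57.1 / 83.4 gives about -31.5% against the reported -31.56%) presumably come from rounding of the displayed times. The headline figure of 31.56% matches the worst single row, `1M_90pct[1000000]`. A small sketch of that assumed formula follows; `speedup` and `efficiency_pct` are illustrative names.

```rust
/// Ratio of BASE time to HEAD time (same unit, e.g. µs).
/// Values above 1.0 mean HEAD is faster than BASE.
pub fn speedup(base: f64, head: f64) -> f64 {
    base / head
}

/// Efficiency in percent, assuming the report uses (BASE / HEAD - 1) * 100:
/// positive when HEAD is faster, negative when it is slower.
pub fn efficiency_pct(base: f64, head: f64) -> f64 {
    (speedup(base, head) - 1.0) * 100.0
}
```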

0ax1

Looks great! Couple of questions / remarks.

vortex-array/src/arrow/executor/byte_view.rs

0ax1 · 2026-01-29T15:25:06Z

vortex-cuda/src/kernel/filter/mod.rs

+    }
+
+    // Wait for completion
+    await_stream_callback(stream).await?;


I forgot to remove this in my PR. We don't want to do a blocking wait here; just return the handle. The caller can opt into waiting for previous operations on the stream to complete, either by registering its own compiler driver callback or by calling `await_stream_callback(stream).await?`.
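The review asks the launcher to enqueue work and hand back a handle, leaving synchronization to the caller. The sketch below is a std-only analogy under stated assumptions, not the vortex-cuda API. A background thread stands in for the CUDA stream, and a channel receiver stands in for the completion handle. `launch_filter` and `PendingFilter` are hypothetical names. In the real code, the opt-in wait is `await_stream_callback(stream).await?`.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical completion handle for work "enqueued on a stream".
pub struct PendingFilter {
    done: Receiver<Vec<i128>>,
}

impl PendingFilter {
    /// Opt-in wait: block until the enqueued work has finished.
    pub fn wait(self) -> Vec<i128> {
        self.done.recv().expect("stream worker dropped the result")
    }
}

/// Enqueue a filter and return immediately, without waiting for completion.
/// A background thread is only an analogy for a CUDA stream here.
pub fn launch_filter(values: Vec<i128>, flags: Vec<bool>) -> PendingFilter {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let out: Vec<i128> = values
            .into_iter()
            .zip(flags)
            .filter_map(|(v, keep)| keep.then_some(v))
            .collect();
        // Ignore send errors: the caller may have dropped the handle.
        let _ = tx.send(out);
    });
    PendingFilter { done: rx }
}
```

Because `launch_filter` returns immediately, a caller can enqueue several operations and wait once at the end instead of paying for a synchronization point inside every kernel launcher.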

0ax1 · 2026-01-29T15:28:27Z

vortex-cuda/src/canonical.rs

+                } = struct_array.into_parts();
+
+                // TODO(aduffy): try_join_all
+                let mut host_fields = vec![];


Could collect directly into `host_fields`; then it doesn't need to be `mut`.
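The suggestion can be illustrated with a minimal sketch. Collecting an iterator of `Result`s straight into `Result<Vec<_>, _>` replaces the `let mut host_fields = vec![]` plus push loop, and it stops at the first error. `copy_field_to_host` and `fields_to_host` are hypothetical stand-ins for the per-field copy in `canonical.rs`. The async counterpart named in the TODO, `futures::future::try_join_all`, likewise resolves to a `Result` of a `Vec` and short-circuits on the first error. It is left out here so the sketch stays std-only.

```rust
/// Hypothetical per-field device-to-host copy that can fail.
fn copy_field_to_host(field: &[u8]) -> Result<Vec<u8>, String> {
    if field.is_empty() {
        return Err("empty field".to_string());
    }
    Ok(field.to_vec())
}

/// Collect straight into `host_fields`: no `let mut` plus `push` loop,
/// and the first error short-circuits the whole collection.
pub fn fields_to_host(fields: &[Vec<u8>]) -> Result<Vec<Vec<u8>>, String> {
    let host_fields = fields
        .iter()
        .map(|f| copy_field_to_host(f))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(host_fields)
}
```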

vortex-cuda/src/canonical.rs

vortex-file/src/tests.rs

vortex-cuda/src/kernel/filter/mod.rs

vortex-cuda/src/canonical.rs

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

* Wire in BufferHandles for VarBinView
* Wire up cub filtering logic for decimals/strings

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

a10y added the `changelog/feature` (A new feature) label Jan 28, 2026

a10y force-pushed the aduffy/filter-decimals branch from c13887e to 7da02cc January 28, 2026 22:41

a10y changed the title from "feat: filter decimal CUDA" to "feat[cuda]: filter for Decimal/VarBinView" Jan 28, 2026

a10y force-pushed the aduffy/filter-decimals branch 2 times, most recently from e973f5c to cc6afa5 January 28, 2026 23:19

0ax1 self-requested a review January 29, 2026 14:35

a10y force-pushed the aduffy/filter-decimals branch from cc6afa5 to 45510bf January 29, 2026 14:43

0ax1 reviewed Jan 29, 2026

vortex-file/src/tests.rs (outdated, resolved)

0ax1 reviewed Jan 29, 2026

vortex-cuda/src/kernel/filter/mod.rs (outdated, resolved)

0ax1 reviewed Jan 29, 2026

vortex-cuda/src/canonical.rs (resolved)

0ax1 approved these changes Jan 29, 2026

a10y added 7 commits January 29, 2026 11:56

* `8890270` Decimal/VarBinView filter
* `984bf69` additions
* `68f417e` fixes
* `6d6b4db` don't await_stream_callback
* `c5120f3` try_join_all
* `e495d10` lint
* `4f4fccd` unused

All seven commits carry `Signed-off-by: Andrew Duffy <andrew@a10y.dev>`.

a10y force-pushed the aduffy/filter-decimals branch from afcc118 to 4f4fccd January 29, 2026 16:57

* `08acb19` install CudaSession

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

a10y enabled auto-merge (squash) January 29, 2026 17:07

a10y merged commit ae103af into develop Jan 29, 2026
42 of 44 checks passed

a10y deleted the aduffy/filter-decimals branch January 29, 2026 17:11

AdamGS pushed a commit that referenced this pull request Feb 2, 2026

feat[cuda]: filter for Decimal/VarBinView (#6196)

60c347c

danking pushed a commit that referenced this pull request Feb 6, 2026

feat[cuda]: filter for Decimal/VarBinView (#6196)

48d9beb

a10y merged 8 commits into develop from aduffy/filter-decimals