BitView SIMD filtering #5356

gatesn · 2025-11-16T15:50:19Z

No description provided.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

gatesn · 2025-11-16T15:51:35Z

vortex-compute/src/filter/slice/neon/neon_u16.rs

+    let mut read_ptr = data;
+    let mut write_ptr = data;
+
+    for word in mask.iter_words() {


This outer loop is duplicative, maybe lift it into parent

Signed-off-by: Nicholas Gates <nick@nickgates.com>

codspeed-hq · 2025-11-18T14:22:57Z

CodSpeed Performance Report

Merging #5356 will degrade performances by 14.99%

_{Comparing ngates/bitview-filter (a783a69) with develop (35d12a3)}

Summary

❌ 15 regressions
✅ 1395 untouched
🆕 157 new
⏩ 631 skipped¹
🗄️ 28 archived benchmarks run²

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`take_indices[(1000, 16, 0.005)]`	20.8 µs	24.4 µs	-14.95%
❌	`take_indices[(1000, 16, 0.01)]`	20.9 µs	24.6 µs	-14.97%
❌	`take_indices[(1000, 16, 0.03)]`	22.8 µs	26.4 µs	-13.62%
❌	`take_indices[(1000, 256, 0.005)]`	20.5 µs	24.2 µs	-14.99%
❌	`take_indices[(1000, 256, 0.01)]`	20.7 µs	24.4 µs	-14.97%
❌	`take_indices[(1000, 256, 0.03)]`	21.1 µs	24.8 µs	-14.84%
❌	`take_indices[(1000, 4, 0.005)]`	23.3 µs	26.9 µs	-13.29%
❌	`take_indices[(1000, 4, 0.01)]`	22.1 µs	25.7 µs	-14.01%
❌	`take_indices[(1000, 4, 0.03)]`	25.1 µs	28.7 µs	-12.55%
❌	`take_indices[(10000, 16, 0.005)]`	24.4 µs	28 µs	-12.85%
❌	`take_indices[(10000, 16, 0.01)]`	27.3 µs	30.9 µs	-11.66%
❌	`take_indices[(10000, 256, 0.005)]`	22.7 µs	26.4 µs	-13.94%
❌	`take_indices[(10000, 256, 0.01)]`	24.8 µs	28.4 µs	-12.84%
❌	`take_indices[(10000, 4, 0.005)]`	27 µs	30.6 µs	-11.77%
❌	`take_indices[(10000, 4, 0.01)]`	30.8 µs	34.4 µs	-10.48%
🆕	`filter_u128[ActualFilter, 0.01]`	N/A	823.3 ns	N/A
🆕	`filter_u128[ActualFilter, 0.05]`	N/A	865.8 ns	N/A
🆕	`filter_u128[ActualFilter, 0.1]`	N/A	880.8 ns	N/A
🆕	`filter_u128[ActualFilter, 0.25]`	N/A	1 µs	N/A
🆕	`filter_u128[ActualFilter, 0.5]`	N/A	1.2 µs	N/A
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

631 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
28 benchmarks were run, but are now archived. If they were deleted in another branch, consider rebasing to remove them from the report. Instead if they were added back, click here to restore them. ↩

Signed-off-by: Nicholas Gates <nick@nickgates.com>

codecov · 2025-11-18T14:50:42Z

Codecov Report

❌ Patch coverage is 0% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.89%. Comparing base (1aac9ca) to head (492dffd).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines	Patch %	Lines
vortex-compute/src/filter/slice/scalar.rs	0.00%	36 Missing ⚠️
vortex-buffer/src/bit/view.rs	0.00%	4 Missing ⚠️
vortex-compute/src/filter/slice/mod.rs	0.00%	3 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

vortex-buffer/src/bit/view.rs

0ax1 · 2025-11-18T16:11:25Z

vortex-compute/src/filter/slice/neon/neon_u8.rs

+/// In theory, we could use vqtbl1q_u8 for 16-byte vectors, but that would require a 65KB LUT
+/// (256 entries × 16 bytes each), which is too large for practical use as it thrashes
+/// the CPU cache.
+#[cfg(target_arch = "aarch64")]


What's the point of only adding neon intrinsics, opposed to also having x86. To explore whether this outperforms auto-vectorization?

There is kind of no reasonable auto-vectorization for filtering that I've seen.

@connortsui20 was working on x86 / AVX512 so I figured I'd poke at neon!

0ax1 · 2025-11-19T11:09:37Z

vortex-compute/src/filter/slice/neon/neon_u8.rs

+                        _ => {
+                            // Finally, use the lookup table to compress selected elements
+                            // Load uint8x8 values and compress them using the lookup table.
+                            let values = vld1_u8(read_ptr);


My assumption is that branching and branch mispredictions have a significant overhead here if we do it on a per byte level. I think it's worth benchmarking only keeping the _ => case.

What might be reasonable is keeping these checks on a word level.

0u64 => { // Skip empty chunks } 0xFF => { // All bits set - fast path ptr::copy(read_ptr, write_ptr, 8); write_ptr = write_ptr.add(8); }

But same here: Would be interesting to look at different workloads and see whether introducing the branches on a word level is actually a win over running _ => unconditionally.

0ax1 · 2025-11-19T11:30:04Z

vortex-buffer/src/bit/view.rs

+    }
+
    /// Runs the provided function `f` for each index of a `true` bit in the view.
    pub fn iter_ones<F>(&self, mut f: F)


while raw != 0 { let bit_pos = raw.trailing_zeros(); f(bit_idx + bit_pos as usize); raw &= raw - 1; // Clear the bit at `bit_pos` }

Not introduced as part of this PR, but iterating based on while raw seems to be bad for pipelining, as an iteration depends on the previous one.

connortsui20 · 2025-11-19T19:23:41Z

actually I think this deserves some cleaning up / monomorphizing before we merge this (I will make the scalar implementation better on my PR #5399)

gatesn added 3 commits November 15, 2025 15:15

BitView Filter

1ae589a

Signed-off-by: Nicholas Gates <nick@nickgates.com>

BitView Filter

b1f94ab

Signed-off-by: Nicholas Gates <nick@nickgates.com>

BitView Filter

76018d9

Signed-off-by: Nicholas Gates <nick@nickgates.com>

gatesn added the feature Release label indicating a new feature or request label Nov 16, 2025

gatesn commented Nov 16, 2025

View reviewed changes

gatesn added 6 commits November 17, 2025 13:00

BitView Filter

ee634d4

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

9da352f

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

4ee7b53

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

7938a17

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

9261327

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

e41644f

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

a783a69

Signed-off-by: Nicholas Gates <nick@nickgates.com>

merge

492dffd

Signed-off-by: Nicholas Gates <nick@nickgates.com>

gatesn marked this pull request as ready for review November 18, 2025 16:01

gatesn requested a review from connortsui20 November 18, 2025 16:02

0ax1 reviewed Nov 18, 2025

View reviewed changes

vortex-buffer/src/bit/view.rs Show resolved Hide resolved

0ax1 reviewed Nov 18, 2025

View reviewed changes

0ax1 reviewed Nov 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BitView SIMD filtering #5356

BitView SIMD filtering #5356

Uh oh!

gatesn commented Nov 16, 2025

Uh oh!

gatesn Nov 16, 2025

Uh oh!

codspeed-hq bot commented Nov 18, 2025 •

edited

Loading

Uh oh!

codecov bot commented Nov 18, 2025 •

edited

Loading

Uh oh!

Uh oh!

0ax1 Nov 18, 2025

Uh oh!

gatesn Nov 18, 2025

Uh oh!

0ax1 Nov 19, 2025 •

edited

Loading

Uh oh!

0ax1 Nov 19, 2025

Uh oh!

connortsui20 commented Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

BitView SIMD filtering #5356

Are you sure you want to change the base?

BitView SIMD filtering #5356

Uh oh!

Conversation

gatesn commented Nov 16, 2025

Uh oh!

gatesn Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

codspeed-hq bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #5356 will degrade performances by 14.99%

Summary

Benchmarks breakdown

Footnotes

Uh oh!

codecov bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

0ax1 Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

gatesn Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

0ax1 Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0ax1 Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

connortsui20 commented Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codspeed-hq bot commented Nov 18, 2025 •

edited

Loading

codecov bot commented Nov 18, 2025 •

edited

Loading

0ax1 Nov 19, 2025 •

edited

Loading