
feat[cuda]: filter for Decimal/VarBinView #6196

Merged
a10y merged 8 commits into develop from aduffy/filter-decimals
Jan 29, 2026

Conversation

@a10y
Contributor

@a10y a10y commented Jan 28, 2026

  • Wire in BufferHandles for VarBinView
  • Wire up cub filtering logic for decimals/strings

@codspeed-hq

codspeed-hq bot commented Jan 28, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 31.56%

Comparing aduffy/filter-decimals (08acb19) with develop (f568de5)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 62 improved benchmarks
❌ 12 regressed benchmarks
✅ 1105 untouched benchmarks
⏩ 1323 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | 10M_10pct[1000000] | 131.9 µs | 165.7 µs | -20.41% |
| WallTime | 1M_90pct[1000000] | 57.1 µs | 83.4 µs | -31.56% |
| Simulation | chunked_varbinview_canonical_into[(10, 1000)] | 1.6 ms | 2.1 ms | -23.07% |
| Simulation | chunked_varbinview_into_canonical[(10, 1000)] | 1.6 ms | 2.1 ms | -22.96% |
| Simulation | bench_compare_sliced_dict_varbinview[(1000, 10000)] | 632.7 µs | 156.5 µs | ×4 |
| Simulation | bench_compare_sliced_dict_varbinview[(10000, 10000)] | 876.9 µs | 491.2 µs | +78.5% |
| Simulation | bench_compare_sliced_dict_varbinview[(2000, 10000)] | 741.7 µs | 218 µs | ×3.4 |
| Simulation | bench_compare_sliced_dict_varbinview[(20000, 10000)] | 1,212.3 µs | 687 µs | +76.47% |
| Simulation | bench_compare_sliced_dict_varbinview[(3333, 10000)] | 887.3 µs | 299 µs | ×3 |
| Simulation | bench_compare_sliced_dict_varbinview[(5000, 10000)] | 1,069.7 µs | 398.7 µs | ×2.7 |
| Simulation | bench_compare_sliced_dict_varbinview[(2500, 10000)] | 796.4 µs | 248.3 µs | ×3.2 |
| Simulation | bench_compare_sliced_dict_varbinview[(7500, 10000)] | 856 µs | 470.7 µs | +81.87% |
| Simulation | bench_compare_varbin[(10000, 2048)] | 457.7 µs | 335.4 µs | +36.48% |
| Simulation | bench_compare_sliced_dict_varbinview[(9999, 10000)] | 877.1 µs | 491.2 µs | +78.54% |
| Simulation | bench_compare_varbin[(10000, 512)] | 258.4 µs | 228.8 µs | +12.92% |
| Simulation | bench_compare_varbinview[(10000, 512)] | 258.2 µs | 228.1 µs | +13.19% |
| Simulation | bench_compare_varbinview[(10000, 2048)] | 457.7 µs | 335.6 µs | +36.38% |
| Simulation | decode_varbin[(1000, 128)] | 82.9 µs | 28.9 µs | ×2.9 |
| Simulation | decode_varbin[(10000, 2)] | 677.2 µs | 142.2 µs | ×4.8 |
| Simulation | decode_varbin[(1000, 32)] | 82.4 µs | 28.4 µs | ×2.9 |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. 1323 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y added the changelog/feature A new feature label Jan 28, 2026
@a10y a10y force-pushed the aduffy/filter-decimals branch from c13887e to 7da02cc January 28, 2026 22:41
@a10y a10y changed the title from "feat: filter decimal CUDA" to "feat[cuda]: filter for Decimal/VarBinView" Jan 28, 2026
@a10y a10y force-pushed the aduffy/filter-decimals branch 2 times, most recently from e973f5c to cc6afa5 January 28, 2026 23:19
@0ax1 0ax1 self-requested a review January 29, 2026 14:35
@a10y a10y force-pushed the aduffy/filter-decimals branch from cc6afa5 to 45510bf January 29, 2026 14:43
Contributor

@0ax1 0ax1 left a comment

Looks great! Couple of questions / remarks.

}

// Wait for completion
await_stream_callback(stream).await?;

I forgot to remove this in my PR. We don't want to do a blocking wait here; we should just return the handle. If the caller wants to wait for previous operations on the stream to complete, it can register its own compiler driver callback or call await_stream_callback(stream).await?; itself.
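To illustrate the shape being asked for, here is a minimal sketch of "enqueue the work, return a handle, let the caller decide when to wait". It is not the PR's code: a `std::thread` plus a channel stands in for the CUDA stream, and `PendingBuffer`, `filter_enqueue`, and `wait` are hypothetical names invented for this example.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical handle to a result that may still be in flight.
/// (Assumption: stands in for a device buffer handle tied to a stream.)
struct PendingBuffer {
    done: Receiver<Vec<i32>>,
}

impl PendingBuffer {
    /// Opt-in wait: blocks until the enqueued work has finished.
    fn wait(self) -> Vec<i32> {
        self.done
            .recv()
            .expect("worker exited without sending a result")
    }
}

/// Enqueue a filter and return immediately, without a blocking wait.
/// A background thread plays the role of the stream here.
fn filter_enqueue(values: Vec<i32>, mask: Vec<bool>) -> PendingBuffer {
    let (tx, rx) = channel();
    thread::spawn(move || {
        let kept: Vec<i32> = values
            .into_iter()
            .zip(mask)
            .filter_map(|(value, keep)| keep.then_some(value))
            .collect();
        // The receiver may already be gone if the caller dropped the handle.
        let _ = tx.send(kept);
    });
    PendingBuffer { done: rx }
}
```

A caller that needs the result right away calls `filter_enqueue(values, mask).wait()`; a caller that wants to chain more work first simply holds on to the handle and waits later.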

} = struct_array.into_parts();

// TODO(aduffy): try_join_all
let mut host_fields = vec![];

Could collect directly into host_fields, then it doesn't need to be mut.
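A minimal sketch of the suggestion, assuming a fallible per-field conversion. `copy_field_to_host` is a hypothetical, synchronous stand-in (the code under review is async and carries a `try_join_all` TODO), so this only shows the push-versus-collect difference, not the PR's actual function.

```rust
/// Hypothetical stand-in for moving one field to the host.
/// (Assumption: fails on an empty field just so there is an error path.)
fn copy_field_to_host(field: &[u8]) -> Result<Vec<u8>, String> {
    if field.is_empty() {
        return Err("empty field".to_string());
    }
    Ok(field.to_vec())
}

/// Before: push into a Vec that has to be declared `mut`.
fn collect_with_push(fields: &[Vec<u8>]) -> Result<Vec<Vec<u8>>, String> {
    let mut host_fields = vec![];
    for field in fields {
        host_fields.push(copy_field_to_host(field)?);
    }
    Ok(host_fields)
}

/// After: collect directly; `host_fields` no longer needs `mut`.
/// Collecting into `Result<Vec<_>, _>` stops at the first error.
fn collect_directly(fields: &[Vec<u8>]) -> Result<Vec<Vec<u8>>, String> {
    let host_fields = fields
        .iter()
        .map(|field| copy_field_to_host(field))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(host_fields)
}
```

Both functions return the same result for the same input; the second just moves the accumulation into the iterator chain.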

a10y added 7 commits January 29, 2026 11:56
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/filter-decimals branch from afcc118 to 4f4fccd January 29, 2026 16:57
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y enabled auto-merge (squash) January 29, 2026 17:07
@a10y a10y merged commit ae103af into develop Jan 29, 2026
42 of 44 checks passed
@a10y a10y deleted the aduffy/filter-decimals branch January 29, 2026 17:11
AdamGS pushed a commit that referenced this pull request Feb 2, 2026
* Wire in BufferHandles for VarBinView
* Wire up cub filtering logic for decimals/strings

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
danking pushed a commit that referenced this pull request Feb 6, 2026
* Wire in BufferHandles for VarBinView
* Wire up cub filtering logic for decimals/strings

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Labels

changelog/feature A new feature

2 participants