bitunpacking cuda kernels store output into shared memory before copying to main memory #6384
CodSpeed HQ / CodSpeed Performance Analysis
failed
Feb 10, 2026
Performance Regression: -12.9%
⚡ 1 improved benchmark
❌ 3 regressed benchmarks
✅ 1134 untouched benchmarks
⏩ 1265 skipped benchmarks [1]
⚠️ Please fix the performance issues or acknowledge them on CodSpeed.
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | true_count_arrow_buffer[128] | 946.9 ns | 859.4 ns | +10.18% |
| ❌ | Simulation | true_count_vortex_buffer[1024] | 1.1 µs | 1.2 µs | -11.93% |
| ❌ | Simulation | true_count_vortex_buffer[2048] | 1.2 µs | 1.4 µs | -10.48% |
| ❌ | Simulation | true_count_vortex_buffer[128] | 984.7 ns | 1,130.6 ns | -12.9% |
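The Efficiency column can be reproduced from the BASE and HEAD timings. The sketch below assumes the change is computed as `(base - head) / head`; that formula is inferred from the numbers in this report, not taken from CodSpeed documentation, and `efficiency_pct` is a hypothetical helper name. Only the two rows reported in nanoseconds are checked, because the µs rows are rounded too coarsely to recompute.

```python
def efficiency_pct(base_ns: float, head_ns: float) -> float:
    """Relative change in percent, assuming (base - head) / head.

    Positive means HEAD is faster than BASE, negative means slower.
    The formula is an assumption inferred from the reported figures.
    """
    return (base_ns - head_ns) / head_ns * 100.0


# Rows from the table that are reported with sub-nanosecond precision.
rows = {
    "true_count_arrow_buffer[128]": (946.9, 859.4),
    "true_count_vortex_buffer[128]": (984.7, 1130.6),
}

for name, (base, head) in rows.items():
    print(f"{name}: {efficiency_pct(base, head):+.2f}%")
```

Under this assumption the two rows come out at +10.18% and -12.90%, which agrees with the reported values.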
Comparing rk/fasterbitpack (d196877) with develop (3cb7fab) [2]
Footnotes
1. 1265 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.
2. No successful run was found on develop (00d71b8) during the generation of this report, so 3cb7fab was used instead as the comparison base. This report may therefore include changes unrelated to this pull request.