better?

d196877
Merged

bitunpacking cuda kernels store output into shared memory before copying to main memory #6384

CodSpeed HQ / CodSpeed Performance Analysis failed Feb 10, 2026

Performance Regression: -12.9%

⚡ 1 improved benchmark
❌ 3 regressed benchmarks
✅ 1134 untouched benchmarks
⏩ 1265 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | true_count_arrow_buffer[128] | 946.9 ns | 859.4 ns | +10.18% |
| Simulation | true_count_vortex_buffer[1024] | 1.1 µs | 1.2 µs | -11.93% |
| Simulation | true_count_vortex_buffer[2048] | 1.2 µs | 1.4 µs | -10.48% |
| Simulation | true_count_vortex_buffer[128] | 984.7 ns | 1,130.6 ns | -12.9% |
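
The Efficiency figures for the two rows reported in nanoseconds are consistent with `BASE / HEAD - 1`, expressed as a percentage. This formula is inferred from the numbers in this report, not taken from CodSpeed documentation, and the helper name `efficiency_pct` is hypothetical. The µs rows cannot be checked the same way because their displayed timings are rounded to one decimal place. A minimal sketch:

```python
def efficiency_pct(base_ns: float, head_ns: float) -> float:
    """Relative change of HEAD versus BASE, in percent.

    Assumed formula (inferred from the report): BASE / HEAD - 1.
    Positive means HEAD is faster than BASE; negative means a regression.
    """
    return (base_ns / head_ns - 1.0) * 100.0


# Timings copied from the two rows that are reported in nanoseconds.
rows = {
    "true_count_arrow_buffer[128]": (946.9, 859.4),
    "true_count_vortex_buffer[128]": (984.7, 1130.6),
}

for name, (base_ns, head_ns) in rows.items():
    print(f"{name}: {efficiency_pct(base_ns, head_ns):+.2f}%")
```

Under this assumption, the headline regression is the worst per-benchmark value, which here is `true_count_vortex_buffer[128]`.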

Comparing rk/fasterbitpack (d196877) with develop (3cb7fab) [2]

Footnotes

  1. 1265 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

  2. No successful run was found on develop (00d71b8) when this report was generated, so 3cb7fab was used as the comparison base instead. The report may therefore include changes unrelated to this pull request.