feat[gpu]: dict take values prim#6101

Merged
joseph-isaacs merged 23 commits into develop from ji/dict-array-2 on Jan 23, 2026

Conversation

@joseph-isaacs
Contributor

@joseph-isaacs joseph-isaacs commented Jan 22, 2026

Support CUDA dictionary decoding for primitive types.
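For readers unfamiliar with the operation, dictionary decoding replaces each code with the dictionary value it indexes. The following is a minimal CPU-side sketch in Rust, not the CUDA kernel from this PR; the function name `dict_take` and its signature are hypothetical and chosen only for illustration.

```rust
/// Decode a dictionary-encoded column: out[i] = values[codes[i]].
/// Hypothetical helper for illustration; not part of the PR's API.
/// Returns `None` if any code is out of range for `values`.
fn dict_take<V, C>(values: &[V], codes: &[C]) -> Option<Vec<V>>
where
    V: Copy,
    C: Copy + Into<u64>,
{
    codes
        .iter()
        .map(|&code| {
            // Widen the code (u8, u16, u32, ...) and convert it to an index.
            let idx = usize::try_from(code.into()).ok()?;
            // Bounds-checked lookup into the dictionary values.
            values.get(idx).copied()
        })
        .collect()
}
```

With `values = [100, 200, 300]` (as `u32`) and `codes = [2, 0, 1, 1, 2]` (as `u8`), this yields `[300, 100, 200, 200, 300]`. A GPU version would typically assign one or more output elements per thread, but that mapping is not described in this conversation.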

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs changed the title from "wip" to "feat[gpu]: dict take values prim" Jan 22, 2026
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs added the "feature" (A feature request) label Jan 22, 2026
@codspeed-hq

codspeed-hq bot commented Jan 22, 2026

Merging this PR will degrade performance by 29.75%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
❌ 7 regressed benchmarks
✅ 1249 untouched benchmarks
🆕 4 new benchmarks
⏩ 1290 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | WallTime | u32_values_u8_codes[10M] | N/A | 128.1 µs | N/A |
| 🆕 | WallTime | u32_values_u16_codes[10M] | N/A | 144.6 µs | N/A |
| 🆕 | WallTime | u64_values_u32_codes[10M] | N/A | 274.9 µs | N/A |
| 🆕 | WallTime | u64_values_u8_codes[10M] | N/A | 213.7 µs | N/A |
| | Simulation | into_canonical_non_nullable[(10000, 100, 0.1)] | 3.8 ms | 4.6 ms | -17.7% |
| | Simulation | into_canonical_non_nullable[(10000, 100, 0.01)] | 2.2 ms | 3 ms | -27.05% |
| | Simulation | into_canonical_non_nullable[(10000, 100, 0.0)] | 1.9 ms | 2.7 ms | -29.42% |
| | Simulation | into_canonical_nullable[(10000, 100, 0.0)] | 4.4 ms | 5.2 ms | -15.62% |
| | Simulation | canonical_into_non_nullable[(10000, 100, 0.0)] | 1.9 ms | 2.7 ms | -29.75% |
| | Simulation | canonical_into_non_nullable[(10000, 100, 0.01)] | 2.1 ms | 2.9 ms | -27.39% |
| | Simulation | canonical_into_nullable[(10000, 10, 0.0)] | 528.5 µs | 444.1 µs | +19.03% |
| | Simulation | canonical_into_nullable[(10000, 100, 0.0)] | 4.9 ms | 4.1 ms | +19.79% |
| | Simulation | canonical_into_non_nullable[(10000, 100, 0.1)] | 3.7 ms | 4.5 ms | -18.15% |
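
The Efficiency figures in the performance table appear consistent with the relative change `(BASE - HEAD) / HEAD`. This is an inference from the rows themselves, not a formula stated in the report, and the displayed BASE/HEAD values are rounded, so recomputed numbers match only approximately. A small Rust sketch under that assumption (the function name is hypothetical):

```rust
/// Relative change in percent, assuming efficiency = (base - head) / head.
/// Positive means HEAD is faster than BASE; negative means a regression.
/// The formula is inferred from the table, not taken from CodSpeed docs.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base - head) / head * 100.0
}
```

For example, `efficiency_pct(528.5, 444.1)` is roughly 19.0, close to the reported +19.03% for `canonical_into_nullable[(10000, 10, 0.0)]`.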

Comparing ji/dict-array-2 (0c69415) with develop (46708ba)

Footnotes

  1. 1290 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@joseph-isaacs joseph-isaacs marked this pull request as ready for review January 22, 2026 17:15
@joseph-isaacs joseph-isaacs added the "changelog/feature" (A new feature) label and removed the "feature" (A feature request) label Jan 22, 2026
@joseph-isaacs joseph-isaacs enabled auto-merge (squash) January 22, 2026 17:15
@joseph-isaacs joseph-isaacs requested a review from 0ax1 January 22, 2026 17:15
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs marked this pull request as draft January 22, 2026 18:40
auto-merge was automatically disabled January 22, 2026 18:40

Pull request was converted to draft

@0ax1
Contributor

0ax1 commented Jan 23, 2026

@claude review

@claude
Contributor

claude bot commented Jan 23, 2026

a

const BENCH_ARGS: &[(usize, &str)] = &[
    (10_000_000, "10M"),
    (100_000_000, "100M"),
    (1_000_000_000, "1B"),
Contributor

1B is a very slow run. I'd rather keep 1M and 100K. Expecting 1B inputs seems rather "synthetic"?

Contributor Author

How slow was it?

Contributor

@0ax1 0ax1 Jan 23, 2026

Do you see any throughput difference between 10M, 100M, and 1000M?

Contributor Author

I did when we ran it before.
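
One way to compare runs across input sizes, as discussed in this thread, is to normalize each measurement to throughput. The sketch below is a generic illustration in Rust; the function names are hypothetical, and counting only the decoded output bytes is an assumed convention, not something the PR's benchmarks are known to do.

```rust
use std::time::Duration;

/// Elements processed per second for one benchmark run.
fn elements_per_sec(len: usize, elapsed: Duration) -> f64 {
    len as f64 / elapsed.as_secs_f64()
}

/// Output bandwidth in GB/s (1 GB = 1e9 bytes).
/// Assumption: only the decoded output bytes are counted,
/// not the codes or dictionary values that were read.
fn output_gb_per_sec(len: usize, bytes_per_value: usize, elapsed: Duration) -> f64 {
    // Multiply in f64 to avoid usize overflow for very large inputs.
    (len as f64 * bytes_per_value as f64) / elapsed.as_secs_f64() / 1e9
}
```

If throughput stays flat as the input grows from 10M to 1B elements, the largest size adds run time without adding information, which is the trade-off the reviewers are weighing.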

# Conflicts:
#	vortex-cuda/benches/for_cuda.rs
#	vortex-cuda/build.rs
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs requested a review from 0ax1 January 23, 2026 16:49
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs marked this pull request as ready for review January 23, 2026 16:53
@joseph-isaacs joseph-isaacs enabled auto-merge (squash) January 23, 2026 16:53
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs merged commit a095ca7 into develop Jan 23, 2026
43 of 45 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/dict-array-2 branch January 23, 2026 18:28
danking pushed a commit that referenced this pull request Feb 6, 2026
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Labels

changelog/feature: A new feature

2 participants