
feat[cuda]: patches kernel#6231

Merged
a10y merged 5 commits into develop from aduffy/patches-cuda
Feb 2, 2026

@a10y a10y commented Jan 30, 2026

Apply patches in-place for BP and ALP.

Added unit tests, and also added patches to the existing BP/ALP tests to verify that it works.

a10y added 2 commits January 30, 2026 14:18
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/patches-cuda branch from 5557930 to 617ade8 Compare January 30, 2026 22:00
@a10y a10y added the feature A feature request label Jan 30, 2026
@a10y a10y requested review from 0ax1, Copilot and joseph-isaacs and removed request for Copilot January 30, 2026 22:00
@a10y a10y added changelog/feature A new feature and removed feature A feature request labels Jan 30, 2026
const ValueT *const patchValues,
uint64_t patchesLen
) {
const uint32_t idx = blockIdx.x * blockDim.x + threadIdx.x;
We likely want each kernel instance to patch more than a single value?

done
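
For context on the exchange above, here is a minimal sketch of what patching more than one value per thread could look like, using a grid-stride loop. It is not the code merged in this PR. The kernel name `applyPatches`, the `values` and `patchIndices` parameters, the host helper `patchOnDevice`, and the launch configuration are all assumptions made for illustration. The sketch also assumes patch indices are unique and in bounds, so no two threads write the same slot.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread walks the patch list with a grid-stride
// loop, so a single thread applies several patches when patchesLen exceeds
// the total number of launched threads.
template <typename ValueT, typename IndexT>
__global__ void applyPatches(ValueT *const values,
                             const IndexT *const patchIndices,
                             const ValueT *const patchValues,
                             uint64_t patchesLen) {
    const uint64_t stride = static_cast<uint64_t>(gridDim.x) * blockDim.x;
    for (uint64_t i = static_cast<uint64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < patchesLen; i += stride) {
        // Overwrite the decoded value in place with its patch.
        values[patchIndices[i]] = patchValues[i];
    }
}

// Illustrative host helper: copies inputs to the device, launches the kernel
// with a deliberately small grid, and copies the patched values back.
template <typename ValueT, typename IndexT>
cudaError_t patchOnDevice(std::vector<ValueT> &values,
                          const std::vector<IndexT> &indices,
                          const std::vector<ValueT> &patches) {
    if (indices.size() != patches.size()) return cudaErrorInvalidValue;
    if (indices.empty()) return cudaSuccess;  // nothing to patch

    ValueT *dValues = nullptr;
    IndexT *dIndices = nullptr;
    ValueT *dPatches = nullptr;
    const size_t valueBytes = values.size() * sizeof(ValueT);
    const size_t indexBytes = indices.size() * sizeof(IndexT);
    const size_t patchBytes = patches.size() * sizeof(ValueT);

    cudaError_t err = cudaMalloc(&dValues, valueBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dIndices, indexBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dPatches, patchBytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dValues, values.data(), valueBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dIndices, indices.data(), indexBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dPatches, patches.data(), patchBytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // 2 blocks x 32 threads = 64 threads, so longer patch lists force
        // every thread to handle more than one patch.
        applyPatches<ValueT, IndexT><<<2, 32>>>(dValues, dIndices, dPatches,
                                                static_cast<uint64_t>(patches.size()));
        err = cudaGetLastError();
    }
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(values.data(), dValues, valueBytes, cudaMemcpyDeviceToHost);

    cudaFree(dValues);
    cudaFree(dIndices);
    cudaFree(dPatches);
    return err;
}
```

With a grid-stride loop the launch size no longer has to match the patch count, which lets the grid be tuned independently. It is still a simple scatter, so writes to `values` are generally uncoalesced.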

@joseph-isaacs joseph-isaacs left a comment

It would be nice to address the perf. Also, maybe add a TODO saying this is a pretty naive approach.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/patches-cuda branch from 4abd87f to 7371736 Compare February 2, 2026 14:57
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/patches-cuda branch from 19b4042 to 7e9717e Compare February 2, 2026 15:02
@a10y a10y enabled auto-merge (squash) February 2, 2026 15:03
codspeed-hq bot commented Feb 2, 2026

Merging this PR will degrade performance by 33.09%

❌ 1 regressed benchmark
✅ 1137 untouched benchmarks
⏩ 1384 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_into_canonical[(1000, 10)] | 43 µs | 64.3 µs | -33.09% |

Comparing aduffy/patches-cuda (7e9717e) with develop (76f19cb)


Footnotes

  1. 1384 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y merged commit 782d650 into develop Feb 2, 2026
42 of 46 checks passed
@a10y a10y deleted the aduffy/patches-cuda branch February 2, 2026 15:09
AdamGS pushed a commit that referenced this pull request Feb 2, 2026
Apply patches in-place for BP and ALP.

Added unit tests, and also added patches to the existing BP/ALP tests to
verify it works

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
danking pushed a commit that referenced this pull request Feb 6, 2026

Labels

changelog/feature A new feature


2 participants