Add CUDA non-contiguous Unary Ops support #14639

YavorGIvanov · 2025-07-11T23:33:14Z

No description provided.

am17an · 2025-07-12T05:03:07Z

CMakePresets.json

+    { "name": "x64-linux-gcc-debug", "inherits": [ "base", "x64-linux-gcc", "debug" ] },
+    { "name": "x64-linux-gcc-release", "inherits": [ "base", "x64-linux-gcc", "release" ] },
+    { "name": "x64-linux-gcc-reldbg", "inherits": [ "base", "x64-linux-gcc", "reldbg" ] },
+    { "name": "x64-linux-gcc+static-release", "inherits": [ "base", "x64-linux-gcc", "release", "static" ] },


is this accidental?

No. Should I separate it into another PR? c4ecdef

I am fine with removing it, but I did not see a preset that fit my use case, so I decided to add one.

It may be easier to merge if you separate it into another PR.

Please put it into a separate PR.

JohannesGaessler · 2025-07-12T10:10:03Z

CMakePresets.json

+    { "name": "x64-linux-gcc-debug", "inherits": [ "base", "x64-linux-gcc", "debug" ] },
+    { "name": "x64-linux-gcc-release", "inherits": [ "base", "x64-linux-gcc", "release" ] },
+    { "name": "x64-linux-gcc-reldbg", "inherits": [ "base", "x64-linux-gcc", "reldbg" ] },
+    { "name": "x64-linux-gcc+static-release", "inherits": [ "base", "x64-linux-gcc", "release", "static" ] },


Please put it into a separate PR.

JohannesGaessler · 2025-07-12T10:10:38Z

docs/ops/CUDA.csv

What is this file? Did you add it by accident?

@JohannesGaessler It comes from the recent merge of #14598. In subsequent PRs we'll work out how to avoid such a huge diff when merging. Currently it records the timestamp, device, etc., so each run produces an entirely new file.

@YavorGIvanov For now don't commit the docs/ops/CUDA.csv and docs/ops.md. I'll make a follow-up PR after this gets merged to update the ops table.

I am fine with improving and simplifying the process of generating docs/ops.md myself, so that it does not produce huge diffs.

JohannesGaessler · 2025-07-12T10:15:34Z

ggml/src/ggml-cuda/unary.cu

+    const int k) {
+
+    const int i = blockDim.x*blockIdx.x + threadIdx.x;


Suggested change

-    const int k) {
-
-    const int i = blockDim.x*blockIdx.x + threadIdx.x;
+    const int64_t k) {
+
+    const int64_t i = blockDim.x*blockIdx.x + threadIdx.x;

Thanks. Applied as part of other PR review changes.
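For context on the suggestion above: a 32-bit `int` cannot represent every element index once a tensor holds more than 2^31 - 1 elements. The following is a minimal host-side C++ sketch of that reasoning, not the PR's code. The helper names `global_index` and `fits_in_int32` are hypothetical, and the three `uint32_t` parameters stand in for CUDA's `blockDim.x`, `blockIdx.x`, and `threadIdx.x`. The sketch also widens to `int64_t` before multiplying, on the assumption that a product of two 32-bit unsigned values could otherwise wrap before it is stored.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the per-thread flat index of a 1D kernel launch.
// The cast happens BEFORE the multiplication, so the product is computed in
// 64 bits and cannot wrap at 2^32.
inline int64_t global_index(uint32_t block_dim, uint32_t block_idx, uint32_t thread_idx) {
    return static_cast<int64_t>(block_dim) * block_idx + thread_idx;
}

// True if an element count or index is representable as a 32-bit signed int.
inline bool fits_in_int32(int64_t n) {
    return n >= std::numeric_limits<int32_t>::min() &&
           n <= std::numeric_limits<int32_t>::max();
}
```

With 256 threads per block, block 16777216 already starts at index 2^32, which no 32-bit `int` can hold, while the 256x256x3 tensor from the perf test in this thread fits comfortably.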

JohannesGaessler · 2025-07-12T10:18:57Z

ggml/src/ggml-cuda/unary.cu

+    if (ggml_is_contiguous(src) && ggml_is_contiguous(dst_tensor)) {
+        unary_op_kernel<op><<<num_blocks, CUDA_NEG_BLOCK_SIZE, 0, stream>>>(x, dst, k);
+    } else {


Remove the contiguous path, it's no longer needed.

I kept it because the simple contiguous kernel is clearly faster. I thought you might prefer to still use the optimal path in this case. I know that in the grand scheme of things these unary operations are a very small part of the inference time, but I think it is a good idea not to degrade contiguous performance here.

ABS(type=f32,ne_a=[256,256,3,1],v=0): 532415 runs - 1.88 us/run - 1536 kB/run - 778.95 GB/s
ABS(type=f32,ne_a=[256,256,3,1],v=1): 311220 runs - 3.24 us/run - 3070 kB/run - 903.14 GB/s

Here is an example perf test using test-backend-ops on an H100 SXM5.
v=0 meaning contiguous and v=1 meaning non-contiguous.

Let me know whether you still want the cont path removed or you agree I should keep it for now.
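To make the trade-off discussed in this thread concrete, here is a host-side C++ sketch of the two paths: a flat loop when both tensors are contiguous, and a strided path that decomposes the flat index into four coordinates and applies per-dimension byte strides. It is only an illustration under stated assumptions, not the kernel from this PR. `View4D`, `is_contiguous_f32`, `byte_offset`, and `abs_f32` are hypothetical names; the `ne` (element counts) and `nb` (byte strides) fields are modeled on ggml's tensor layout, and the outer loop plays the role of the per-thread index in a CUDA kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical 4D view: ne[d] = number of elements, nb[d] = stride in BYTES.
struct View4D {
    int64_t ne[4];
    size_t  nb[4];
};

// Simplified contiguity check for f32 data: every dimension with more than
// one element must have exactly the stride of a densely packed layout.
inline bool is_contiguous_f32(const View4D & v) {
    size_t expected = sizeof(float);
    for (int d = 0; d < 4; ++d) {
        if (v.ne[d] != 1 && v.nb[d] != expected) {
            return false;
        }
        expected *= static_cast<size_t>(v.ne[d]);
    }
    return true;
}

// Map a flat element index to a byte offset using the view's strides.
inline size_t byte_offset(const View4D & v, int64_t i) {
    const int64_t i0 = i % v.ne[0];
    const int64_t i1 = (i / v.ne[0]) % v.ne[1];
    const int64_t i2 = (i / (v.ne[0] * v.ne[1])) % v.ne[2];
    const int64_t i3 = i / (v.ne[0] * v.ne[1] * v.ne[2]);
    return static_cast<size_t>(i0) * v.nb[0] + static_cast<size_t>(i1) * v.nb[1] +
           static_cast<size_t>(i2) * v.nb[2] + static_cast<size_t>(i3) * v.nb[3];
}

// Element-wise |x| with a contiguous fast path and a strided fallback.
inline void abs_f32(const void * src, const View4D & sv, void * dst, const View4D & dv) {
    const int64_t k = sv.ne[0] * sv.ne[1] * sv.ne[2] * sv.ne[3];
    assert(k == dv.ne[0] * dv.ne[1] * dv.ne[2] * dv.ne[3]);

    if (is_contiguous_f32(sv) && is_contiguous_f32(dv)) {
        // Fast path: no index decomposition, just a flat loop.
        const float * x = static_cast<const float *>(src);
        float       * y = static_cast<float *>(dst);
        for (int64_t i = 0; i < k; ++i) {
            y[i] = std::fabs(x[i]);
        }
        return;
    }

    // Strided path: extra divisions and modulos per element.
    const char * s = static_cast<const char *>(src);
    char       * d = static_cast<char *>(dst);
    for (int64_t i = 0; i < k; ++i) {
        const float x = *reinterpret_cast<const float *>(s + byte_offset(sv, i));
        *reinterpret_cast<float *>(d + byte_offset(dv, i)) = std::fabs(x);
    }
}
```

The strided path is correct for contiguous inputs as well, which is why the fast path can be dropped without changing results. Keeping it only avoids the per-element index arithmetic.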

YavorGIvanov · 2025-07-12T23:44:53Z

@JohannesGaessler @am17an Tried to address all comments.

github-actions bot added the documentation, build, Nvidia GPU, and ggml labels Jul 11, 2025

YavorGIvanov force-pushed the feature/cuda-non-cont-unary-support branch from c44bfde to 919ce38 on July 11, 2025 23:34

am17an reviewed Jul 12, 2025

am17an requested a review from JohannesGaessler July 12, 2025 10:08

JohannesGaessler reviewed Jul 12, 2025

github-actions bot added the testing Everything test related label Jul 12, 2025

YavorGIvanov force-pushed the feature/cuda-non-cont-unary-support branch from 1174a95 to 1752873 on July 12, 2025 23:43

Add CUDA non-contigious Unary ops implementation

64be8c5

YavorGIvanov force-pushed the feature/cuda-non-cont-unary-support branch from 1752873 to 64be8c5 on July 12, 2025 23:44

YavorGIvanov mentioned this pull request Jul 12, 2025

Add ELU CUDA support #14657

Merged
