Suffix implementation name to GPU kernel names #20
base: master
Conversation
@@ -114,7 +114,8 @@ def template_args(self):
             dry_runs=self.dry_runs,
             timers=self.timers,
             strides=self.strides,
-            index_type=self.index_type)
+            index_type=self.index_type,
+            implementation_name=self.template_file().partition('.')[0])
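For context on what the new argument evaluates to: self.template_file() returns the template's filename, and partition('.')[0] keeps everything before the first dot. A minimal sketch with a made-up filename (the actual template names in the repository may differ):

    # Hypothetical value; the real template filenames may differ.
    template_file = 'cuda_basic.cu.j2'   # assumed result of self.template_file()

    # partition('.') splits at the *first* dot, so element [0] is the stem only.
    implementation_name = template_file.partition('.')[0]
    print(implementation_name)           # -> 'cuda_basic'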
I guess the implementation_name is used for easier analysis with NCU? But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?
Wouldn't it be better to either just add a hash of all parameters (completely unreadable, but it enforces a unique name for unique parameters) or create a human-readable implementation_name that includes all relevant parameters?
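A rough sketch of the hash-based variant, just to illustrate the idea; the parameter names and the hashing scheme below are made up, not taken from the code:

    import hashlib

    def kernel_name_suffix(params):
        # `params` is assumed to be a dict of all relevant template arguments,
        # e.g. {'domain': (1024, 1024, 80), 'index_type': 'int64_t', ...}.
        # Sorting the items makes the suffix independent of dict ordering.
        canonical = repr(sorted(params.items()))
        # Short and unique per parameter set, but completely unreadable.
        return hashlib.sha1(canonical.encode()).hexdigest()[:8]

The human-readable variant would instead join selected parameters into the name, e.g. '<implementation>_<nx>x<ny>x<nz>_<index_type>' (format chosen here purely for illustration).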
But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?

It's true that it only partly helps. IIRC one can infer the domain size from the grid size reported by NCU, though.
When I was looking into this, distinguishing the kernel names was helpful enough, since I could then figure out which domain size each report belonged to. For different sizes one can also launch the benchmarks separately and create separate ncu-rep files that encode the domain sizes as well.
Tbh I don't think that a hash adds much value towards the desired purpose.
Running just a single kernel at a time is what I always did. But then we don't need any change, because either you know exactly what you have run, or you have a problem anyway because of the missing information, right?
So I am for either completely fixing the problem (if it is one) or completely ignoring it.
If we run each kernel with separate commands, then we can avoid those issues and simplify things. I was just using the sbench-<arch>-collection to run the benchmarks and collect the results in batches.
It would still be useful to have a way to inspect and distinguish the generated code for each implementation, by encoding the implementation name in either the function name or the filename saved to disk (see the sketch after this comment).
The --generate-line-info compilation option for nvcc can be used to improve source code investigation in NSight Compute (to be combined with --import-source yes in ncu).
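Regarding encoding the implementation name in the filename saved to disk, a possible sketch; write_generated_code and its arguments are hypothetical, not part of the current code:

    from pathlib import Path

    def write_generated_code(code: str, implementation_name: str, out_dir: str = 'generated') -> Path:
        # Encode the implementation name in the dumped filename so the
        # generated sources can be told apart on disk.
        path = Path(out_dir) / f'{implementation_name}.cu'
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(code)
        return path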