Suffix implementation name to GPU kernel names #20

Open · iomaganaris wants to merge 3 commits into master

Conversation

iomaganaris (Contributor) commented:

• Added the --generate-line-info compilation option for nvcc to improve source-code investigation in Nsight Compute (to be used with --import-source yes in ncu).
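
For reference, the two flags combine as follows; the binary and report names below are placeholders, not from this PR:

```sh
# Compile with embedded line info (the nvcc flag added by this PR).
nvcc --generate-line-info -o stencil_bench stencil_bench.cu

# Profile and import the source into the report for line-level analysis.
ncu --import-source yes -o report ./stencil_bench
```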

```diff
@@ -114,7 +114,8 @@ def template_args(self):
         dry_runs=self.dry_runs,
         timers=self.timers,
         strides=self.strides,
-        index_type=self.index_type)
+        index_type=self.index_type,
+        implementation_name=self.template_file().partition('.')[0])
```
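
For reference, str.partition splits at the first occurrence of the separator, so the suffix is everything before the first dot of the template file name (the file name below is made up for illustration):

```python
# Hypothetical template file name; real values come from template_file().
template_file = 'cuda_blocked.cu.j2'

# partition('.') -> ('cuda_blocked', '.', 'cu.j2'); index 0 is the stem.
implementation_name = template_file.partition('.')[0]
assert implementation_name == 'cuda_blocked'
```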
Contributor:

I guess the implementation_name is used for easier analysis with NCU? But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?
Wouldn’t it be better to either just add a hash of all parameters (completely unreadable, but it enforces a unique name for unique parameters) or construct a human-readable implementation_name that includes all relevant parameters?
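
For illustration, the hash-based option could derive a suffix from all template parameters; this is a sketch of the idea, not code from this PR, and the parameter names are invented:

```python
import hashlib

def unique_suffix(params: dict) -> str:
    """Derive a short, deterministic suffix from all template parameters."""
    # Sort items so the hash is independent of dict insertion order.
    blob = repr(sorted(params.items())).encode()
    return hashlib.sha1(blob).hexdigest()[:8]

# Distinct parameter sets yield distinct (if unreadable) suffixes:
a = unique_suffix({'domain': (1024, 1024, 80), 'index_type': 'int'})
b = unique_suffix({'domain': (512, 512, 80), 'index_type': 'int'})
assert a != b
```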

iomaganaris (Author):

> But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?

It's true that it only partly helps. IIRC one can infer the domain size from the grid size reported by NCU, though?
When I was looking into this, distinguishing the kernel names was helpful enough, since I could then figure out which size each report corresponded to. For different sizes one can also launch the benchmarks separately and create separate ncu-rep files that encode the domain sizes as well.
Tbh I don't think a hash adds much value towards the desired purpose.

Contributor:

Running just a single kernel at a time was what I always did. But then we don’t need any change here, because either you know exactly what you ran, or you have a problem because of missing information, right?
So I am for either completely fixing the problem (if it is one) or completely ignoring it.

iomaganaris (Author):

If we run each kernel with separate commands then we can avoid those issues and simplify things. I was just using the sbench-<arch>-collection to run the benchmarks and collect results in batches.
It would still be useful to have a way to inspect and distinguish the generated code for each implementation, by encoding the implementation name either in the function name or in the filename saved to disk.
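
A minimal sketch of the second option, assuming the generated source is available as a string; the helper name and output layout are invented for illustration:

```python
from pathlib import Path

def dump_generated_code(code: str, implementation_name: str,
                        out_dir: str = 'generated') -> Path:
    """Write generated source to a file whose name encodes the
    implementation, so each variant can be inspected and diffed later."""
    path = Path(out_dir) / f'{implementation_name}.cu'
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(code)
    return path
```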
