Suffix implementation name to GPU kernel names #20
base: master
Conversation
@@ -114,7 +114,8 @@ def template_args(self):
             dry_runs=self.dry_runs,
             timers=self.timers,
             strides=self.strides,
-            index_type=self.index_type)
+            index_type=self.index_type,
+            implementation_name=self.template_file().partition('.')[0])
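For context on what the new argument evaluates to: self.template_file() returns the template's filename, and partition('.')[0] keeps everything before the first dot. A minimal sketch with a made-up filename (the actual template names in the repository may differ):

    # Hypothetical value; the real template filenames may differ.
    template_file = 'cuda_basic.cu.j2'   # assumed result of self.template_file()

    # partition('.') splits at the *first* dot, so element [0] is the stem only.
    implementation_name = template_file.partition('.')[0]
    print(implementation_name)           # -> 'cuda_basic'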
I guess the implementation_name is used for easier analysis with NCU? But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?
Wouldn't it be better to either just add a hash of all parameters (completely unreadable, but it enforces a unique name for unique parameters) or create a human-readable implementation_name that includes all relevant parameters?
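A rough sketch of the hash-based variant, just to illustrate the idea; the parameter names and the hashing scheme below are made up, not taken from the code:

    import hashlib

    def kernel_name_suffix(params):
        # `params` is assumed to be a dict of all relevant template arguments,
        # e.g. {'domain': (1024, 1024, 80), 'index_type': 'int64_t', ...}.
        # Sorting the items makes the suffix independent of dict ordering.
        canonical = repr(sorted(params.items()))
        # Short and unique per parameter set, but completely unreadable.
        return hashlib.sha1(canonical.encode()).hexdigest()[:8]

The human-readable variant would instead join selected parameters into the name, e.g. '<implementation>_<nx>x<ny>x<nz>_<index_type>' (format chosen here purely for illustration).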
But if we run with, for example, two different domain sizes, template_file is the same, so this does not really help in general, or does it?

It's true that it only partly helps. IIRC one can infer the domain size from the grid size reported by NCU, though.
When I was looking into this, distinguishing the kernel names was helpful enough, since I could then figure out which domain size each report belonged to. For different sizes one can also launch the benchmarks separately and create separate ncu-rep files that encode the domain sizes as well.
Tbh I don't think that a hash adds much value towards the desired purpose.
Running just a single kernel at a time is what I always did. But then we don't need any change, because either you know exactly what you have run, or you have a problem anyway because of the missing information, right?
So I am for either completely fixing the problem (if it is one) or completely ignoring it.
If we run each kernel with separate commands, then we can avoid those issues and simplify things. I was just using the sbench-<arch>-collection to run the benchmarks and collect the results in batches.
It would still be useful to have a way to inspect and distinguish the generated code for each implementation, by encoding the implementation name in either the function name or the filename saved to disk (see the sketch after this comment).
The --generate-line-info compilation option for nvcc can be used to improve source code investigation in NSight Compute (to be combined with --import-source yes in ncu).
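Regarding encoding the implementation name in the filename saved to disk, a possible sketch; write_generated_code and its arguments are hypothetical, not part of the current code:

    from pathlib import Path

    def write_generated_code(code: str, implementation_name: str, out_dir: str = 'generated') -> Path:
        # Encode the implementation name in the dumped filename so the
        # generated sources can be told apart on disk.
        path = Path(out_dir) / f'{implementation_name}.cu'
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(code)
        return path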