Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark #479
Conversation
superbench/benchmarks/micro_benchmarks/cublas_function/cublas_benchmark.h
Codecov Report
| Coverage Diff | main | #479 | +/- |
|---|---|---|---|
| Coverage | 87.82% | 87.82% | |
| Files | 87 | 87 | |
| Lines | 5625 | 5627 | +2 |
| Hits | 4940 | 4942 | +2 |
| Misses | 685 | 685 | |
Do you need to update cudnn_function as well?
I think the random fix looks good. A few more thoughts here:
- The random initialization can be accelerated by adding `-O3` to the host compiler flags; in local testing this cut about 35% of the run time.
- It can be accelerated further using OpenMP.
- It could also be moved to the device entirely.
The issue with the for loops in the warm-up still remains.
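The OpenMP suggestion above could look something like the following sketch. The function name and buffer type are hypothetical, not taken from the benchmark code; it only illustrates parallelizing the host-side random fill. Note that `rand()` is not thread-safe, so a reentrant generator with a per-element seed is used instead.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical sketch: parallelize host-side random initialization with OpenMP.
// The pragma is a no-op if the compiler is not invoked with -fopenmp, so the
// function stays correct either way.
void fill_random(std::vector<float> &buf) {
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(buf.size()); ++i) {
        // Per-element seed keeps iterations independent across threads;
        // rand_r (POSIX) avoids the shared state inside rand().
        unsigned int seed = static_cast<unsigned int>(i);
        buf[i] = static_cast<float>(rand_r(&seed)) / RAND_MAX;  // value in [0, 1]
    }
}
```

Moving the fill to the device (e.g. with a generator library such as cuRAND) would remove the host loop entirely, at the cost of an extra dependency.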
Very good comments. I think we can verify and optimize this in the next PR.
Description
Revise cublas-benchmark to support flexible warm-up and to fill data with a fixed number for the performance test, improving running efficiency.
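The two revisions described above could be sketched as follows. The struct, field names, and fill value are assumptions for illustration, not the benchmark's actual flags; the point is that a constant fill is much cheaper than per-element `rand()` calls (GEMM timing does not depend on data contents), and that the warm-up count becomes configurable rather than hard-coded.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical config: names are illustrative, not the benchmark's real options.
struct BenchmarkConfig {
    int num_warmup = 8;        // configurable number of warm-up iterations
    float fill_value = 1.0f;   // constant used to initialize input matrices
};

void prepare_and_warmup(const BenchmarkConfig &cfg, std::vector<float> &a) {
    // Fixed-value initialization: one pass, no random-number generation.
    std::fill(a.begin(), a.end(), cfg.fill_value);
    for (int i = 0; i < cfg.num_warmup; ++i) {
        // Launch the cuBLAS call under test here (omitted in this sketch).
    }
}
```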
Major Revision