[CUTLASS] Add conv2d profiler #9737
cc @tkonolige
python/tvm/relay/op/strategy/cuda.py
```diff
-elif is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups):
+elif (
+    is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups)
+    and "cudnn" not in target.libs
```
Don't we still want to consider our built-in schedules if cudnn is enabled?
cuDNN requires a different kernel layout than AutoTVM when the input layout is NHWC, in which case the two implementations are not compatible.
But there is no problem with the NCHW layout, so I've refined the condition to:

```python
elif is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups) and (
    layout == "NCHW" or "cudnn" not in target.libs
):
```
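To make the refined condition easier to reason about, here is a standalone truth-table sketch of it. The function name `use_autotvm_depthwise_schedule` and its plain arguments are hypothetical; in `cuda.py` the inputs come from `is_depthwise_conv2d(...)`, `layout`, and `target.libs`.

```python
def use_autotvm_depthwise_schedule(is_depthwise: bool, layout: str, libs: list) -> bool:
    """Hypothetical helper mirroring the refined strategy condition.

    For NCHW, AutoTVM and cuDNN agree on the kernel layout, so the built-in
    depthwise schedule stays available even with -libs=cudnn. For any other
    layout (the problematic case discussed here is NHWC), the built-in
    schedule is only considered when cuDNN is not requested.
    """
    return is_depthwise and (layout == "NCHW" or "cudnn" not in libs)


# With -libs=cudnn, NCHW still gets the built-in schedule; NHWC does not.
print(use_autotvm_depthwise_schedule(True, "NCHW", ["cudnn"]))
print(use_autotvm_depthwise_schedule(True, "NHWC", ["cudnn"]))
print(use_autotvm_depthwise_schedule(True, "NHWC", []))
```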
```python
"""Instantiate a C++ source for profiling CUTLASS kernels."""


class Conv2dProfilerEmitter(object):
```
Do we have to write a new class like this for each op we want to support? If so, we might want to think about making something more reusable.
Yes, but there is not much to share between the conv2d and gemm profilers. For now, these two ops are the only ones we want to support.
I raised this topic before in the GEMM profiler PR, but I agreed with @masahi that there doesn't seem to be much to share, and CUTLASS basically only supports GEMM and Conv2D. Accordingly, it might be a bit overkill to have a common base class, at least for now.
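As background for the reusability question: an emitter in this design essentially fills a per-kernel C++ source template. The sketch below uses Python's standard `string.Template`; the class name `ToyProfilerEmitter`, the placeholder names, and the heavily trimmed template are hypothetical and are not the PR's actual `Conv2dProfilerEmitter`.

```python
from string import Template

# Hypothetical, heavily trimmed profiler template. The real template in the PR
# is adapted from a CUTLASS example and contains the full timing harness.
PROFILER_TEMPLATE = Template(
    """#include "cutlass/cutlass.h"
${OperatorDef}
int main(int argc, char *argv[]) {
  // parse the workload from argv, run ${OperatorName}, print the runtime
  return 0;
}
"""
)


class ToyProfilerEmitter:
    """Illustrative stand-in for an op-specific profiler emitter."""

    def emit(self, op_name: str, op_def: str) -> str:
        # Substitute the kernel definition and its name into the C++ source.
        return PROFILER_TEMPLATE.substitute(OperatorDef=op_def, OperatorName=op_name)


src = ToyProfilerEmitter().emit("my_conv2d_kernel", "using my_conv2d_kernel = ...;")
```

A GEMM emitter would differ mainly in the template body, which is consistent with the view above that a shared base class would add little.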
```cpp
cudaEventRecord(events[0]);

for (int iteration = 0; iteration < 100; ++iteration) {
  auto status = implicit_gemm_op();
  CUTLASS_CHECK(status);
}

cudaEventRecord(events[1]);
cudaEventSynchronize(events[1]);
float runtime_ms = 0;
cudaEventElapsedTime(&runtime_ms, events[0], events[1]);

for (auto event : events) {
  (void)cudaEventDestroy(event);
}
return double(runtime_ms) / 100.0;
```
What is the reasoning for putting the timing code inside this template instead of having the template contain just the kernel and using our built-in timers?
We don't invoke the full TVM compilation pipeline with BYOC at this point yet: the goal of these profiler templates is just to select the best implementation for a given workload, so we can't make use of the built-in profiler.
We could invoke TVM compilation for each candidate kernel, then run it and record the execution time using the built-in profiler. The advantage of the current approach is that we can compile profiler binaries once and cache them in a work directory, so the compilation cost is amortized over different workloads and networks. We could do a similar thing with the TVM compilation approach, but that would require compiling each module with dynamic shapes and storing `*.so` files instead of executables.
Anyway, this is the way GEMM profiler was already written by @Laurawly, so I inherited the same approach.
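The amortization argument above can be sketched as a small on-disk cache: each profiler executable is compiled once per kernel, and workloads are passed at run time as command-line arguments, so later workloads reuse the same binary. This is a hypothetical illustration, not the PR's code; the class name, the one-executable-per-kernel-name layout, and the `compile_cmd` callback (which would typically build an `nvcc` command line) are assumptions.

```python
import os
import subprocess


class ProfilerBinaryCache:
    """Hypothetical sketch: compile each profiler executable once, then reuse it.

    Executables are stored in work_dir under the kernel name, so profiling a
    new workload (passed to the executable as arguments) skips compilation.
    """

    def __init__(self, work_dir, compile_cmd):
        self.work_dir = work_dir
        # compile_cmd(src_path, exe_path) -> argv list, e.g. an nvcc invocation.
        self.compile_cmd = compile_cmd
        os.makedirs(work_dir, exist_ok=True)

    def get(self, kernel_name, source):
        exe_path = os.path.join(self.work_dir, kernel_name)
        if os.path.exists(exe_path):
            return exe_path  # cache hit: compilation cost already paid
        src_path = exe_path + ".cu"
        with open(src_path, "w") as f:
            f.write(source)
        # Cache miss: compile and fail loudly if the compiler errors out.
        subprocess.run(self.compile_cmd(src_path, exe_path), check=True)
        return exe_path
```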
You don't need to invoke the whole compiler pipeline to use the timing code. It will accept any PackedFunc; here the PackedFunc would just run the kernel.
If we are only ever going to support conv2d and gemm, then it probably doesn't matter.
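To illustrate the point that the timing logic is independent of what is being timed, here is a host-side sketch of a timer that takes any zero-argument callable and follows the template's pattern (timestamp, run a fixed number of iterations, timestamp, divide). It is an assumption-laden stand-in, not TVM's built-in timer API: it uses `time.perf_counter` instead of CUDA events and adds warm-up iterations that the template does not have.

```python
import time
from typing import Callable


def time_callable(fn: Callable[[], None], number: int = 100, warmup: int = 1) -> float:
    """Return the mean wall-clock time of fn() in milliseconds.

    For GPU work, fn must block until the device finishes (e.g. by
    synchronizing); otherwise only the launch overhead is measured.
    """
    for _ in range(warmup):
        fn()  # keep one-time costs such as lazy initialization out of the measurement
    start = time.perf_counter()
    for _ in range(number):
        fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / number
```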
```cpp
for (auto & event : events) {
  cudaEventCreate(&event);
}
cudaEventRecord(events[0]);
```
Using `cudaEventRecord` is good, but I notice the `gemm_profiler` does not do that. Maybe we should fix it?
Yeah, I noticed that too, and we should use CUDA events there as well. I didn't look deeply into `gemm_profiler` or compare the selected gemm kernels with `cutlass_profiler` like I did with the conv2d profiler in this PR. Any comment, @Laurawly?
Anyway, I want to defer addressing this problem to a future PR.
@masahi Maybe the reason the TVM script takes so long is that you are doing 100 iterations per benchmark, whereas the cutlass script is only doing 20? Also, the TVM script runs through the whole TVM compilation pipeline for each workload.
As I commented in #9737 (comment), we don't invoke the TVM pipeline when we select cutlass kernels. One major difference between the two scripts is that cutlass compiles all kernels into one giant profiler executable, while we generate a separate executable for each kernel. So cutlass can allocate and deallocate memory once and loop through all kernels for a given workload to select the best one. Also, I remember that there is a non-trivial initialization cost (close to 1 sec) for any CUDA app when it makes its first CUDA API call (for …). But this still doesn't explain the 10x difference, so I believe something else is going on. We could adopt the same approach as …
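The structural difference described above (one executable per workload that loops over all kernels, versus one executable per kernel) can be expressed as a back-of-the-envelope cost model. This is a hypothetical sketch: the function and its parameters are illustrative, and none of the values it would be fed come from this PR.

```python
def profiling_cost_s(n_workloads, n_kernels, startup_s, alloc_s, run_s, one_process_per_kernel):
    """Rough, hypothetical cost model (seconds) for the two profiling strategies.

    one_process_per_kernel=True: every (workload, kernel) pair launches its own
    executable, paying CUDA startup and buffer allocation each time.
    one_process_per_kernel=False: one executable per workload allocates once
    and loops over all candidate kernels.
    """
    if one_process_per_kernel:
        return n_workloads * n_kernels * (startup_s + alloc_s + run_s)
    return n_workloads * (startup_s + alloc_s + n_kernels * run_s)
```

Plugging in the real kernel count and measured per-process costs would show how much of the gap process startup can account for, which the discussion above leaves open.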
LGTM. Thanks.
I don't have a preference between the current approach and the TVM timer. The current implementation seems simple enough and keeps the CUTLASS implementation standalone, but reusing the TVM timer is also a fair concern.
```python
for op in ops:
    out = self.engine.evaluate(op, args.split(" "))
    op["runtime"] = out
    if out > 0 and not profile_all:
```
IIUC, you have now changed `evaluate` to return `float("inf")` when a kernel is invalid. Then the first invalid kernel will be selected, since `float("inf") > 0`, right?
Oops, you are right. Changed to `out < float("inf")`.
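The corrected check can be exercised in isolation. In this sketch, `select_kernel`, the `evaluate` callback, and the `{"name": ...}` dict layout are hypothetical stand-ins for `self.engine.evaluate` and the op records in the PR; falling back to the fastest kernel when `profile_all` is set, and raising when no kernel is valid, are assumptions added for illustration.

```python
import math


def select_kernel(ops, evaluate, profile_all=False):
    """Pick a kernel from candidate ops (dicts with a "name" key).

    evaluate(op) returns the runtime in ms, or float("inf") if the kernel is
    invalid for the workload. Unless profile_all is set, the first valid kernel
    is returned; otherwise every kernel is profiled and the fastest is returned.
    """
    for op in ops:
        out = evaluate(op)
        op["runtime"] = out
        # `out > 0` would also accept inf, i.e. an invalid kernel;
        # requiring a finite runtime avoids that.
        if out < float("inf") and not profile_all:
            return op
    best = min(ops, key=lambda op: op["runtime"])
    if math.isinf(best["runtime"]):
        raise RuntimeError("no valid kernel found for this workload")
    return best
```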
* Add cutlass conv2d profiler

  Squashed commits (all authored by Masahiro Masuda <masahi129@gmail.com>):
  - 1c0bbb2 (Sun Dec 12 18:29:03 2021 +0900): fix lint
  - 463574c (Sun Dec 12 17:28:38 2021 +0900): fixed conv2d check
  - 588c5ab (Sun Dec 12 15:05:27 2021 +0900): update test
  - a447b57 (Sun Dec 12 14:54:52 2021 +0900): speed up profiling by removing initialization
  - 93cd039 (Sun Dec 12 08:26:29 2021 +0900): fixed nhwc cudnn depthwise conv
  - 6db7172 (Sat Dec 11 15:39:05 2021 +0900): add cache
  - f7d17a1 (Sat Dec 11 15:05:38 2021 +0900): removed im2col profiling for conv2d
  - b724f44 (Fri Dec 10 22:57:54 2021 +0900): black
  - fe4687b (Fri Dec 10 22:49:13 2021 +0900): fixed cmd arguement
  - ab114f5 (Fri Dec 10 22:22:19 2021 +0900): conv2d profiler working
  - 49ee61f (Fri Dec 10 20:26:15 2021 +0900): add conv2d profiler
  - 49e2c89 (Sun Dec 12 08:03:36 2021 +0900): do not offload depthwise conv2d
  - cd83677 (Fri Dec 10 13:20:01 2021 +0900): lint fix
  - 870823c (Fri Dec 10 12:54:38 2021 +0900): add comment on IC == 3 case
  - 6b780db (Fri Dec 10 12:48:33 2021 +0900): check align on N dim
  - 308c4da (Fri Dec 10 12:34:42 2021 +0900): fixed check functions for fused cases, run infer type before mergecomposite
  - 8d6a1bf (Fri Dec 10 12:10:59 2021 +0900): test IC=3 convolution
  - ffce47d (Fri Dec 10 12:10:16 2021 +0900): use align1 kernel for unusual channel cases (IC = 3 etc)
  - 6cdf205 (Fri Dec 10 12:06:56 2021 +0900): add dtype and layout check in parttern match
  - 7743cc6 (Fri Dec 10 10:40:53 2021 +0900): add sm75 kernels to sm80 profilings
  - efceccb (Fri Dec 10 10:40:42 2021 +0900): skip legalize when batch size is dynamic
  - 65fbc0a (Fri Dec 10 10:36:36 2021 +0900): bug fix in im2col encoding
* minor fix
* lint fix
* allow autotvm NCHW depthwise conv2d schedule even if -libs=cudnn
* Update python/tvm/contrib/cutlass/gen_conv2d.py

  Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* simplify processing profiler outputs
* more simplify
* fix runtime check

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Adds a dedicated profiler for conv2d. The source template is adapted from the CUTLASS example at https://github.com/NVIDIA/cutlass/tree/master/examples/16_ampere_tensorop_conv2dfprop
The table below shows the selected kernels and their runtimes on resnet50 workloads, using both the TVM profiler I added and CUTLASS's `cutlass_profiler`. The scripts I used to get those numbers are available at https://github.com/masahi/tvm-cutlass-eval/tree/master/conv2d. The reported runtimes mostly agree between the two profilers.

One major issue I'm aware of is that our profiler is much slower than the cutlass one: while `cutlass_profiler` takes only 4 min to generate all the numbers below, the TVM profiler using this script took 40 min. I don't see an obvious cause of the slowdown in our profiler source at `contrib/cutlass/conv2d_profiler.py`, but I haven't investigated deeply yet.

cc @comaniac @Laurawly