[CUTLASS] Add conv2d profiler #9737

Merged on Dec 15, 2021 (8 commits)

masahi (Member) commented Dec 14, 2021

Adds a dedicated profiler for conv2d. The source template is modified from the cutlass example in https://github.com/NVIDIA/cutlass/tree/master/examples/16_ampere_tensorop_conv2dfprop

The table below shows the kernels selected for ResNet-50 workloads, and their runtimes, by the TVM profiler added in this PR and by CUTLASS's cutlass_profiler. The scripts I used to get those numbers are available at https://github.com/masahi/tvm-cutlass-eval/tree/master/conv2d.

The reported runtimes mostly agree between the two profilers. One major issue I'm aware of is that our profiler is much slower than the cutlass one: while cutlass_profiler takes only 4 min to generate all the numbers below, the TVM profiler using this script took 40 min!! I don't see an obvious cause of the slowdown in our profiler source at contrib/cutlass/conv2d_profiler.py, but I haven't investigated deeply yet.

cc @comaniac @Laurawly

| workload | TVM selected kernel name | CUTLASS selected kernel name | TVM selected kernel runtime | CUTLASS selected kernel runtime |
| --- | --- | --- | --- | --- |
| {'n': 256, 'h': 56, 'w': 56, 'c': 64, 'k': 256, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h1688fprop_optimized_64x64_32x2_nhwc_align8 | cutlass_tensorop_h1688fprop_optimized_64x64_32x2_nhwc_align8 | 1.27407 | 1.27324 |
| {'n': 256, 'h': 56, 'w': 56, 'c': 64, 'k': 64, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h1688fprop_optimized_64x128_32x2_nhwc_align2 | cutlass_tensorop_h16816fprop_optimized_128x64_32x6_nhwc_align8 | 0.5039 | 0.503762 |
| {'n': 256, 'h': 56, 'w': 56, 'c': 64, 'k': 64, 'r': 3, 's': 3, 'pad_h': 1, 'pad_w': 1, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h1688fprop_optimized_128x64_32x2_nhwc_align8 | cutlass_tensorop_h1688fprop_optimized_128x64_32x2_nhwc_align8 | 1.01041 | 0.994296 |
| {'n': 256, 'h': 56, 'w': 56, 'c': 256, 'k': 64, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h1688fprop_optimized_128x64_32x2_nhwc_align8 | cutlass_tensorop_h1688fprop_optimized_64x64_32x2_nhwc_align8 | 1.23902 | 1.23865 |
| {'n': 256, 'h': 56, 'w': 56, 'c': 256, 'k': 512, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.835523 | 0.828407 |
| {'n': 256, 'h': 56, 'w': 56, 'c': 256, 'k': 128, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_64x128_32x6_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_64x128_32x6_nhwc_align8 | 0.383695 | 0.38307 |
| {'n': 256, 'h': 28, 'w': 28, 'c': 128, 'k': 128, 'r': 3, 's': 3, 'pad_h': 1, 'pad_w': 1, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.740372 | 0.753655 |
| {'n': 256, 'h': 28, 'w': 28, 'c': 128, 'k': 512, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h1688fprop_optimized_128x128_32x2_nhwc_align8 | cutlass_tensorop_h1688fprop_optimized_128x128_32x2_nhwc_align8 | 0.645284 | 0.646561 |
| {'n': 256, 'h': 28, 'w': 28, 'c': 512, 'k': 128, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_64x128_32x6_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_64x128_32x6_nhwc_align8 | 0.626012 | 0.626975 |
| {'n': 256, 'h': 28, 'w': 28, 'c': 512, 'k': 1024, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.774748 | 0.766843 |
| {'n': 256, 'h': 28, 'w': 28, 'c': 512, 'k': 256, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.208896 | 0.202144 |
| {'n': 256, 'h': 14, 'w': 14, 'c': 256, 'k': 256, 'r': 3, 's': 3, 'pad_h': 1, 'pad_w': 1, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.737978 | 0.743775 |
| {'n': 256, 'h': 14, 'w': 14, 'c': 256, 'k': 1024, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.408544 | 0.410949 |
| {'n': 256, 'h': 14, 'w': 14, 'c': 1024, 'k': 256, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.362004 | 0.359183 |
| {'n': 256, 'h': 14, 'w': 14, 'c': 1024, 'k': 2048, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.707604 | 0.694497 |
| {'n': 256, 'h': 14, 'w': 14, 'c': 1024, 'k': 512, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 2, 'stride_w': 2} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.18526 | 0.174509 |
| {'n': 256, 'h': 7, 'w': 7, 'c': 512, 'k': 512, 'r': 3, 's': 3, 'pad_h': 1, 'pad_w': 1, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.729611 | 0.736473 |
| {'n': 256, 'h': 7, 'w': 7, 'c': 512, 'k': 2048, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.38751 | 0.382707 |
| {'n': 256, 'h': 7, 'w': 7, 'c': 2048, 'k': 512, 'r': 1, 's': 1, 'pad_h': 0, 'pad_w': 0, 'stride_h': 1, 'stride_w': 1} | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | cutlass_tensorop_h16816fprop_optimized_128x128_32x3_nhwc_align8 | 0.353987 | 0.333313 |

jroesch (Member) commented Dec 14, 2021

cc @tkonolige

    -    elif is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups):
    +    elif (
    +        is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups)
    +        and "cudnn" not in target.libs
Contributor

Don't we still want to consider our built-in schedules if cudnn is enabled?

Member Author

cuDNN requires a different kernel layout than AutoTVM when the input layout is NHWC, in which case the two implementations are not compatible.

But there is no problem with the NCHW layout, so I've refined the condition to

    elif is_depthwise_conv2d(data.shape, layout, kernel.shape, kernel_layout, groups) and (
        layout == "NCHW" or "cudnn" not in target.libs):
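Read as a standalone predicate, the refined condition gives the truth table printed below. This is only an illustrative sketch: `use_builtin_depthwise_schedule` is a hypothetical helper rather than a function in TVM, and the `libs` argument stands in for `target.libs`.

```python
def use_builtin_depthwise_schedule(is_depthwise, layout, libs):
    """Hypothetical helper mirroring the refined condition above."""
    # Built-in schedules are considered for NCHW regardless of cuDNN;
    # for other layouts (e.g. NHWC) only when cuDNN is not enabled.
    return is_depthwise and (layout == "NCHW" or "cudnn" not in libs)


# Print the truth table for a depthwise conv2d.
for layout in ("NCHW", "NHWC"):
    for libs in ([], ["cudnn"]):
        print(layout, libs, use_builtin_depthwise_schedule(True, layout, libs))
```

The only combination that skips the built-in schedules is NHWC with cuDNN enabled, which is the incompatible case described above.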

    """Instantiate a C++ source for profiling CUTLASS kernels."""


    class Conv2dProfilerEmitter(object):
Contributor

Do we have to write a new class like this for each op we want to support? If so, we might want to think about making something more reusable.

Member Author

Yes, but there is not much to share between the conv2d and gemm profilers. For now, these two ops are the only ones we want to support.

Contributor

I raised this topic before in the GEMM profiler PR, but I agreed with @masahi that there doesn't seem to be much to share, and CUTLASS basically only supports GEMM and Conv2D. Accordingly, it might be a bit overkill to have a common base class, at least for now.

Comment on lines +134 to +149
    cudaEventRecord(events[0]);

    for (int iteration = 0; iteration < 100; ++iteration) {
      auto status = implicit_gemm_op();
      CUTLASS_CHECK(status);
    }

    cudaEventRecord(events[1]);
    cudaEventSynchronize(events[1]);
    float runtime_ms = 0;
    cudaEventElapsedTime(&runtime_ms, events[0], events[1]);

    for (auto event : events) {
      (void)cudaEventDestroy(event);
    }
    return double(runtime_ms) / 100.0;
Contributor

What is the reasoning for putting the timing code inside this template, instead of having the template contain just the kernel and using our built-in timers?

Member Author

We don't invoke the full TVM compilation pipeline with BYOC yet at this point: the goal of these profiler templates is just to select the best implementation given a workload. So we can't make use of the built-in profiler.

We could invoke the TVM compilation for each candidate kernel, then run it and record the execution time using the built-in profiler. The advantage of the current approach is that we can compile profiler binaries once and cache them in a work directory, so the compilation cost is amortized over different workloads / networks. We could do a similar thing with the TVM compilation approach, but that requires compiling each module with dynamic shapes and storing *.so files instead of executables.

Anyway, this is the way GEMM profiler was already written by @Laurawly, so I inherited the same approach.
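The compile-once-and-cache idea described above can be sketched roughly as follows. This is not the code in this PR: `get_or_build`, `fake_build`, and the name `kernel_a` are hypothetical, and the "compile" step is simulated by creating an empty file.

```python
import os
import tempfile


def get_or_build(work_dir, kernel_name, build_fn):
    """Return the cached binary path, building it only on first use."""
    path = os.path.join(work_dir, kernel_name)
    if not os.path.exists(path):
        build_fn(path)  # stand-in for the expensive compile step
    return path


build_calls = []


def fake_build(path):
    # Placeholder for compilation: records the call and creates an empty file.
    build_calls.append(path)
    with open(path, "w"):
        pass


with tempfile.TemporaryDirectory() as work_dir:
    # Three "workloads" request the same kernel; only the first one builds it.
    for _workload in range(3):
        get_or_build(work_dir, "kernel_a", fake_build)
    n_builds = len(build_calls)

print(n_builds)
```

Because the cache is keyed only by kernel name, the build cost is paid once however many workloads are profiled, which is the amortization mentioned above.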

Contributor

You don't need to invoke the whole compiler pipeline to use the timing code. It will accept any PackedFunc. Here the PackedFunc would just run the kernel.

If we are only ever going to support conv2d and gemm, then it probably doesn't matter.
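The general shape of a timer that accepts any callable looks like the sketch below. It is a plain-Python illustration, not TVM's timing utility: `mean_runtime_ms` is a hypothetical name, and it measures host wall-clock time, so timing a real GPU kernel this way would additionally need device synchronization.

```python
import time


def mean_runtime_ms(func, iterations=100):
    """Average wall-clock time of func() over `iterations` calls, in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1e3 / iterations


# Any zero-argument callable can be timed; here it just appends to a list.
calls = []
runtime = mean_runtime_ms(lambda: calls.append(None), iterations=100)
print(len(calls))
```

As in the template under review, the total elapsed time is divided by the iteration count to get a per-call average.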

    for (auto & event : events) {
      cudaEventCreate(&event);
    }
    cudaEventRecord(events[0]);
Contributor

Using cudaEventRecord is good, but I notice the gemm_profiler does not do that. Maybe we should fix it?

Member Author

Yeah, I noticed that too and we should use CUDA events there as well. I didn't look deeply into gemm_profiler or compare the selected gemm kernels with cutlass_profiler like I did with the conv2d profiler in this PR. Any comment @Laurawly?

Member Author

Anyway, I want to defer addressing this problem to a future PR.

tkonolige (Contributor) commented

@masahi Maybe the reason why the TVM script takes so long is that you are doing 100 iterations per benchmark whereas the cutlass script is only doing 20? Also the TVM script is running through the whole tvm compilation pipeline for each workload.

masahi (Member Author) commented Dec 14, 2021

> @masahi Maybe the reason why the TVM script takes so long is that you are doing 100 iterations per benchmark whereas the cutlass script is only doing 20?

I believe cutlass_profiler is also doing 100 iterations by default: https://github.com/NVIDIA/cutlass/blob/808c25337a3ed4c97ac21895257b1addc72d6ca8/tools/profiler/src/options.cu#L386

> Also the TVM script is running through the whole tvm compilation pipeline for each workload.

As I commented in #9737 (comment), we don't invoke the tvm pipeline when we select cutlass kernels.

One major difference between the two scripts is that cutlass compiles all kernels into one giant profiler executable, while we generate a separate executable for each kernel. So cutlass can allocate / deallocate memory once and loop through each kernel for a given workload to select the best one. Also, I remember that there is a non-trivial initialization cost (close to 1 sec) for any CUDA app when it makes its first CUDA API call: for cutlass_profiler this happens only once, while we pay that cost for each profiler binary (there are about 60 of them).

But this still doesn't explain a 10x difference, so I believe there is something else going on. We could adopt the same approach as cutlass_profiler and compile all candidate kernels into one executable. I didn't do that for conv2d_profiler because I just followed how gemm_profiler is implemented, but that could be a possible improvement.
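A rough back-of-envelope estimate of the per-process startup cost, using only the numbers quoted in this thread, is sketched below. It assumes (this is not confirmed anywhere in the PR) that every workload launches every profiler binary once and that each launch pays the full ~1 s initialization.

```python
num_workloads = 19   # rows in the table above
num_binaries = 60    # "about 60" profiler binaries
init_cost_s = 1.0    # "close to 1 sec" per process; taken as exactly 1 s here

# Assumption: each workload launches each binary once.
startup_overhead_min = num_workloads * num_binaries * init_cost_s / 60
observed_gap_min = 40 - 4  # TVM profiler vs. cutlass_profiler, in minutes

print(startup_overhead_min, observed_gap_min)  # 19.0 36
```

Under these assumptions, startup cost would be a sizable share of the gap but would still leave a good part of it unaccounted for, which is in line with the suspicion that something else is going on.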

comaniac (Contributor) left a comment

LGTM. Thanks.
I don't have a preference between the current approach and the TVM timer. It seems to me that the current implementation is simple enough and keeps the CUTLASS implementation standalone, but reusing the TVM timer is also a fair concern.

    for op in ops:
        out = self.engine.evaluate(op, args.split(" "))
        op["runtime"] = out
        if out > 0 and not profile_all:
Contributor

IIUC, now you changed evaluate to return float("inf") when invalid. Then the first invalid kernel will be selected since float("inf") > 0, right?

Member Author

oops you are right, changed to out < float("inf")
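The difference between the two checks can be seen with a small stand-alone sketch. It assumes, as described in the comment above, that an invalid kernel reports `float("inf")`; `select_first_valid` and the runtime values are hypothetical and only mimic the early-exit case where `profile_all` is false.

```python
def select_first_valid(runtimes):
    """Return the index of the first kernel with a finite runtime, or None."""
    for i, out in enumerate(runtimes):
        if out < float("inf"):
            return i
    return None


# Invalid kernels report float("inf"); the first finite entry should win.
runtimes = [float("inf"), 0.8, 0.5]

# Old check: float("inf") > 0 is True, so the invalid kernel at index 0 is picked.
buggy = next(i for i, out in enumerate(runtimes) if out > 0)
# New check: only finite runtimes pass, so index 1 is picked.
fixed = select_first_valid(runtimes)

print(buggy, fixed)  # 0 1
```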

masahi and others added 6 commits December 15, 2021 06:50
commit 1c0bbb2
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 18:29:03 2021 +0900

    fix lint

commit 463574c
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 17:28:38 2021 +0900

    fixed conv2d check

commit 588c5ab
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 15:05:27 2021 +0900

    update test

commit a447b57
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 14:54:52 2021 +0900

    speed up profiling by removing initialization

commit 93cd039
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 08:26:29 2021 +0900

    fixed nhwc cudnn depthwise conv

commit 6db7172
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Dec 11 15:39:05 2021 +0900

    add cache

commit f7d17a1
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Dec 11 15:05:38 2021 +0900

    removed im2col profiling for conv2d

commit b724f44
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 22:57:54 2021 +0900

    black

commit fe4687b
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 22:49:13 2021 +0900

    fixed cmd arguement

commit ab114f5
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 22:22:19 2021 +0900

    conv2d profiler working

commit 49ee61f
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 20:26:15 2021 +0900

    add conv2d profiler

commit 49e2c89
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sun Dec 12 08:03:36 2021 +0900

    do not offload depthwise conv2d

commit cd83677
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 13:20:01 2021 +0900

    lint fix

commit 870823c
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:54:38 2021 +0900

    add comment on IC == 3 case

commit 6b780db
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:48:33 2021 +0900

    check align on N dim

commit 308c4da
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:34:42 2021 +0900

    fixed check functions for fused cases, run infer type before mergecomposite

commit 8d6a1bf
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:10:59 2021 +0900

    test IC=3 convolution

commit ffce47d
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:10:16 2021 +0900

    use align1 kernel for unusual channel cases (IC = 3 etc)

commit 6cdf205
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 12:06:56 2021 +0900

    add dtype and layout check in parttern match

commit 7743cc6
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 10:40:53 2021 +0900

    add sm75 kernels to sm80 profilings

commit efceccb
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 10:40:42 2021 +0900

    skip legalize when batch size is dynamic

commit 65fbc0a
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Fri Dec 10 10:36:36 2021 +0900

    bug fix in im2col encoding
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
@masahi masahi merged commit dd42ef2 into apache:main Dec 15, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Add cutlass conv2d profiler

* minor fix

* lint fix

* allow autotvm NCHW depthwise conv2d schedule even if -libs=cudnn

* Update python/tvm/contrib/cutlass/gen_conv2d.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* simplify processing profiler outputs

* more simplify

* fix runtime check

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022