[TOPI] Example for convolution in GPU #212

Merged: 13 commits merged into apache:master on Jul 19, 2017

Conversation

@icemelon (Member) commented Jul 3, 2017

No description provided.

@icemelon changed the title from "[TOPI] Example for convolution" to "[TOPI] Example for convolution in GPU" on Jul 4, 2017
@tqchen (Member) commented Jul 4, 2017

A few follow-up comments per our offline discussion:

  • We need to separate the declaration from the schedule.
  • Explicitly name the layout in the dataflow (since it is not NCHW).
  • Change the name to gpu_conv.
  • Make it easy for others to test out common workloads, possibly by including a list of workload configurations in the script (a sketch of such a list follows below).

#215
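To make the last point concrete, a workload table in the recipe script could look like the hedged sketch below. The shapes are common ResNet-style layers picked for illustration and are not taken from this PR; `WORKLOADS` and `run_all` are hypothetical names.

```python
# Each tuple: (batch, in_channel, in_size, num_filter, kernel, stride, padding).
WORKLOADS = [
    (64, 64, 56, 64, 3, 1, 1),
    (64, 64, 56, 128, 1, 2, 0),
    (64, 128, 28, 256, 3, 2, 1),
    (64, 256, 14, 512, 3, 2, 1),
]

def run_all(run_conv):
    """Run the conv2d recipe over every workload configuration."""
    for batch, in_c, in_size, num_filter, kernel, stride, padding in WORKLOADS:
        run_conv(batch, in_c, in_size, num_filter, kernel, stride, padding)
```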

@tqchen (Member) commented Jul 14, 2017

Remove the nvcc compile-related registrations from the test code, as they require a higher CUDA version than the test machine supports.
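For context, such a registration in a TVM script usually looks like the sketch below. This is illustrative only, not the exact lines removed here: `tvm.register_func` and `tvm.contrib.nvcc.compile_cuda` are existing TVM APIs, while the callback body is an assumption.

```python
import tvm
from tvm.contrib import nvcc

@tvm.register_func("tvm_callback_cuda_compile")
def compile_with_nvcc(code):
    """Compile generated CUDA source with nvcc (to PTX) instead of the default NVRTC path."""
    return nvcc.compile_cuda(code)
```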

import topi
from topi.nn.util import get_const_tuple

TASK = "conv2d_hwcn_map"
Review comment: remove these lines, up to `conv2d_hwcn_python`.

return code


def conv2d_hwcn_python(a_np, w_np, stride, padding):
Review comment: add a `testing` namespace in TOPI and move this function there.

Follow-up comment: this way we don't have duplicated functions in the recipe and the tests.
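For reference, a minimal NumPy sketch of a host-side `conv2d_hwcn` checker like the one discussed above. This is illustrative only; it assumes integer `stride` and `padding`, whereas the actual TOPI testing helper also handles string padding modes.

```python
import numpy as np

def conv2d_hwcn_reference(a_np, w_np, stride, padding):
    """Direct conv2d in HWCN layout: (H, W, C, N) input and (KH, KW, C, F) filter."""
    in_h, in_w, in_c, batch = a_np.shape
    k_h, k_w, _, num_filter = w_np.shape
    pad = int(padding)
    out_h = (in_h + 2 * pad - k_h) // stride + 1
    out_w = (in_w + 2 * pad - k_w) // stride + 1
    # Zero-pad the spatial dimensions only.
    a_pad = np.zeros((in_h + 2 * pad, in_w + 2 * pad, in_c, batch), dtype=a_np.dtype)
    a_pad[pad:pad + in_h, pad:pad + in_w, :, :] = a_np
    b_np = np.zeros((out_h, out_w, num_filter, batch), dtype=a_np.dtype)
    for y in range(out_h):
        for x in range(out_w):
            patch = a_pad[y * stride:y * stride + k_h, x * stride:x * stride + k_w, :, :]
            for f in range(num_filter):
                # Reduce over kernel height, kernel width, and input channel.
                b_np[y, x, f, :] = np.tensordot(
                    patch, w_np[:, :, :, f], axes=([0, 1, 2], [0, 1, 2]))
    return b_np
```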

@tqchen tqchen merged commit eaea99c into apache:master Jul 19, 2017
vinx13 pushed a commit to vinx13/tvm that referenced this pull request Mar 9, 2022
areusch pushed a commit to areusch/tvm that referenced this pull request Sep 20, 2022
…ster to take an NDArray instead of POD. (apache#216)

Fix the bug in apache#212. The cause of this bug is that VM Codegen did not handle binding a ConstantNode to a variable (`x = relax.const([1, 2])`) and saving the constant NDArray to the register. Previously the codegen only handled the case where a ConstantNode appears as a CallNode's argument. Now it is fixed and a unit test is added.

Fix the bug in tlc-pack/relax#214 (comment). The issue was caused by the VM simply reading the condition register of the If instruction and expecting it to be a POD int or bool. tlc-pack/relax@811e877 adds a `LoadScalarInt` function, similar to the Relay VM's, to check that the If.cond register stores an NDArray and cast it to int64. Since we haven't introduced PrimValue and PrimType (which represent POD values like int and bool) to the Relax language yet, let's enforce `If->cond` to be a Tensor (NDArray at runtime).
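As a loose illustration of the `LoadScalarInt` check described above, here is a hedged Python/NumPy stand-in; the real function lives in the VM's C++ runtime and operates on NDArray registers, so the names below are purely illustrative.

```python
import numpy as np

def load_scalar_int(cond_register):
    """Verify the If.cond register holds a scalar tensor and cast it to int64."""
    arr = np.asarray(cond_register)
    if arr.size != 1:
        raise ValueError("If.cond must be a scalar (single-element) tensor")
    return np.int64(arr.reshape(()))

# The If instruction can then branch on a plain integer:
taken = load_scalar_int(np.array(True)) != 0
```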
junrushao pushed a commit to junrushao/tvm that referenced this pull request Oct 18, 2022
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request Nov 20, 2022
junrushao pushed a commit to junrushao/tvm that referenced this pull request Feb 8, 2023
yelite pushed a commit to yelite/tvm that referenced this pull request Feb 17, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 9, 2023
This PR makes the cutlass codegen use the correct bias stride when the bias has more than two dimensions. For example, if the input bias has shape (1, n, 4096), the original code sets `ldc` to 0, which produces an incorrect result.

cc @vinx13 @masahi
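For intuition, here is a hedged NumPy sketch (not the PR's CUTLASS code; names and shapes are illustrative) of why the leading dimension matters: a row-broadcast bias can legitimately use a stride of 0, but a bias with a full (1, n, 4096) shape has a distinct row every 4096 elements, so a stride of 0 silently rereads row 0.

```python
import numpy as np

n, k = 8, 4096
bias_full = np.arange(n * k, dtype="float32").reshape(1, n, k)

def read_with_ldc(buf, ldc, rows, cols):
    """Simulate strided row access with leading dimension ldc."""
    flat = buf.reshape(-1)
    return np.stack([flat[r * ldc : r * ldc + cols] for r in range(rows)])

correct = read_with_ldc(bias_full, ldc=k, rows=n, cols=k)   # row stride = 4096
wrong = read_with_ldc(bias_full, ldc=0, rows=n, cols=k)     # every row rereads row 0
assert np.array_equal(correct, bias_full[0])
assert not np.array_equal(wrong, bias_full[0])
```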
guoyaol pushed a commit to guoyaol/tvm that referenced this pull request May 16, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 17, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request May 21, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 25, 2023
zxybazh pushed a commit to zxybazh/tvm that referenced this pull request May 25, 2023
Lunderberg pushed a commit to Lunderberg/tvm that referenced this pull request Jun 12, 2023
LeiWang1999 added a commit to LeiWang1999/tvm that referenced this pull request Nov 8, 2024
… Matmul Operator (apache#212)

* Refactor tilelang dequantize module and add matmul_blocked_weight_only function

* remove un-implemented code.

* Implement BaseScheduler to wrap some related items.

* lint fix

* test skip

* Refactor tilelang dequantize module and add matmul_blocked_weight_only function

* test fix

* hardware tuning demo

* remove debug related items.

* implement tuner and cache fix

* lint fix

* test case fix.

* Adapt Tuning Space generation with Roller

* lint fix

* Refactor select_scheduler function for fine-grained interface

The select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface. This change provides more flexibility and enables the implementation of high-performance kernels.

Update MatmulScheduler class in matmul_tensorcore.py

The MatmulScheduler class in the matmul_tensorcore.py module has been updated to calculate the number of threads from the block size and warp size, which ensures an optimal GPU warp configuration for NVIDIA GPUs (a sketch of this calculation follows after this commit list).

Improve test_general_matmul_tilelang_kernel.py

The test_general_matmul_tilelang_kernel.py module has been improved to include additional test cases and assertions for correctness.

* Refactor select_scheduler function for fine-grained interface

* Refactor NotImplementedError message in BaseTLHint class

* Update submodule reference in 3rdparty/tvm

* Refactor matmul_finetune function to use topk=20 for hardware-aware finetuning

* Refactor submodule reference in 3rdparty/tvm

* lint fix

* Refactor test_general_matmul_tilelang_impl.py and test_tilelang_gemm.py

* Refactor MatmulConfig to enable weight propagation on supported devices

* Refactor test_general_matmul_tilelang_impl.py and test_general_matmul_tilelang_kernel.py to use centered random values for input tensors

* test fix

* test fix

* Refactor flash attention tests to use centered random values for input tensors

* Refactor flash attention tests to use centered random values for input tensors

* Refactor flash attention tests to skip test if flash_attn is not installed

* lint fix

* test fix

* test fix

* test fix
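As a rough illustration of the warp-based thread calculation mentioned in the MatmulScheduler note above, here is a hedged Python sketch; the function name and tile parameters are illustrative and are not the BitBLAS/tilelang API.

```python
WARP_SIZE = 32  # fixed warp size on NVIDIA GPUs

def threads_per_block(block_m, block_n, warp_m, warp_n, warp_size=WARP_SIZE):
    """Threads per block when each warp owns a warp_m x warp_n tile of the block tile."""
    warps = (block_m // warp_m) * (block_n // warp_n)
    return warps * warp_size

# Example: a 128x128 block tile split into 64x64 warp tiles -> 4 warps -> 128 threads.
assert threads_per_block(128, 128, 64, 64) == 128
```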
Labels: none yet
Projects: none yet
2 participants