[TOPI] Example for convolution in GPU #212

Merged: 13 commits merged into apache:master on Jul 19, 2017

Conversation

@icemelon (Member) commented Jul 3, 2017

No description provided.

@icemelon changed the title from "[TOPI] Example for convolution" to "[TOPI] Example for convolution in GPU" on Jul 4, 2017
@tqchen (Member) commented Jul 4, 2017

A few follow-up comments per our offline discussion:

  • We need to separate the declaration from the schedule.
  • Explicitly name the layout in the dataflow (since it is not NCHW).
  • Change the name to gpu_conv.
  • Make it easy for others to test out common workloads, possibly by including a list of workload configurations in the script (a sketch of such a list follows below).

#215
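To make the last point concrete, a workload table in the recipe script could look like the hedged sketch below. The shapes are common ResNet-style layers picked for illustration and are not taken from this PR; `WORKLOADS` and `run_all` are hypothetical names.

```python
# Each tuple: (batch, in_channel, in_size, num_filter, kernel, stride, padding).
WORKLOADS = [
    (64, 64, 56, 64, 3, 1, 1),
    (64, 64, 56, 128, 1, 2, 0),
    (64, 128, 28, 256, 3, 2, 1),
    (64, 256, 14, 512, 3, 2, 1),
]

def run_all(run_conv):
    """Run the conv2d recipe over every workload configuration."""
    for batch, in_c, in_size, num_filter, kernel, stride, padding in WORKLOADS:
        run_conv(batch, in_c, in_size, num_filter, kernel, stride, padding)
```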

@tqchen (Member) commented Jul 14, 2017

Remove the nvcc compile-related registrations from the test code, as they require a higher CUDA version than the test machine supports.
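For context, such a registration in a TVM script usually looks like the sketch below. This is illustrative only, not the exact lines removed here: `tvm.register_func` and `tvm.contrib.nvcc.compile_cuda` are existing TVM APIs, while the callback body is an assumption.

```python
import tvm
from tvm.contrib import nvcc

@tvm.register_func("tvm_callback_cuda_compile")
def compile_with_nvcc(code):
    """Compile generated CUDA source with nvcc (to PTX) instead of the default NVRTC path."""
    return nvcc.compile_cuda(code)
```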

import topi
from topi.nn.util import get_const_tuple

TASK = "conv2d_hwcn_map"
Review comment: remove these lines, up to `conv2d_hwcn_python`.

return code


def conv2d_hwcn_python(a_np, w_np, stride, padding):
Review comment: add a `testing` namespace in TOPI and move this function there.

Follow-up comment: this way we don't have duplicated functions in the recipe and the tests.
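For reference, a minimal NumPy sketch of a host-side `conv2d_hwcn` checker like the one discussed above. This is illustrative only; it assumes integer `stride` and `padding`, whereas the actual TOPI testing helper also handles string padding modes.

```python
import numpy as np

def conv2d_hwcn_reference(a_np, w_np, stride, padding):
    """Direct conv2d in HWCN layout: (H, W, C, N) input and (KH, KW, C, F) filter."""
    in_h, in_w, in_c, batch = a_np.shape
    k_h, k_w, _, num_filter = w_np.shape
    pad = int(padding)
    out_h = (in_h + 2 * pad - k_h) // stride + 1
    out_w = (in_w + 2 * pad - k_w) // stride + 1
    # Zero-pad the spatial dimensions only.
    a_pad = np.zeros((in_h + 2 * pad, in_w + 2 * pad, in_c, batch), dtype=a_np.dtype)
    a_pad[pad:pad + in_h, pad:pad + in_w, :, :] = a_np
    b_np = np.zeros((out_h, out_w, num_filter, batch), dtype=a_np.dtype)
    for y in range(out_h):
        for x in range(out_w):
            patch = a_pad[y * stride:y * stride + k_h, x * stride:x * stride + k_w, :, :]
            for f in range(num_filter):
                # Reduce over kernel height, kernel width, and input channel.
                b_np[y, x, f, :] = np.tensordot(
                    patch, w_np[:, :, :, f], axes=([0, 1, 2], [0, 1, 2]))
    return b_np
```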

@tqchen tqchen merged commit eaea99c into apache:master Jul 19, 2017
vinx13 pushed a commit to vinx13/tvm that referenced this pull request Mar 9, 2022
areusch pushed a commit to areusch/tvm that referenced this pull request Sep 20, 2022
…ster to take an NDArray instead of POD. (apache#216)

Fix the bug in apache#212. The cause of this bug is that VM Codegen did not handle binding a ConstantNode to a variable (`x = relax.const([1, 2])`) and saving the constant NDArray to the register. Previously the codegen only handled the case where a ConstantNode appears as a CallNode's argument. Now it is fixed and a unit test is added.

Fix the bug in tlc-pack/relax#214 (comment). The issue was caused by the VM simply reading the condition register of the If instruction and expecting it to be a POD int or bool. tlc-pack/relax@811e877 adds a `LoadScalarInt` function, similar to the Relay VM's, to check that the If.cond register stores an NDArray and cast it to int64. Since we haven't introduced PrimValue and PrimType (which represent POD values like int and bool) to the Relax language yet, let's enforce `If->cond` to be a Tensor (NDArray at runtime).
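As a loose illustration of the `LoadScalarInt` check described above, here is a hedged Python/NumPy stand-in; the real function lives in the VM's C++ runtime and operates on NDArray registers, so the names below are purely illustrative.

```python
import numpy as np

def load_scalar_int(cond_register):
    """Verify the If.cond register holds a scalar tensor and cast it to int64."""
    arr = np.asarray(cond_register)
    if arr.size != 1:
        raise ValueError("If.cond must be a scalar (single-element) tensor")
    return np.int64(arr.reshape(()))

# The If instruction can then branch on a plain integer:
taken = load_scalar_int(np.array(True)) != 0
```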
junrushao pushed a commit to junrushao/tvm that referenced this pull request Oct 18, 2022
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request Nov 20, 2022
junrushao pushed a commit to junrushao/tvm that referenced this pull request Feb 8, 2023
yelite pushed a commit to yelite/tvm that referenced this pull request Feb 17, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 9, 2023
This PR makes the cutlass codegen use the correct bias stride when the bias has more than two dimensions. For example, if the input bias has shape (1, n, 4096), the original code sets `ldc` to 0, which produces an incorrect result.

cc @vinx13 @masahi
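For intuition, here is a hedged NumPy sketch (not the PR's CUTLASS code; names and shapes are illustrative) of why the leading dimension matters: a row-broadcast bias can legitimately use a stride of 0, but a bias with a full (1, n, 4096) shape has a distinct row every 4096 elements, so a stride of 0 silently rereads row 0.

```python
import numpy as np

n, k = 8, 4096
bias_full = np.arange(n * k, dtype="float32").reshape(1, n, k)

def read_with_ldc(buf, ldc, rows, cols):
    """Simulate strided row access with leading dimension ldc."""
    flat = buf.reshape(-1)
    return np.stack([flat[r * ldc : r * ldc + cols] for r in range(rows)])

correct = read_with_ldc(bias_full, ldc=k, rows=n, cols=k)   # row stride = 4096
wrong = read_with_ldc(bias_full, ldc=0, rows=n, cols=k)     # every row rereads row 0
assert np.array_equal(correct, bias_full[0])
assert not np.array_equal(wrong, bias_full[0])
```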
guoyaol pushed a commit to guoyaol/tvm that referenced this pull request May 16, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 17, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request May 21, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 25, 2023
zxybazh pushed a commit to zxybazh/tvm that referenced this pull request May 25, 2023
Lunderberg pushed a commit to Lunderberg/tvm that referenced this pull request Jun 12, 2023
LeiWang1999 added a commit to LeiWang1999/tvm that referenced this pull request Nov 8, 2024
… Matmul Operator (apache#212)

* Refactor tilelang dequantize module and add matmul_blocked_weight_only function

* remove un-implemented code.

* Implement BaseScheduler to wrap some related items.

* lint fix

* test skip

* Refactor tilelang dequantize module and add matmul_blocked_weight_only function

* test fix

* hardware tuning demo

* remove debug related items.

* implement tuner and cache fix

* lint fix

* test case fix.

* Adapt Tuning Space generation with Roller

* lint fix

* Refactor select_scheduler function for fine-grained interface

The select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface. This change provides more flexibility and enables the implementation of high-performance kernels.

Update MatmulScheduler class in matmul_tensorcore.py

The MatmulScheduler class in the matmul_tensorcore.py module has been updated to calculate the number of threads from the block size and warp size, which ensures an optimal GPU warp configuration for NVIDIA GPUs (a sketch of this calculation follows after this commit list).

Improve test_general_matmul_tilelang_kernel.py

The test_general_matmul_tilelang_kernel.py module has been improved to include additional test cases and assertions for correctness.

* Refactor select_scheduler function for fine-grained interface

* Refactor NotImplementedError message in BaseTLHint class

* Update submodule reference in 3rdparty/tvm

* Refactor matmul_finetune function to use topk=20 for hardware-aware finetuning

* Refactor submodule reference in 3rdparty/tvm

* lint fix

* Refactor test_general_matmul_tilelang_impl.py and test_tilelang_gemm.py

* Refactor MatmulConfig to enable weight propagation on supported devices

* Refactor test_general_matmul_tilelang_impl.py and test_general_matmul_tilelang_kernel.py to use centered random values for input tensors

* test fix

* test fix

* Refactor flash attention tests to use centered random values for input tensors

* Refactor flash attention tests to use centered random values for input tensors

* Refactor flash attention tests to skip test if flash_attn is not installed

* lint fix

* test fix

* test fix

* test fix
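As a rough illustration of the warp-based thread calculation mentioned in the MatmulScheduler note above, here is a hedged Python sketch; the function name and tile parameters are illustrative and are not the BitBLAS/tilelang API.

```python
WARP_SIZE = 32  # fixed warp size on NVIDIA GPUs

def threads_per_block(block_m, block_n, warp_m, warp_n, warp_size=WARP_SIZE):
    """Threads per block when each warp owns a warp_m x warp_n tile of the block tile."""
    warps = (block_m // warp_m) * (block_n // warp_n)
    return warps * warp_size

# Example: a 128x128 block tile split into 64x64 warp tiles -> 4 warps -> 128 threads.
assert threads_per_block(128, 128, 64, 64) == 128
```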
Labels: none yet
Projects: none yet
2 participants