[TOPI] Example for convolution in GPU #212
Conversation
A few follow-up comments per the offline discussion
Remove the nvcc compile-related registrations in the test code, since they require a higher CUDA version than the test machine supports
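For reference, such a registration hooks TVM's CUDA compilation callback so generated kernels go through nvcc instead of the default NVRTC path. A minimal sketch using tvm.contrib.nvcc; the arch value here is illustrative, not the one from the PR:

```python
import tvm
from tvm.contrib import nvcc

@tvm.register_func("tvm_callback_cuda_compile")
def compile_cuda_with_nvcc(code):
    # Route generated CUDA source through nvcc to produce PTX.
    # "sm_52" is only an illustrative arch; it must match the local
    # toolkit/driver, which is why the test machine could not use it.
    return nvcc.compile_cuda(code, target="ptx", arch="sm_52")
```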
import topi
from topi.nn.util import get_const_tuple

TASK = "conv2d_hwcn_map"
remove lines until conv2d_hwcn_python
return code


def conv2d_hwcn_python(a_np, w_np, stride, padding):
add a `testing` namespace in TOPI, and move this function there
so we don't have duplicated functions in recipe and tests
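For context, conv2d_hwcn_python is a plain NumPy reference used to check the GPU result. A minimal sketch of what such a function computes, assuming 'SAME'/'VALID' string padding and scipy for the per-channel 2-D correlation (not necessarily the exact code under review):

```python
import numpy as np
import scipy.signal

def conv2d_hwcn_python(a_np, w_np, stride, padding):
    # a_np: (in_height, in_width, in_channel, batch) -- HWCN layout
    # w_np: (kernel_h, kernel_w, in_channel, num_filter)
    in_height, in_width, in_channel, batch = a_np.shape
    kernel_h, kernel_w, _, num_filter = w_np.shape
    if padding == 'VALID':
        pad_h = pad_w = 0
    else:  # 'SAME'
        pad_h, pad_w = kernel_h - 1, kernel_w - 1
    pad_top, pad_left = pad_h // 2, pad_w // 2
    out_height = (in_height - kernel_h + pad_h) // stride + 1
    out_width = (in_width - kernel_w + pad_w) // stride + 1
    # Work in NCHW internally so each (n, f, c) pair is a 2-D correlation.
    at = a_np.transpose((3, 2, 0, 1))
    wt = w_np.transpose((3, 2, 0, 1))
    b_np = np.zeros((batch, num_filter, out_height, out_width), dtype=a_np.dtype)
    for n in range(batch):
        for f in range(num_filter):
            for c in range(in_channel):
                apad = np.pad(at[n, c], ((pad_top, pad_h - pad_top),
                                         (pad_left, pad_w - pad_left)))
                # Rotating the kernel 180 degrees turns convolve2d
                # into the cross-correlation used by conv layers.
                out = scipy.signal.convolve2d(
                    apad, np.rot90(wt[f, c], 2), mode='valid')
                b_np[n, f] += out[::stride, ::stride]
    return b_np.transpose((2, 3, 1, 0))  # back to HWCN
```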
…ster to take an NDArray instead of POD. (apache#216) Fixes the bug in apache#212. The cause of this bug is that the VM codegen did not handle binding a ConstantNode to a variable (`x = relax.const([1, 2])`) and saving the constant NDArray to the register; previously the codegen only handled the case where a ConstantNode appears as a CallNode's argument. Now it is fixed and a unit test is added. Also fixes the bug in tlc-pack/relax#214 (comment): the issue was caused by the VM simply reading the condition register of the If instruction and expecting it to be a POD int or bool. tlc-pack/relax@811e877 adds a `LoadScalarInt` function, similar to the Relay VM's, to check that the If.cond register stores an NDArray and cast it to int64. Since we haven't introduced PrimValue and PrimType (which represent POD values like int and bool) to the Relax language yet, let's enforce `If->cond` to be a Tensor (an NDArray at runtime).
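For context, the previously failing pattern is a constant bound directly to a variable inside a Relax function. A minimal sketch of that shape; the exact TVMScript syntax for that tlc-pack/relax snapshot is an assumption:

```python
import tvm
from tvm.script import relax as R

@tvm.script.ir_module
class ConstBinding:
    @R.function
    def main() -> R.Tensor((2,), "int32"):
        # A ConstantNode bound straight to a variable, not passed as a
        # CallNode argument: codegen must save the NDArray to a register.
        x: R.Tensor((2,), "int32") = R.const([1, 2], "int32")
        return x
```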
… Matmul Operator (apache#212)
* Refactor tilelang dequantize module and add matmul_blocked_weight_only function
* Remove unimplemented code.
* Implement BaseScheduler to wrap some related items.
* Lint fix
* Test skip
* Refactor tilelang dequantize module and add matmul_blocked_weight_only function
* Test fix
* Hardware tuning demo
* Remove debug related items.
* Implement tuner and cache fix
* Lint fix
* Test case fix.
* Adapt tuning space generation with Roller
* Lint fix
* Refactor select_scheduler function for a fine-grained interface: the select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface, which provides more flexibility and enables the implementation of high-performance kernels. Update the MatmulScheduler class in matmul_tensorcore.py to calculate the number of threads based on the block size and warp size, ensuring an optimal GPU warp configuration for NVIDIA GPUs (see the sketch after this list). Improve test_general_matmul_tilelang_kernel.py with additional test cases and assertions for correctness.
* Refactor select_scheduler function for fine-grained interface
* Refactor NotImplementedError message in BaseTLHint class
* Update submodule reference in 3rdparty/tvm
* Refactor matmul_finetune function to use topk=20 for hardware-aware finetuning
* Refactor submodule reference in 3rdparty/tvm
* Lint fix
* Refactor test_general_matmul_tilelang_impl.py and test_tilelang_gemm.py
* Refactor MatmulConfig to enable weight propagation on supported devices
* Refactor test_general_matmul_tilelang_impl.py and test_general_matmul_tilelang_kernel.py to use centered random values for input tensors
* Test fix
* Test fix
* Refactor flash attention tests to use centered random values for input tensors
* Refactor flash attention tests to use centered random values for input tensors
* Refactor flash attention tests to skip the test if flash_attn is not installed
* Lint fix
* Test fix
* Test fix
* Test fix
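The thread-count rule mentioned in the list above is a standard warps-per-block calculation. A hypothetical sketch; the names here are illustrative, not BitBLAS's actual fields:

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def num_threads(block_m, block_n, warp_m, warp_n):
    # One warp covers a warp_m x warp_n tile of the block_m x block_n
    # output block, so threads = warps-per-block * warp size.
    warps_per_block = (block_m // warp_m) * (block_n // warp_n)
    return warps_per_block * WARP_SIZE

# e.g. a 128x128 block tiled into 64x32 warp tiles -> 8 warps -> 256 threads
assert num_threads(128, 128, 64, 32) == 256
```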