[VTA][OpenCL] intelfocl #6126
Conversation
Force-pushed from f6d0f72 to 37692bf
wkl = traverse(op.input_tensors)
if wkl:
    return wkl
do you mind explaining the changes made to this file?
The original code will fail if there are multiple workloads in one schedule. For example, in fused_nn_conv2d_add_add_right_shift_clip_cast_31, the conv2d and the add may both have workload attrs. We have to get the correct workload by comparing the task_name.
Previously this worked fine, as add is not a tunable op. But since we also want to put middle ALU-only nodes (residual blocks) on VTA, such as fused_cast_cast_add_nn_relu_clip_cast_3, we create a VTA schedule for add (see add.alu).
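To illustrate the idea, here is a minimal plain-Python sketch (FakeOp and find_workload are hypothetical names used only for this sketch, not the actual TVM/VTA traversal code): instead of returning the first workload attribute found while walking the compute DAG, keep only the workload whose task name matches the current task.

# Illustrative sketch only; simplified, hypothetical names.
class FakeOp:
    def __init__(self, attrs=None, inputs=()):
        self.attrs = attrs or {}          # may contain a "workload" entry
        self.input_tensors = list(inputs)

def find_workload(op, task_name, visited=None):
    # Return the workload whose task name matches task_name,
    # rather than the first workload attribute encountered.
    visited = set() if visited is None else visited
    if id(op) in visited:
        return None
    visited.add(id(op))
    wkl = op.attrs.get("workload")
    # A fused schedule (e.g. conv2d + add) may carry several workload
    # attrs; only accept the one belonging to the current task.
    if wkl is not None and wkl[0] == task_name:
        return wkl
    for child in op.input_tensors:
        found = find_workload(child, task_name, visited)
        if found is not None:
            return found
    return None

# Both conv2d and add carry a workload attr after fusion;
# matching on the task name picks the right one.
add_op = FakeOp(attrs={"workload": ("add.alu", "...")})
conv_op = FakeOp(attrs={"workload": ("conv2d_packed.vta", "...")}, inputs=[add_op])
assert find_workload(conv_op, "add.alu")[0] == "add.alu"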
Thanks for clarifying. How do we guard against extracting add as a standalone op for other backends in this case?
@@ -88,16 +88,6 @@ class BuiltinLower : public StmtExprMutator {
  op = stmt.as<AllocateNode>();
  // Get constant allocation bound.
  int64_t nbytes = GetVectorBytes(op->dtype);
do you mind explaining the reasoning behind this deletion?
This removes the special handling for kDLCPU. Otherwise, it may cause an LLVM parameter-mismatch error:
Traceback (most recent call last):
File "vta/tutorials/frontend/deploy_classification.py", line 210, in <module>
params=params, target_host=env.target_host)
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 251, in build
graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 120, in build
self._build(mod, target, target_host)
File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 256, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 245, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(TVMFuncCall+0x4c) [0x7f385ac9bc1c]
[bt] (7) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x316) [0x7f385ab2a566]
[bt] (6) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0xe31) [0x7f385ab29c11]
[bt] (5) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)+0x3c4) [0x7f385a4322d4]
[bt] (4) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)+0x326) [0x7f385a4318c6]
[bt] (3) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target const&)+0x67a) [0x7f385a74f68a]
[bt] (2) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1277ea1) [0x7f385ac7eea1]
[bt] (1) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::LLVMModuleNode::Init(tvm::IRModule const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x1388) [0x7f385ac82c68]
[bt] (0) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1276a57) [0x7f385ac7da57]
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/src/target/llvm/llvm_module.cc", line 230
TVMError: LLVM module verification failed with the following errors:
Call parameter type does not match function signature!
%.sub = getelementptr inbounds [4 x <8 x float>], [4 x <8 x float>]* %3, i64 0, i64 0
i8* %34 = call i8* @VTABufferCPUPtr(i8* %17, <8 x float>* nonnull %.sub)
Call parameter type does not match function signature!
%.sub = getelementptr inbounds [8 x float], [8 x float]* %3, i64 0, i64 0
i8* %31 = call i8* @VTABufferCPUPtr(i8* %14, float* nonnull %.sub)
The raised error comes from this LLVM code (lib/IR/Verifier.cpp):
// Verify that all arguments to the call match the function type.
for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i)
  Assert(CS.getArgument(i)->getType() == FTy->getParamType(i),
         "Call parameter type does not match function signature!",
         CS.getArgument(i), FTy->getParamType(i), I);
The error is raised if the special handling for kDLCPU is kept. I think it is because the signature for the AllocateNode is not consistent with the parameter? Any ideas about an alternative fix?
@tqchen perhaps you'd have some input on why this code was needed in the first place?
@tmoreau89 @tqchen @zhanghaohit
I've been searching for a bug introduced by this PR that somehow doesn't show up in CI. I've tested it locally with the docker image and still see the failure.
Anyway, if I run python/tests/onnx/test_forward.py:test_loop on main locally, it fails. If I revert the change to this file, it passes.
I'm tempted to revert this PR until we can find a better way to fix this for VTA; do you guys have a better suggestion here?
@mbrookhart I am in favor of reverting the changes applied to this file; in a separate PR we can ensure that the error encountered by @zhanghaohit is resolved while making sure that python/tests/onnx/test_forward.py:test_loop passes.
I don't think we need to revert the entire PR; rather, we can introduce a PR that just reverts the changes applied to this file. Given that the Intelfocl backend is not CI-tested, it's not going to break unit tests in TVM.
It seems that the error is due to _concatenate_shape_func, where an Any is checked against a normal shape. The changes in this PR may introduce an Any shape, thus triggering this bug. One quick fix is to remove the assert out[i] == inputs[j][i], "Dims mismatch in the inputs of concatenate.", as there may be Any shapes in the inputs and that should be allowed.
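For illustration, here is a plain-Python sketch of the check under discussion (this is not TVM's actual hybrid-script _concatenate_shape_func; modelling an unknown/Any dimension as None is an assumption made only for this sketch). It shows both the strict check and the kind of relaxation the quick fix suggests; whether relaxing the check is the right approach is debated below.

# Sketch only: unknown (Any) dims are modelled here as None.
def concatenate_shape(input_shapes, axis):
    ndim = len(input_shapes[0])
    out = [0] * ndim
    for i in range(ndim):
        if i == axis:
            # Concatenated axis: sum the extents.
            out[i] = sum(shape[i] for shape in input_shapes)
        else:
            out[i] = input_shapes[0][i]
            for j in range(1, len(input_shapes)):
                if out[i] is None or input_shapes[j][i] is None:
                    # Suggested relaxation: skip the strict check
                    # when a dimension is unknown.
                    continue
                assert out[i] == input_shapes[j][i], \
                    "Dims mismatch in the inputs of concatenate."
    return out

print(concatenate_shape([(2, 3), (4, 3)], axis=0))     # [6, 3]
print(concatenate_shape([(2, None), (4, 3)], axis=0))  # [6, None]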
@jwfromm took a look at the IR generated with and without this code snippet; this is what he got:
it seems like it's trying to figure out how to allocate for the shape function, and without that simplifying if statement it gets into a weird recursive let construction
correct IR with fix:
allocate(v_copy_shape_func: Pointer(int64), int64, [1]) {
attr [0] "extern_scope" = 0 {
v_expand_dim_shape_func: Pointer(int64)[0] = 1i64
v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
}
attr [0] "extern_scope" = 0 {
v_concatenate_shape_func: Pointer(int64)[0] = 0i64
v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
v_concatenate_shape_func[1] = (int64*)placeholder[1]
assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
0
}
}
naughty IR:
attr [v_expand_dim_shape_func: Pointer(int64)] "storage_alignment" = 128 {
let v_expand_dim_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id: int32, 16u64, 0, 64, dtype=handle)
{
if @tir.isnullptr(v_expand_dim_shape_func, dtype=bool) {
@tir.tvm_throw_last_error(, dtype=int32)
}
attr [v_copy_shape_func: Pointer(int64)] "storage_scope" = "global";
attr [v_copy_shape_func] "storage_alignment" = 128 {
let v_copy_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id, 8u64, 0, 64, dtype=handle)
{
if @tir.isnullptr(v_copy_shape_func, dtype=bool) {
@tir.tvm_throw_last_error(, dtype=int32)
}
{
attr [0] "extern_scope" = 0 {
v_expand_dim_shape_func[0] = 1i64
v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
}
attr [0] "extern_scope" = 0 {
v_concatenate_shape_func: Pointer(int64)[0] = 0i64
v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
v_concatenate_shape_func[1] = (int64*)placeholder[1]
assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
0
}
}
}
if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_copy_shape_func, dtype=int32) != 0) {
@tir.tvm_throw_last_error(, dtype=int32)
}
}
}
if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_expand_dim_shape_func, dtype=int32) != 0) {
@tir.tvm_throw_last_error(, dtype=int32)
}
}
I have a feeling this could cause some non-trivial performance regressions.
It looks like this is fundamentally changing how shape functions get lowered; I don't think that just removing the assert is the right way to go about it.
> It seems that the error is due to _concatenate_shape_func, where an Any is checked against a normal shape. The changes in this PR may introduce an Any shape, thus triggering this bug. One quick fix is to remove the assert out[i] == inputs[j][i], "Dims mismatch in the inputs of concatenate.", as there may be Any shapes in the inputs and that should be allowed.

:/ It should not be possible to pass Any to a shape function; everything should be concrete in the VM before the shape_func is called. I'm not sure I buy this.
Thanks @zhanghaohit. This is converging nicely. I made some additional comments. In addition, I'd like to request further partitioning given the large size of the PR:
(1) the following files will need to migrate to #6125:
(2) changes made for quantization should be isolated to an additional PR; this includes:
Please address the aforementioned changes, thank you.
I've also noticed that the issue with the CI is that the VTA submodule is dependent on changes that haven't landed. We'll first merge the changes you have made in incubator-tvm-vta, then this one here.
Done. Thanks @tmoreau89 for the comments.
Changes made for quantization are moved to #6191.
Thanks, overall the PR is in good shape. I think that we'll need to merge the incubator-tvm-vta PR first, before we merge this one in (due to the bump on the VTA submodule).
@liangfu would you mind doing a pass on this PR as well?
Thanks for the update, and really sorry for the late reply.
I took another look, and the changes look good.
apache/tvm-vta#9 is finally merged! @zhanghaohit can you please update the submodule, and let's make sure that the unit tests pass now that the ISA has changed. Note: we'll also need to modify the bitstreams for the FPGAs that VTA supports. @liangfu are you able to update the bitstreams for DE10 in the next week or so?
Force-pushed from 0c90401 to 62d41ab
@zhanghaohit can you update this PR to update the VTA submodule to the latest? Currently other PRs that update the VTA submodule will break unless we merge this PR. Thanks!
Sure. Apologies for the delay. I think some unit tests failed for this PR. I'll try to fix them.
@tmoreau89 CI seems not to be working? It has been waiting for 9 hours but has not started yet.
@tqchen it seems CI is hung? Could you take a look at it? Thanks.
Force-pushed from 957aa3c to bbb5aed
Thank you @zhanghaohit @liangfu @vegaluisjose, this PR has been merged!
* intelfocl support
* disable tsim test
* bugfix to vta autotvm
* disable tsim test in task_python_vta_tsim.sh
* fix integration test
* update vta submodule and re-enable tsim tests
* remove unnecessary comments
…pache#8274)
* revert a change to lower_tvm_builtin.cc from apache#6126
* disable sim target on VTA tutorial fix bad refactor try again
This is related to #5840 and split from PR #5842.
This PR only keeps the modifications to TVM for OpenCL support.