
[VTA][OpenCL] intelfocl #6126

Merged (7 commits, Apr 20, 2021)

Conversation

zhanghaohit (Contributor):

This is related to #5840 and split from PR #5842.

This PR only keeps the modifications to TVM for OpenCL support.

@zhanghaohit changed the title from "Feature/intelfocl pr" to "[VTA][OpenCL] intelfocl" Jul 23, 2020
wkl = traverse(op.input_tensors)
if wkl:
    return wkl

Contributor:

Do you mind explaining the changes made to this file?

zhanghaohit (Contributor, Author):

The original code fails if there are multiple workloads in one schedule. For example, in fused_nn_conv2d_add_add_right_shift_clip_cast_31, both conv2d and add may carry workload attrs, so we have to pick the correct workload by comparing the task_name.

Previously this worked fine, since add was not a tunable op. But we now also want to offload middle ALU-only nodes (residual blocks), such as fused_cast_cast_add_nn_relu_clip_cast_3, to VTA, so we create a VTA schedule for add (see add.alu). A sketch of the matching logic is below.
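
A minimal sketch of the idea, in plain Python rather than the PR's exact diff. It assumes `op` exposes `attrs` and `input_tensors` like a TE compute op, and that the first element of a workload tuple is the task name; both are assumptions for illustration:

```python
def find_workload(op, task_name):
    """Depth-first search over producer ops for the workload whose task
    name matches task_name, rather than the first workload encountered."""
    attrs = getattr(op, "attrs", None) or {}
    wkl = attrs.get("workload")
    # Assumed convention: the first element of an AutoTVM workload tuple
    # is the task name, e.g. ("conv2d_packed.vta", ...).
    if wkl is not None and wkl[0] == task_name:
        return wkl
    for tensor in op.input_tensors:
        found = find_workload(tensor.op, task_name)
        if found is not None:
            return found
    return None
```

With this, a fused schedule where both conv2d and add carry workload attrs returns the workload matching the task being tuned instead of whichever op is visited first.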

Contributor:

Thanks for clarifying. How do we guard against extracting add as a standalone op for other backends in this case?

@@ -88,16 +88,6 @@ class BuiltinLower : public StmtExprMutator {
op = stmt.as<AllocateNode>();
// Get constant allocation bound.
int64_t nbytes = GetVectorBytes(op->dtype);
Contributor:

Do you mind explaining the reasoning behind this deletion?

zhanghaohit (Contributor, Author):

This removes the special handling for kDLCPU. Otherwise, it causes an LLVM parameter-mismatch error:

Traceback (most recent call last):
  File "vta/tutorials/frontend/deploy_classification.py", line 210, in <module>
    params=params, target_host=env.target_host)
  File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 251, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 120, in build
    self._build(mod, target, target_host)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 256, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 245, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(TVMFuncCall+0x4c) [0x7f385ac9bc1c]
  [bt] (7) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x316) [0x7f385ab2a566]
  [bt] (6) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0xe31) [0x7f385ab29c11]
  [bt] (5) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)+0x3c4) [0x7f385a4322d4]
  [bt] (4) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)+0x326) [0x7f385a4318c6]
  [bt] (3) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target const&)+0x67a) [0x7f385a74f68a]
  [bt] (2) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1277ea1) [0x7f385ac7eea1]
  [bt] (1) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::LLVMModuleNode::Init(tvm::IRModule const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x1388) [0x7f385ac82c68]
  [bt] (0) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1276a57) [0x7f385ac7da57]
  File "/4pd/home/zhanghao/workspace/tvm-2/tvm/src/target/llvm/llvm_module.cc", line 230
TVMError: LLVM module verification failed with the following errors: 
Call parameter type does not match function signature!
  %.sub = getelementptr inbounds [4 x <8 x float>], [4 x <8 x float>]* %3, i64 0, i64 0
 i8*  %34 = call i8* @VTABufferCPUPtr(i8* %17, <8 x float>* nonnull %.sub)
Call parameter type does not match function signature!
  %.sub = getelementptr inbounds [8 x float], [8 x float]* %3, i64 0, i64 0
 i8*  %31 = call i8* @VTABufferCPUPtr(i8* %14, float* nonnull %.sub)

The error is raised by this LLVM code (lib/IR/Verifier.cpp):

// Verify that all arguments to the call match the function type.
for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i)
  Assert(CS.getArgument(i)->getType() == FTy->getParamType(i),
         "Call parameter type does not match function signature!",
         CS.getArgument(i), FTy->getParamType(i), I);

This error is raised whenever the special handling for kDLCPU is present. I think it is because the signature for the AllocateNode is not consistent with the parameter. Any ideas about an alternative fix?

Contributor:

@tqchen perhaps you'd have some input on why this code was needed in the first place?

Contributor:

@tmoreau89 @tqchen @zhanghaohit

I've been searching for a bug introduced by this PR that somehow doesn't show up in CI. I've tested locally with the Docker image and still see the failure.

Anyway, if I run python/tests/onnx/test_forward.py:test_loop on main locally, it fails. If I revert the change to this file, it passes.

I'm tempted to revert this PR until we can find a better way to fix this for VTA. Do you have a better suggestion here?

Contributor:

@mbrookhart I am in favor of reverting the changes applied to this file; in a separate PR we can ensure that the error encountered by @zhanghaohit is resolved while making sure that python/tests/onnx/test_forward.py:test_loop passes.

Contributor:

I don't think we need to revert the entire PR; rather, we can introduce a PR that just reverts the changes applied to this file. Given that the intelfocl backend is not CI-tested, that won't break unit tests in TVM.

zhanghaohit (Contributor, Author):

It seems that the error is due to _concatenate_shape_func, where an Any is checked against a concrete shape.

The changes in this PR may introduce an Any shape, thus triggering this bug.

One quick fix is to remove the assert out[i] == inputs[j][i], "Dims mismatch in the inputs of concatenate.", since there may be an Any shape in the inputs, and that should be allowed. A sketch of the relaxed check is below.
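
A minimal sketch of the proposed relaxation, written in plain Python rather than TVM's hybrid script, and assuming an unknown (Any) dimension is represented by a sentinel value — both assumptions are mine, not the PR's code:

```python
ANY = -1  # assumed sentinel for an unknown (Any) dimension

def concat_out_shape(shapes, axis):
    """Output shape of concatenate over `axis`, with the dims-match
    assert relaxed to skip dimensions that are not statically known."""
    out = list(shapes[0])
    # The concat axis is only known statically if every input's is.
    if any(s[axis] == ANY for s in shapes):
        out[axis] = ANY
    else:
        out[axis] = sum(s[axis] for s in shapes)
    for s in shapes[1:]:
        for i, dim in enumerate(s):
            if i == axis:
                continue
            if out[i] == ANY:
                out[i] = dim  # adopt the other dim (it may also be ANY)
            elif dim != ANY:
                # Original behavior asserted unconditionally on every dim.
                assert out[i] == dim, "Dims mismatch in the inputs of concatenate."
    return out
```

For example, `concat_out_shape([[2, ANY], [2, 3]], axis=0)` returns `[4, 3]`, whereas the unconditional assert would reject the `ANY` dimension.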

Contributor:
@jwfromm took a look at the IR generated with and without this code snippet; this is what he got:

it seems like it's trying to figure out how to allocate for the shape function, and without that simplifying if statement it gets into a weird recursive let construction

correct IR with fix:
allocate(v_copy_shape_func: Pointer(int64), int64, [1]) {
  attr [0] "extern_scope" = 0 {
    v_expand_dim_shape_func: Pointer(int64)[0] = 1i64
    v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
  }
  attr [0] "extern_scope" = 0 {
    v_concatenate_shape_func: Pointer(int64)[0] = 0i64
    v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
    v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
    v_concatenate_shape_func[1] = (int64*)placeholder[1]
    assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
    0
  }
}


naughty IR:
attr [v_expand_dim_shape_func: Pointer(int64)] "storage_alignment" = 128 {
  let v_expand_dim_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id: int32, 16u64, 0, 64, dtype=handle)
   {
    if @tir.isnullptr(v_expand_dim_shape_func, dtype=bool) {
      @tir.tvm_throw_last_error(, dtype=int32)
    }
    attr [v_copy_shape_func: Pointer(int64)] "storage_scope" = "global";
    attr [v_copy_shape_func] "storage_alignment" = 128 {
      let v_copy_shape_func = @tir.TVMBackendAllocWorkspace(1, dev_id, 8u64, 0, 64, dtype=handle)
       {
        if @tir.isnullptr(v_copy_shape_func, dtype=bool) {
          @tir.tvm_throw_last_error(, dtype=int32)
        }
         {
          attr [0] "extern_scope" = 0 {
            v_expand_dim_shape_func[0] = 1i64
            v_expand_dim_shape_func[1] = (int64*)v_copy_shape_func[0]
          }
          attr [0] "extern_scope" = 0 {
            v_concatenate_shape_func: Pointer(int64)[0] = 0i64
            v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)placeholder: Pointer(int64)[0])
            v_concatenate_shape_func[0] = ((int64*)v_concatenate_shape_func[0] + (int64*)v_expand_dim_shape_func[0])
            v_concatenate_shape_func[1] = (int64*)placeholder[1]
            assert(((int64*)v_concatenate_shape_func[1] == (int64*)v_expand_dim_shape_func[1]), "Dims mismatch in the inputs of concatenate.")
            0
          }
        }
      }
      if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_copy_shape_func, dtype=int32) != 0) {
        @tir.tvm_throw_last_error(, dtype=int32)
      }
    }
  }
  if (@tir.TVMBackendFreeWorkspace(1, dev_id, v_expand_dim_shape_func, dtype=int32) != 0) {
    @tir.tvm_throw_last_error(, dtype=int32)
  }
}
I have a feeling this could cause some non-trivial performance regressions.

Contributor:

It looks like this fundamentally changes how shape functions get lowered; I don't think just removing the assert is the right way to go about it.

Contributor:

> It seems that the error is due to _concatenate_shape_func, where an Any is checked against a concrete shape.
>
> The changes in this PR may introduce an Any shape, thus triggering this bug.
>
> One quick fix is to remove the assert out[i] == inputs[j][i], "Dims mismatch in the inputs of concatenate.", since there may be an Any shape in the inputs, and that should be allowed.

:/ It should not be possible to pass Any to a shape function; everything should be concrete in the VM before the shape_func is called. I'm not sure I buy this.

@tmoreau89 (Contributor):

Thanks @zhanghaohit. This is converging nicely. I made some additional comments.

In addition, I'd like to request further partitioning given the large size of the PR.

(1) the following files will need to migrate to #6125:

  • src/relay/op/annotation/annotation.cc
  • python/tvm/relay/op/_tensor.py

(2) changes made for quantization should be isolated to an additional PR, this includes:

  • src/relay/quantize/realize.cc
  • python/tvm/relay/quantize/_partition.py
  • python/tvm/relay/quantize/_annotate.py

@tmoreau89 (Contributor) left a review:

Please address the aforementioned changes, thank you.

@tmoreau89 (Contributor):

I've also noticed that the issue with the CI is that the VTA submodule depends on changes that haven't landed. We'll first merge the changes you have made in incubator-tvm-vta, then this one.

zhanghaohit (Contributor, Author):

> Please address the aforementioned changes, thank you.

Done. Thanks @tmoreau89 for the comments.

zhanghaohit (Contributor, Author):

> Thanks @zhanghaohit. This is converging nicely. I made some additional comments.
>
> In addition, I'd like to request further partitioning given the large size of the PR.
>
> (1) the following files will need to migrate to #6125:
>
>   • src/relay/op/annotation/annotation.cc
>   • python/tvm/relay/op/_tensor.py
>
> (2) changes made for quantization should be isolated to an additional PR, this includes:
>
>   • src/relay/quantize/realize.cc
>   • python/tvm/relay/quantize/_partition.py
>   • python/tvm/relay/quantize/_annotate.py

The changes made for quantization have been moved to #6191.

Resolved review threads on vta/runtime/runtime.cc (outdated).
@tmoreau89 (Contributor):

Thanks; overall the PR is in good shape. I think we'll need to merge the incubator-tvm-vta PR first, before we merge this one (due to the bump of the VTA submodule).

@tmoreau89 (Contributor):

@liangfu would you mind doing a pass on this PR as well?

@tmoreau89 requested a review from liangfu August 26, 2020 04:25
@tqchen tqchen changed the base branch from master to main October 11, 2020 18:28
zhanghaohit (Contributor, Author):

> @liangfu would you mind doing a pass on this PR as well?

ping @liangfu

@liangfu (Member) left a review:

Thanks for the update, and really sorry for the late reply.

I took another look, and the changes look good.

@tmoreau89 (Contributor):

apache/tvm-vta#9 is finally merged! @zhanghaohit can you please update the submodule, and let's make sure that the unit tests pass now that the ISA has changed.

Note: we'll also need to modify the bitstreams for the FPGAs that VTA supports. @liangfu are you able to update the bitstreams for DE10 in the next week or so?

@tmoreau89 (Contributor):

@zhanghaohit can you update this PR to bump the VTA submodule to the latest? Currently, other PRs that update the VTA submodule will break unless we merge this PR. Thanks!

zhanghaohit (Contributor, Author):

> @zhanghaohit can you update this PR to bump the VTA submodule to the latest? Currently, other PRs that update the VTA submodule will break unless we merge this PR. Thanks!

Sure. Apologies for the delay. I think some unit tests failed for this PR; I'll try to fix them.

zhanghaohit (Contributor, Author):

@tmoreau89 CI seems not to be working? I've waited 9 hours but it hasn't started yet.

zhanghaohit (Contributor, Author):

@tqchen it seems CI is hung? Could you take a look at it? Thanks.

@tmoreau89 merged commit d0a0194 into apache:main Apr 20, 2021
@tmoreau89 (Contributor):

Thank you @zhanghaohit @liangfu @vegaluisjose this PR has been merged!

mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 22, 2021
* intelfocl support

* disable tsim test

* bugfix to vta autotvm

* disable tsim test in task_python_vta_tsim.sh

* fix integration test

* update vta submodule and re-enable tsim tests

* remove unnecessary comments
echuraev pushed a commit to echuraev/tvm that referenced this pull request Apr 29, 2021
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
@zhanghaohit deleted the feature/intelfocl-pr branch May 7, 2021 02:07
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Jun 17, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Jul 28, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Jul 29, 2021
zhanghaohit pushed a commit to zhanghaohit/incubator-tvm that referenced this pull request Aug 5, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 13, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 16, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 18, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 19, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 26, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 30, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 31, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Aug 31, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Sep 2, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Sep 3, 2021
mbrookhart pushed a commit to mbrookhart/tvm that referenced this pull request Sep 7, 2021
tmoreau89 pushed a commit that referenced this pull request Sep 9, 2021
* revert a change to lower_tvm_builtin.cc from #6126

* disable sim target on VTA tutorial

fix bad refactor

try again
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022