
❓ undefined reference when Building Torch-TensorRT #2624

Open
nicholasguimaraes opened this issue Jan 29, 2024 · 11 comments
Labels: question (Further information is requested)

@nicholasguimaraes

❓ Question

What you have already tried

I'm trying to build Torch-TensorRT version 2.3.0a0.
I successfully built Torch 2.3.0.dev.

When building Torch-TensorRT, if I comment out the http_archive rules for libtorch and libtorch_pre_cxx11_abi and use new_local_repository for both of them, I get an undefined reference error when running sudo PYTHONPATH=$PYTHONPATH python3 setup.py install

Now, if I leave the http_archive rules for libtorch and libtorch_pre_cxx11_abi at their defaults, I can "successfully" build Torch-TensorRT, but when I try to import it in any Python code I get:

ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

In the pyproject.toml file I can see that torch 2.3.0 is required for building Torch-TensorRT, and that is the version of torch installed and running in my environment.

I'm not sure how to proceed, since it seems I have all the required packages installed.
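
For reference, this is what the missing symbol demangles to (a sketch, assuming binutils' c++filt is available):

echo '_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs' | c++filt
# -> c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)
# The plain std::string argument is the pre-cxx11 ABI spelling; a torch built with the new (cxx11) ABI exports
# the std::__cxx11::basic_string variant instead, so the loader cannot resolve this symbol at import time.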

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.3.0a0+git4aa1f99
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): source
  • Build command you used (if compiling from source): sudo python3 setup.py build develop
  • Are you using local sources or building from archives: local
  • Python version: 3.8
  • CUDA version: 12.1
  • GPU models and configuration: 2080 ti

Additional context

@nicholasguimaraes
Author

I am trying to understand whether any of the required library/API versions are incorrect.

PyTorch was compiled from the main repo at version 2.3.0a0+git4aa1f99.

CUDA is on version 12.1.

TensorRT is on version 8.6.1.6.

When using http_archive, libtorch is downloaded for CUDA 12.1, which exactly matches the CUDA version installed on my system!
But it still gives the undefined symbol error when I import torch_tensorrt.

On the other hand, if I build Torch-TensorRT against libtorch using new_local_repository pointing at the directory where my torch 2.3.0.dev build is installed, I cannot finish the Torch-TensorRT compilation because of undefined references.

What specific versions of torch and libtorch must be used?
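
One check that might narrow this down (a sketch, run in the same environment where the import fails):

python3 -c "import torch; print(torch.__version__, torch.compiled_with_cxx11_abi())"
# True  -> torch was built with the new (cxx11) ABI, so Torch-TensorRT must be built against that ABI
# False -> torch uses the pre-cxx11 ABI, matching the default libtorch_pre_cxx11_abi archive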

@narendasan
Collaborator

Did you edit the WORKSPACE file to use your custom PyTorch version? Otherwise it will pull the latest nightly.

@nicholasguimaraes
Author

Did you edit the WORKSPACE file to use your custom PyTorch version? Otherwise it will pull the latest nightly.

Yes I did. When trying to build Torch-TensorRT using my own compiled torch 2.3.0.dev, I edited the WORKSPACE file like this:

new_local_repository(
    name = "libtorch",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)

Torch 2.3.0.dev is compiled and running like a charm, but when I try building Torch-TensorRT I get undefined references from libtorchtrt.so.

Similarly, if I comment out new_local_repository for libtorch and libtorch_pre_cxx11_abi and use http_archive, I can build Torch-TensorRT, but when importing it in a Python script I get: ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

This is how I built Torch dev:

git clone https://github.com/pytorch/pytorch.git  

sudo python3 setup.py build develop 
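
To confirm which ABI that local build ended up with, something like this should work (a sketch; the library path assumes the checkout above):

nm -D /home/nick/Documents/pytorch/torch/lib/libc10.so | c++filt | grep torchInternalAssertFail
# A signature containing std::__cxx11::basic_string means the new (cxx11) ABI; plain std::string means the old one.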

@narendasan
Collaborator

Oh, if you built PyTorch locally you might need to add --use-cxx11-abi as a flag to setup.py, since by default PyTorch releases use the old ABI but source builds use the new one.

Something like sudo python3 setup.py develop --use-cxx11-abi should work
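
If you already did a build without the flag, it is probably worth cleaning the bazel outputs first so objects compiled against the other ABI are not reused (a rough sketch, assuming bazel is on PATH and run from the repo root):

bazel clean --expunge
sudo python3 setup.py develop --use-cxx11-abi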

@nicholasguimaraes
Author

nicholasguimaraes commented Feb 20, 2024

The error persists.

I tried installing with pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu121,

which installs torch 2.3.0.dev20240219+cu121, and tried all possible WORKSPACE configurations with
libtorch,
libtorch_pre_cxx11_abi,
cudnn, and
tensorrt.

Torch works fine and torch.cuda.is_available() returns True.

Using the pip-installed torch dev, even when compilation is successful I get the undefined symbol error when importing torch_tensorrt:
ImportError: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/build/lib.linux-x86_64-cpython-38/torch_tensorrt/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit8toIValueEN8pybind116handleERKN3c104Type24SingletonOrSharedTypePtrIS4_EENS3_8optionalIiEE

My latest attempt was pointing both libtorch and libtorch_pre_cxx11_abi to the pip-installed torch 2.3.0.dev package, but during compilation I got:

ERROR: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/cpp/bin/torchtrtc/BUILD:12:10: Linking cpp/bin/torchtrtc/torchtrtc failed: (Exit 1): gcc failed: error executing command (from target //cpp/bin/torchtrtc:torchtrtc) /usr/bin/gcc @bazel-out/k8-opt/bin/cpp/bin/torchtrtc/torchtrtc-2.params

It seems that regardless of whether torch 2.3.0.dev is installed via pip or compiled from source, and regardless of the --use-cxx11-abi flag, I can neither compile nor import torch_tensorrt.

As a reminder, I'm on Ubuntu 18, NVIDIA driver version 535.54.03, and CUDA toolkit 12.1.
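
One more check that might help is confirming whether the torch picked up at runtime is the one the extension was linked against (a sketch; the paths assume the pip install above):

python3 -c "import torch; print(torch.__version__, torch.__file__)"
nm -D ~/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so | grep toIValue | c++filt
# If the mangled symbol from the ImportError does not show up in this list, the runtime torch and the
# libtorch used at build time are out of sync (different nightly or different ABI).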

@narendasan
Collaborator

What build command are you using for torch-tensorrt?

@matthost

matthost commented Jul 2, 2024

Did you solve this? I'm running into this error. I think I have everything set up for Torch 2.3.0, TensorRT 10.0.1, torch-tensorrt 2.3.0, all compiled with CUDA 11.8. Also on Python 3.8.

Everywhere else I've seen this error, though, the implication is that it comes from mismatched deps.

@matthost

matthost commented Jul 3, 2024

I do use a source build of torch rather than a distribution, while trying to use a distribution for TensorRT, so I wonder if it's the use-cxx11-abi thing...

@matthost

matthost commented Jul 8, 2024

I'm going to try moving to all prebuilt distros, which seem to be working.

@woshizouguo

@matthost can you help clarify how to fix this issue? I am using torch 2.4.1 and hit the same error.

@matthost

Either build every torch-related package from source, or use all prebuilt wheels from their download page.
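
For the all-prebuilt route, something along these lines (illustrative only; pick the index that matches your CUDA version and let pip resolve compatible versions):

pip uninstall -y torch torchvision torch-tensorrt
pip install torch torchvision torch-tensorrt --extra-index-url https://download.pytorch.org/whl/cu121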
