
🐛 [Bug] Loading Torch-TensorRT models (.ts) on multiple GPUs (in TorchServe) #1888

@emilwallner

Bug Description

Everything works well when I'm using 1 GPU, but as soon as I try to load a model on 4 separate GPUs, I get this error:

MODEL_LOG - RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false
MODEL_LOG - No compatible device was found for instantiating TensorRT engine

To Reproduce

Steps to reproduce the behavior:

Create a (.ts) model and load it on 4 different GPUs. I don't know if this is specific to TorchServe, or a general issue.

Here's the simple version (TorchServe Handler):

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    self.model = torch.jit.load('model.ts')

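For clarity, the gpu_id-to-device mapping in the handler above can be sketched as a standalone function (the name `pick_device_string` and the harness are my own illustration, not part of the TorchServe API):

```python
# Minimal sketch of the handler's device-selection logic, factored out so it
# can be exercised without a GPU. The function name is hypothetical.
def pick_device_string(gpu_id, cuda_available):
    """Mirror the handler's choice: "cuda:<id>" when CUDA is usable and a
    worker GPU id was assigned, otherwise fall back to the CPU."""
    if cuda_available and gpu_id is not None:
        return "cuda:" + str(gpu_id)
    return "cpu"

print(pick_device_string(2, True))    # → cuda:2
print(pick_device_string(2, False))   # → cpu
```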
I'm not sure if it relates to this issue. From what I can tell, it seems I need to restrict the CUDA context; however, the GPU is already assigned in the handler. I tried the following, but it still gives me the same error.

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    torch.cuda.set_device(self.device)
    torch_tensorrt.set_device(int(properties.get("gpu_id")))

    with torch.cuda.device(int(properties.get("gpu_id"))):
        self.model = torch.jit.load('model.ts')
        self.model.to(self.device)
        self.model.eval()
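Another way to "restrict the CUDA context" is to mask devices at the process level with CUDA_VISIBLE_DEVICES before CUDA initializes, so each worker only ever sees one GPU. A minimal sketch of the idea (the helper name is my own, and I haven't verified this fixes the TorchServe case):

```python
import os

# Hedged sketch: pin a worker process to a single GPU by setting
# CUDA_VISIBLE_DEVICES *before* torch/CUDA initializes. Inside the process,
# the chosen GPU is then addressed as device index 0. The function name
# `pin_worker_to_gpu` is hypothetical.
def pin_worker_to_gpu(gpu_id):
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # After masking, the only visible device is "cuda:0" from the
    # worker's point of view.
    return "cuda:0"

print(pin_worker_to_gpu(3))                 # → cuda:0
print(os.environ["CUDA_VISIBLE_DEVICES"])   # → 3
```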

I also tried mapping the model straight to the GPU on load (passing the target device as `map_location` to `torch.jit.load`), but hit the same problem.

Expected behavior

A .ts model should load without errors on whichever GPU is specified by its ID.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Official PyTorch image: nvcr.io/nvidia/pytorch:22.12-py3
GPUs: 4x NVIDIA A10G
PyTorch: 1.14.0a0+410ce96
NVIDIA CUDA 11.8.0
TensorRT 8.5.1
Ubuntu 20.04 including Python 3.8
NVIDIA cuBLAS 11.11.3.6
NVIDIA cuDNN 8.7.0.84
NVIDIA NCCL 2.15.5 (optimized for NVIDIA NVLink®)
NVIDIA RAPIDS™ 22.10.01 (for x86, only these libraries are included: cudf, xgboost, rmm, cuml, and cugraph)
Apex
rdma-core 36.0
NVIDIA HPC-X 2.13
OpenMPI 4.1.4+
GDRCopy 2.3
TensorBoard 2.9.0
Nsight Compute 2022.3.0.0
Nsight Systems 2022.4.2.1
Torch-TensorRT 1.1.0a0
NVIDIA DALI® 1.20.0
MAGMA 2.6.2
JupyterLab 2.3.2 including Jupyter-TensorBoard
TransformerEngine 0.3.0

Additional context
