I'm trying to generate a TensorRT engine of RetinaNet that uses GeneralizedRCNNTransform. By bypassing a couple of layers, it works fine for a static batch size. However, when using a dynamic batch size, I get an issue that I don't know how to fix.
In class GeneralizedRCNNTransform(nn.Module), there is this part in the forward method:
    for i in range(len(images)):
        image = images[i]
        target_index = targets[i] if targets is not None else None

        if image.dim() != 3:
            raise ValueError("images is expected to be a list of 3d tensors "
                             "of shape [C, H, W], got {}".format(image.shape))
        image = self.normalize(image)  # this is where the sub and div nodes come from
        image, target_index = self.resize(image, target_index)  # I'm bypassing this
        images[i] = image
        if targets is not None and target_index is not None:
            targets[i] = target_index
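When converting my model, I get a tree in which this loop shows up as split, sub, div and concat nodes. That is perfectly fine for the given example batch size (3 in this case), but the tree will not work for any other batch size once it is converted to a TensorRT engine. I get this kind of error: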
[12/10/2021-13:29:40] [TRT] [E] 7: [shapeMachine.cpp::execute::565] Error Code 7: Internal Error (Split_0_0: ISliceLayer has out of bounds access on axis 0
condition '<' violated
Instruction: CHECK_LESS 1 1
)
[12/10/2021-13:29:40] [TRT] [E] 2: [executionContext.cpp::enqueueInternal::366] Error Code 2: Internal Error (Could not resolve slots: )
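The same effect can be reproduced in miniature without TensorRT. The sketch below is only an illustration, not the actual torch2trt conversion path: it assumes the batch is a single `[N, C, H, W]` tensor, uses `torch.jit.trace` as a stand-in for the converter, and the name `normalize_per_image` and the mean/std values are made up for the example. A Python loop over the batch dimension is evaluated once at trace time, so the recorded graph contains exactly as many slices as the example batch had images.

```python
import warnings

import torch

# Assumed per-channel statistics, for illustration only.
MEAN = torch.tensor([0.485, 0.456, 0.406])
STD = torch.tensor([0.229, 0.224, 0.225])


def normalize_per_image(images: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the loop in the transform's forward."""
    out = []
    # range(...) is evaluated as a plain Python int while tracing,
    # so the number of iterations is frozen into the graph.
    for i in range(images.shape[0]):
        image = images[i]  # one slice per image
        out.append((image - MEAN[:, None, None]) / STD[:, None, None])
    return torch.stack(out)  # one concat-like node at the end


example = torch.rand(3, 3, 8, 8)  # example batch size of 3
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # tracing may warn about the Python loop
    traced = torch.jit.trace(normalize_per_image, example)

bigger = torch.rand(5, 3, 8, 8)
traced_out = traced(bigger)
# Only the first three images are processed; the other two are silently dropped.
print(traced_out.shape[0])
```

With a batch smaller than the example, the recorded slices would index past the end of axis 0, which would be consistent with the `CHECK_LESS 1 1` message above, although that link is a guess rather than something confirmed for TensorRT.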
Is there any way I can bypass this for loop, which is creating the split and concat nodes? Do I need to redefine the forward method? I can't really bypass the normalisation, or else I will lose accuracy. Can I maybe normalise the whole batch at once, without that for loop?
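To make the last question concrete, this is roughly what normalising the whole batch at once could look like. It is a minimal sketch under several assumptions: the images arrive as one `[N, C, H, W]` tensor of equal-sized images, the resize step stays bypassed, and the class name `BatchedNormalize` and the mean/std values are hypothetical rather than part of torchvision. Whether torch2trt converts it cleanly with a dynamic batch size has not been checked here.

```python
import torch
from torch import nn


class BatchedNormalize(nn.Module):
    """Hypothetical replacement for the per-image normalize call."""

    def __init__(self, image_mean, image_std):
        super().__init__()
        # Shape [1, C, 1, 1] so the statistics broadcast over any batch size.
        self.register_buffer("mean", torch.tensor(image_mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(image_std).view(1, -1, 1, 1))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: [N, C, H, W]; a single Sub and a single Div, no loop over N.
        return (images - self.mean) / self.std


# Assumed statistics, for illustration only.
norm = BatchedNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

batch = torch.rand(5, 3, 8, 8)
batched = norm(batch)

# Reference result computed image by image, as the original loop does.
per_image = torch.stack([(img - norm.mean[0]) / norm.std[0] for img in batch])
print(torch.allclose(batched, per_image))
```

Because the batch dimension is handled purely by broadcasting, nothing in the forward pass depends on the number of images, so the same module accepts a batch of 1 or 5 without change.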
Thanks 😄
Versions
Collecting environment information...
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 10.2
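ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (aarch64)
GCC version: (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25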
Python version: 3.6.9 (default, Mar 15 2022, 13:55:28) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.9.253-tegra-aarch64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0
[pip3] torch2trt==0.3.0
[pip3] torchvision==0.11.1
[conda] Could not collect
This is not currently a use case we support. It's hard to guide you to the best option because I don't have deep expertise in TensorRT and ONNX. We could potentially look into this in the future, but currently it's hard due to our limited resources. Apologies that I can't provide better assistance at this point.
cc @datumbox @YosuaMichael