Description
Hello,
I have a simple ONNX model (89 KB) and I run inference with large batch sizes.
OS: CentOS 7
GPU: NVIDIA GeForce GTX 1080
CUDA: 11.0
cuDNN: 8.1.1
onnxruntime-gpu version: 1.7.0
Inference with onnxruntime-gpu runs smoothly up to a batch size of 65535, but as soon as the batch size exceeds 65535 I get the error below.
(NOTE: CPU inference of the same model with onnxruntime works fine, even with batch sizes > 1 million.)
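For reference, a minimal sketch of how I trigger the failure; the model path "model.onnx" and the feature dimension (32) are placeholders for my actual model:

import numpy as np
import onnxruntime as ort

# onnxruntime-gpu 1.7.0 picks the CUDA execution provider by default
sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name

batch = 65536  # 65535 works; 65536 raises CUDNN_STATUS_NOT_SUPPORTED
x = np.random.rand(batch, 32).astype(np.float32)

sess.run(None, {input_name: x})  # fails on GPU, succeeds on CPU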
2021-03-22 16:59:54.200488858 [E:onnxruntime:Default, cuda_call.cc:119 CudaCall] CUDNN failure 9: CUDNN_STATUS_NOT_SUPPORTED ; GPU=0 ; hostname=blipp73.sdp.research.bell-labs.com ; expr=cudnnBatchNormalizationForwardInference( CudnnHandle(), cudnn_batch_norm_mode_, &alpha, &beta, data_desc, x_data, data_desc, y_data, bn_tensor_desc, scale_data, b_data, mean_data, var_data, epsilon_);
2021-03-22 16:59:54.200560113 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running BatchNormalization node. Name:'batch_normalization' Status Message: CUDNN error executing cudnnBatchNormalizationForwardInference( CudnnHandle(), cudnn_batch_norm_mode_, &alpha, &beta, data_desc, x_data, data_desc, y_data, bn_tensor_desc, scale_data, b_data, mean_data, var_data, epsilon_)
Traceback (most recent call last):
File "onnxruntime_test.1.4.0.py", line 118, in
sys.exit(main())
File "onnxruntime_test.1.4.0.py", line 102, in main
sess.run([], feeds) # fetch all outputs
File "/home/tfs/venv_ORTGPU_test/lib64/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
return self.sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running BatchNormalization node. Name:'batch_normalization' Status Message: CUDNN error executing cudnnBatchNormalizationForwardInference( CudnnHandle(), cudnn_batch_norm_mode_, &alpha, &beta, data_desc, x_data, data_desc, y_data, bn_tensor_desc, scale_data, b_data, mean_data, var_data, epsilon_)
Upon investigation, I came across this MXNet discussion:
apache/mxnet#4997 (comment)
However, I could not verify the batch-size limit actually set in the cuDNN library. I am reporting this issue so that there is a record of it. If you could investigate and post the cuDNN-defined maximum batch size allowed for cudnnBatchNormalizationForwardInference, that would be great.
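In the meantime, a possible workaround (an untested sketch, assuming the model's first axis is the batch dimension and it has a single output) would be to split the input into chunks no larger than 65535 rows and concatenate the results:

import numpy as np

CHUNK = 65535  # largest batch size that worked on the GPU in my tests

def run_in_chunks(sess, input_name, x):
    """Run sess on x in slices of at most CHUNK along the batch axis."""
    parts = [sess.run(None, {input_name: x[i:i + CHUNK]})[0]
             for i in range(0, x.shape[0], CHUNK)]
    return np.concatenate(parts, axis=0)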
Thank you,
Buvana