Description
Describe the bug
I'm trying to build ONNX Runtime 1.10.0 for Python 3.10 with CUDA support on Windows 10, but the build is blocked by unit test failures.
These are the failed tests:
[----------] Global test environment tear-down
[==========] 2930 tests from 221 test suites ran. (61025 ms total)
[ PASSED ] 2924 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle
[ FAILED ] RNNTest.RNN_bidirectional_zigged_batch
[ FAILED ] RNNTest.RNN_forward_direction_zigged_batch
[ FAILED ] RNNTest.RNN_bidirectional_0
[ FAILED ] RNNTest.RNN_bidirectional_1
6 FAILED TESTS
YOU HAVE 7 DISABLED TESTS
I get exactly the same failures with two different combinations of CUDA and cuDNN versions.
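To iterate on just these six tests instead of the full suite, the failing names can be joined into a GoogleTest `--gtest_filter` argument, which accepts colon-separated patterns. The sketch below only builds and prints the command line; the binary name `onnxruntime_test_all.exe` is an assumption about the build tree and may need adjusting.

```python
# Failed test names copied from the GoogleTest summary above.
FAILED_TESTS = [
    "LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle",
    "LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle",
    "RNNTest.RNN_bidirectional_zigged_batch",
    "RNNTest.RNN_forward_direction_zigged_batch",
    "RNNTest.RNN_bidirectional_0",
    "RNNTest.RNN_bidirectional_1",
]


def gtest_filter(tests):
    """Build a --gtest_filter argument; ':' separates the patterns."""
    return "--gtest_filter=" + ":".join(tests)


# Assumption: the unit test binary is named onnxruntime_test_all.exe.
print("onnxruntime_test_all.exe", gtest_filter(FAILED_TESTS))
```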
Here are more details about those failures:
`[ RUN ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125399887561798, which exceeds threshold, where
expected[i] evaluates to -0.029125399887561798,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029072800651192665, which exceeds threshold, where
expected[i] evaluates to 0.029072800651192665,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125399887561798, which exceeds threshold, where
expected[i] evaluates to -0.029125399887561798,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle (19 ms)
[ RUN ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125459492206573, which exceeds threshold, where
expected[i] evaluates to -0.029125459492206573,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029072800651192665, which exceeds threshold, where
expected[i] evaluates to 0.029072800651192665,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125459492206573, which exceeds threshold, where
expected[i] evaluates to -0.029125459492206573,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle (18 ms)
[ RUN ] LSTMTest.SharedPrepackedWeights
[ OK ] LSTMTest.SharedPrepackedWeights (5 ms)
[----------] 27 tests from LSTMTest (254 ms total)
[----------] 9 tests from RNNTest
[ RUN ] RNNTest.RNN_bidirectional_bias_initial_zigged_batch
[ OK ] RNNTest.RNN_bidirectional_bias_initial_zigged_batch (0 ms)
[ RUN ] RNNTest.RNN_bidirectional_zigged_batch
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.31118321418762207, which exceeds threshold, where
expected[i] evaluates to -0.31118321418762207,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.85353446006774902, which exceeds threshold, where
expected[i] evaluates to -0.85353446006774902,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_zigged_batch (7 ms)
[ RUN ] RNNTest.RNN_reverse_direction_zigged_batch
[ OK ] RNNTest.RNN_reverse_direction_zigged_batch (0 ms)
[ RUN ] RNNTest.RNN_forward_direction_zigged_batch
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.052289962768554688, which exceeds threshold, where
expected[i] evaluates to -0.052289962768554688,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.74626469612121582, which exceeds threshold, where
expected[i] evaluates to -0.74626469612121582,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_forward_direction_zigged_batch (6 ms)
[ RUN ] RNNTest.RNN_bidirectional_0
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.25082838535308838, which exceeds threshold, where
expected[i] evaluates to -0.25082838535308838,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.74539613723754883, which exceeds threshold, where
expected[i] evaluates to -0.74539613723754883,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_0 (7 ms)
[ RUN ] RNNTest.RNN_bidirectional_1
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.68878379464149475, which exceeds threshold, where
expected[i] evaluates to 0.98009639978408813,
output[i] evaluates to 0.29131260514259338, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.68878379464149475, which exceeds threshold, where
expected[i] evaluates to 0.98009639978408813,
output[i] evaluates to 0.29131260514259338, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_1 (7 ms)`
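The log is easier to compare across runs once it is reduced to (expected, output) pairs per test. In the output above, every reported mismatch in five of the six failing tests has a CUDA output of exactly 0, and only `RNNTest.RNN_bidirectional_1` returns a non-zero value. Below is a rough sketch of such a summary, assuming the log has the same line layout as pasted here; the `summarize` helper and the `SAMPLE` excerpt are illustrative only.

```python
import re
from collections import defaultdict

# Patterns match the log layout shown above (an assumption about other runs).
RUN_RE = re.compile(r"\[ RUN\s*\] (\S+)")
EXPECTED_RE = re.compile(r"expected\[i\] evaluates to (-?[\d.]+(?:e-?\d+)?)")
OUTPUT_RE = re.compile(r"output\[i\] evaluates to (-?[\d.]+(?:e-?\d+)?)")

# Short excerpt of the log above, used only to demonstrate the parser.
SAMPLE = """\
[ RUN ] RNNTest.RNN_bidirectional_0
expected[i] evaluates to -0.25082838535308838,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
[ RUN ] RNNTest.RNN_bidirectional_1
expected[i] evaluates to 0.98009639978408813,
output[i] evaluates to 0.29131260514259338, and
threshold evaluates to 0.004999999888241291.
"""


def summarize(log_text):
    """Map each test name to the (expected, output) pairs reported for it."""
    results = defaultdict(list)
    current = None
    expected = None
    for line in log_text.splitlines():
        match = RUN_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = EXPECTED_RE.search(line)
        if match:
            expected = float(match.group(1))
            continue
        match = OUTPUT_RE.search(line)
        if match and current is not None and expected is not None:
            results[current].append((expected, float(match.group(1))))
            expected = None
    return dict(results)


for test, pairs in summarize(SAMPLE).items():
    zeros = sum(1 for _, out in pairs if out == 0.0)
    print(f"{test}: {len(pairs)} mismatches, {zeros} with output == 0")
```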
Urgency
Low urgency
System information
- OS Platform and Distribution: Windows 10 10.0.19044
- ONNX Runtime installed from (source or binary): source
- ONNX Runtime version: 1.10.0
- Python version: 3.10
- Visual Studio version (if applicable): VS 2019 16.11.10
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 11.6 with cuDNN 8.3.2.44, and CUDA 11.4 with cuDNN 8.2.2.26
- GPU model and memory: NVIDIA Quadro P5000
To Reproduce
I've tried building with this command (CUDA 11.6):
.\build.bat --cmake_generator "Visual Studio 16 2019" --parallel --config Release --build_shared_lib --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --build_wheel --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --cuda_version 11.6 --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=61;75
And with this one (CUDA 11.4):
.\build.bat --cmake_generator "Visual Studio 16 2019" --parallel --config Release --build_shared_lib --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4" --build_wheel --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4" --cuda_version 11.4 --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=61;75
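The two commands differ only in the CUDA version, so they can be generated from one template to rule out copy-paste differences between the runs. This is a minimal sketch that prints the command lines rather than executing them. It assumes, as the commands above do, that `--cudnn_home` and `--cuda_home` point at the same toolkit directory; `build_args` is a hypothetical helper, not part of the ONNX Runtime build scripts.

```python
import subprocess


def build_args(cuda_version):
    """Return the build.bat argument list for one CUDA version."""
    # Assumption (mirrors the commands above): cuDNN lives inside the
    # CUDA toolkit directory, so both *_home flags share one path.
    cuda_home = rf"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v{cuda_version}"
    return [
        r".\build.bat",
        "--cmake_generator", "Visual Studio 16 2019",
        "--parallel",
        "--config", "Release",
        "--build_shared_lib",
        "--use_cuda",
        "--cudnn_home", cuda_home,
        "--build_wheel",
        "--cuda_home", cuda_home,
        "--cuda_version", cuda_version,
        "--cmake_extra_defines", "CMAKE_CUDA_ARCHITECTURES=61;75",
    ]


# Print both command lines, quoted for the Windows command line.
for version in ("11.6", "11.4"):
    print(subprocess.list2cmdline(build_args(version)))
```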