
Unit Tests failure while building on Windows with CUDA EP #10561

Open

Description

Describe the bug
I'm trying to build onnxruntime 1.10.0 for Python 3.10 with CUDA support on Windows 10, but I'm stuck because several unit tests fail.

These are the failed tests:

[----------] Global test environment tear-down
[==========] 2930 tests from 221 test suites ran. (61025 ms total)
[ PASSED ] 2924 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle
[ FAILED ] RNNTest.RNN_bidirectional_zigged_batch
[ FAILED ] RNNTest.RNN_forward_direction_zigged_batch
[ FAILED ] RNNTest.RNN_bidirectional_0
[ FAILED ] RNNTest.RNN_bidirectional_1

6 FAILED TESTS
YOU HAVE 7 DISABLED TESTS

I've got the exact same failures on two different versions of CUDA and cuDNN.
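
To iterate on just these six tests instead of the full suite, the names can be pulled out of the summary and turned into a GoogleTest `--gtest_filter` argument. This is a minimal Python sketch, assuming the summary text is available as a string. `failed_tests` and `gtest_filter` are hypothetical helper names, and the test binary the flag would be passed to is not shown.

```python
import re

# Summary lines copied from the test run above.
SUMMARY = """\
[ FAILED ] 6 tests, listed below:
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle
[ FAILED ] RNNTest.RNN_bidirectional_zigged_batch
[ FAILED ] RNNTest.RNN_forward_direction_zigged_batch
[ FAILED ] RNNTest.RNN_bidirectional_0
[ FAILED ] RNNTest.RNN_bidirectional_1
"""

# Matches "[ FAILED ] Suite.Test", optionally followed by "(N ms)".
# The count line ("6 tests, listed below:") has no "Suite.Test" token
# and is therefore skipped.
_FAILED = re.compile(
    r"^\[\s*FAILED\s*\]\s+(\w+\.\w+)(?:\s+\(\d+ ms\))?\s*$",
    re.MULTILINE,
)


def failed_tests(log: str) -> list[str]:
    """Return failed test names in order of first appearance, without duplicates."""
    names: list[str] = []
    for name in _FAILED.findall(log):
        if name not in names:
            names.append(name)
    return names


def gtest_filter(names: list[str]) -> str:
    """Join test names with ':' as GoogleTest expects for --gtest_filter."""
    return "--gtest_filter=" + ":".join(names)


if __name__ == "__main__":
    print(gtest_filter(failed_tests(SUMMARY)))
```

The resulting flag can be appended to the command line of the GoogleTest executable that produced the log.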

Here are more details about those failures:

`[ RUN ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125399887561798, which exceeds threshold, where
expected[i] evaluates to -0.029125399887561798,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029072800651192665, which exceeds threshold, where
expected[i] evaluates to 0.029072800651192665,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125399887561798, which exceeds threshold, where
expected[i] evaluates to -0.029125399887561798,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle (19 ms)
[ RUN ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125459492206573, which exceeds threshold, where
expected[i] evaluates to -0.029125459492206573,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029072800651192665, which exceeds threshold, where
expected[i] evaluates to 0.029072800651192665,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.029125459492206573, which exceeds threshold, where
expected[i] evaluates to -0.029125459492206573,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.06609819084405899, which exceeds threshold, where
expected[i] evaluates to -0.06609819084405899,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] LSTMTest.ONNXRuntime_TestLSTMZeroSeqInMiddle (18 ms)
[ RUN ] LSTMTest.SharedPrepackedWeights
[ OK ] LSTMTest.SharedPrepackedWeights (5 ms)
[----------] 27 tests from LSTMTest (254 ms total)

[----------] 9 tests from RNNTest
[ RUN ] RNNTest.RNN_bidirectional_bias_initial_zigged_batch
[ OK ] RNNTest.RNN_bidirectional_bias_initial_zigged_batch (0 ms)
[ RUN ] RNNTest.RNN_bidirectional_zigged_batch
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.31118321418762207, which exceeds threshold, where
expected[i] evaluates to -0.31118321418762207,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.85353446006774902, which exceeds threshold, where
expected[i] evaluates to -0.85353446006774902,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_zigged_batch (7 ms)
[ RUN ] RNNTest.RNN_reverse_direction_zigged_batch
[ OK ] RNNTest.RNN_reverse_direction_zigged_batch (0 ms)
[ RUN ] RNNTest.RNN_forward_direction_zigged_batch
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.052289962768554688, which exceeds threshold, where
expected[i] evaluates to -0.052289962768554688,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.74626469612121582, which exceeds threshold, where
expected[i] evaluates to -0.74626469612121582,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_forward_direction_zigged_batch (6 ms)
[ RUN ] RNNTest.RNN_bidirectional_0
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.25082838535308838, which exceeds threshold, where
expected[i] evaluates to -0.25082838535308838,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.74539613723754883, which exceeds threshold, where
expected[i] evaluates to -0.74539613723754883,
output[i] evaluates to 0, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_0 (7 ms)
[ RUN ] RNNTest.RNN_bidirectional_1
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.68878379464149475, which exceeds threshold, where
expected[i] evaluates to 0.98009639978408813,
output[i] evaluates to 0.29131260514259338, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
C:\Devel\onnxruntime\onnxruntime\test\providers\provider_test_utils.cc(241): error: The difference between expected[i] and output[i] is 0.68878379464149475, which exceeds threshold, where
expected[i] evaluates to 0.98009639978408813,
output[i] evaluates to 0.29131260514259338, and
threshold evaluates to 0.004999999888241291.
i:0, provider_type: CUDAExecutionProvider
[ FAILED ] RNNTest.RNN_bidirectional_1 (7 ms)`
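
The messages above have the shape GoogleTest prints for a near-equality assertion: the absolute difference between `expected[i]` and `output[i]` is compared against a threshold. The sketch below re-runs that comparison in Python on a few value pairs copied from the log, as a way to read the numbers. `exceeds_threshold` is a hypothetical helper, not the code in `provider_test_utils.cc`.

```python
# Threshold value as printed in the log.
THRESHOLD = 0.004999999888241291

# (test name, expected[i], output[i]) copied from the failure messages.
SAMPLES = [
    ("LSTMTest.ONNXRuntime_TestLSTMShorterSeqInMiddle", -0.029125399887561798, 0.0),
    ("RNNTest.RNN_bidirectional_zigged_batch", -0.31118321418762207, 0.0),
    ("RNNTest.RNN_forward_direction_zigged_batch", -0.74626469612121582, 0.0),
    ("RNNTest.RNN_bidirectional_1", 0.98009639978408813, 0.29131260514259338),
]


def exceeds_threshold(expected: float, output: float, threshold: float) -> bool:
    """True when |expected - output| is larger than the allowed threshold."""
    return abs(expected - output) > threshold


if __name__ == "__main__":
    for name, expected, output in SAMPLES:
        diff = abs(expected - output)
        status = "FAIL" if exceeds_threshold(expected, output, THRESHOLD) else "ok"
        print(f"{status:4} {name}: diff={diff:.6f} output={output}")
```

In the log, every mismatch except the ones in `RNNTest.RNN_bidirectional_1` reports an output of exactly 0, so the reported difference is simply the magnitude of the expected value.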

Urgency
Low urgency

System information

  • OS Platform and Distribution: Windows 10 10.0.19044
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: 1.10.0
  • Python version: 3.10
  • Visual Studio version (if applicable): VS 2019 16.11.10
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: both CUDA 11.6 cuDNN 8.3.2.44 and CUDA 11.4 cuDNN 8.2.2.26
  • GPU model and memory: Nvidia Quadro P5000

To Reproduce
I've tried building with this command:
.\build.bat --cmake_generator "Visual Studio 16 2019" --parallel --config Release --build_shared_lib --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --build_wheel --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --cuda_version 11.6 --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=61;75

And this other command:

.\build.bat --cmake_generator "Visual Studio 16 2019" --parallel --config Release --build_shared_lib --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4" --build_wheel --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4" --cuda_version 11.4 --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=61;75
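
The two commands differ only in the CUDA version, which appears in `--cudnn_home`, `--cuda_home` and `--cuda_version`. A small Python sketch that builds the same argument list from a single version string makes that explicit. `build_args` and `CUDA_ROOT` are hypothetical names introduced here, and the flags are copied from the commands above rather than checked against the build script.

```python
# Install root taken from the paths in the commands above.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA"


def build_args(cuda_version: str) -> list[str]:
    """Return the build.bat argument list for one CUDA version."""
    cuda_home = rf"{CUDA_ROOT}\v{cuda_version}"
    return [
        r".\build.bat",
        "--cmake_generator", "Visual Studio 16 2019",
        "--parallel",
        "--config", "Release",
        "--build_shared_lib",
        "--use_cuda",
        "--cudnn_home", cuda_home,  # same directory as --cuda_home in the report
        "--build_wheel",
        "--cuda_home", cuda_home,
        "--cuda_version", cuda_version,
        # Kept as one list element so "61;75" stays a single argument.
        "--cmake_extra_defines", "CMAKE_CUDA_ARCHITECTURES=61;75",
    ]


if __name__ == "__main__":
    for version in ("11.6", "11.4"):
        print(build_args(version))
```

Passing such a list to `subprocess.run` from the repository root would be one way to launch the build, though that step is left out here because it was not part of the original reproduction.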


Labels: core runtime (issues related to core runtime), ep:CUDA (issues related to the CUDA execution provider)
