GPU tests pass when they probably shouldn't #1961

Open
nwnk opened this issue Jun 13, 2024 · 4 comments
Labels
bug (A confirmed library bug), help wanted, platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel)

Comments

@nwnk
Contributor

nwnk commented Jun 13, 2024

Using the oneAPI 2024.1 release, build the SYCL CPU and GPU backends. Ensure that no SYCL devices are available on the system. Then run ctest:

% export OCL_ICD_VENDORS=/dev/null
% sudo dnf -y remove oneapi-level-zero >& /dev/null
% sycl-ls | wc -l
0
% ctest >& test-broken-gpu.log
% grep gpu.*Passed test-broken-gpu.log
  2/453 Test   #2: gpu-bnorm-u8-via-binary-postops-cpp ......................   Passed    0.02 sec
  4/453 Test   #4: gpu-cnn-inference-f32-cpp ................................   Passed    0.02 sec
  6/453 Test   #6: gpu-cnn-inference-int8-cpp ...............................   Passed    0.02 sec
  8/453 Test   #8: gpu-cnn-training-bf16-cpp ................................   Passed    0.02 sec
 10/453 Test  #10: gpu-cnn-training-f32-cpp .................................   Passed    0.02 sec
 13/453 Test  #13: gpu-getting-started-cpp ..................................   Passed    0.02 sec
 15/453 Test  #15: gpu-graph-sycl-getting-started-cpp .......................   Passed    0.02 sec
 17/453 Test  #17: gpu-graph-sycl-single-op-partition-cpp ...................   Passed    0.02 sec
 19/453 Test  #19: gpu-matmul-perf-cpp ......................................   Passed    0.02 sec
 21/453 Test  #21: gpu-memory-format-propagation-cpp ........................   Passed    0.02 sec
 23/453 Test  #23: gpu-performance-profiling-cpp ............................   Passed    0.02 sec
 25/453 Test  #25: gpu-primitives-augru-cpp .................................   Passed    0.02 sec
 27/453 Test  #27: gpu-primitives-batch-normalization-cpp ...................   Passed    0.02 sec
 29/453 Test  #29: gpu-primitives-binary-cpp ................................   Passed    0.02 sec
 31/453 Test  #31: gpu-primitives-concat-cpp ................................   Passed    0.02 sec
 33/453 Test  #33: gpu-primitives-convolution-cpp ...........................   Passed    0.02 sec
 35/453 Test  #35: gpu-primitives-eltwise-cpp ...............................   Passed    0.02 sec
 37/453 Test  #37: gpu-primitives-group-normalization-cpp ...................   Passed    0.02 sec
 39/453 Test  #39: gpu-primitives-inner-product-cpp .........................   Passed    0.02 sec
 41/453 Test  #41: gpu-primitives-layer-normalization-cpp ...................   Passed    0.02 sec
 43/453 Test  #43: gpu-primitives-lbr-gru-cpp ...............................   Passed    0.02 sec
 45/453 Test  #45: gpu-primitives-lrn-cpp ...................................   Passed    0.02 sec
 47/453 Test  #47: gpu-primitives-lstm-cpp ..................................   Passed    0.02 sec
 49/453 Test  #49: gpu-primitives-matmul-cpp ................................   Passed    0.02 sec
 51/453 Test  #51: gpu-primitives-pooling-cpp ...............................   Passed    0.02 sec
 53/453 Test  #53: gpu-primitives-prelu-cpp .................................   Passed    0.02 sec
 55/453 Test  #55: gpu-primitives-reduction-cpp .............................   Passed    0.02 sec
 57/453 Test  #57: gpu-primitives-reorder-cpp ...............................   Passed    0.02 sec
 59/453 Test  #59: gpu-primitives-resampling-cpp ............................   Passed    0.02 sec
 61/453 Test  #61: gpu-primitives-shuffle-cpp ...............................   Passed    0.02 sec
 63/453 Test  #63: gpu-primitives-softmax-cpp ...............................   Passed    0.02 sec
 65/453 Test  #65: gpu-primitives-sum-cpp ...................................   Passed    0.02 sec
 67/453 Test  #67: gpu-primitives-vanilla-rnn-cpp ...........................   Passed    0.02 sec
 69/453 Test  #69: gpu-rnn-training-f32-cpp .................................   Passed    0.02 sec
 71/453 Test  #71: gpu-sycl-interop-buffer-cpp ..............................   Passed    0.02 sec
 73/453 Test  #73: gpu-sycl-interop-usm-cpp .................................   Passed    0.02 sec
 75/453 Test  #75: gpu-tutorials-matmul-inference-int8-matmul-cpp ...........   Passed    0.02 sec
 77/453 Test  #77: gpu-tutorials-matmul-weights-decompression-matmul-cpp ....   Passed    0.02 sec
233/453 Test #233: test_rnn_forward_gpu .....................................   Passed    0.02 sec
235/453 Test #235: test_rnn_forward_buffer_gpu ..............................   Passed    0.02 sec
249/453 Test #249: test_convolution_format_any_gpu ..........................   Passed    0.02 sec
251/453 Test #251: test_convolution_format_any_buffer_gpu ...................   Passed    0.02 sec
379/453 Test #379: test_graph_unit_dnnl_mqa_decomp_usm_gpu ..................   Passed    0.02 sec
391/453 Test #391: test_graph_unit_dnnl_sdp_decomp_usm_gpu ..................   Passed    0.02 sec
395/453 Test #395: test_graph_unit_dnnl_typecast_usm_gpu ....................   Passed    0.02 sec

If you try the same trick with the OCL backend, more tests fail as they should, but a few still pass unexpectedly (xpass):

% grep gpu.*Passed test-broken-gpu.log
120/322 Test #120: test_rnn_forward_gpu .....................................   Passed    0.01 sec
121/322 Test #121: test_rnn_forward_buffer_gpu ..............................   Passed    0.01 sec
132/322 Test #132: test_convolution_format_any_gpu ..........................   Passed    0.01 sec
133/322 Test #133: test_convolution_format_any_buffer_gpu ...................   Passed    0.01 sec
247/322 Test #247: test_graph_unit_dnnl_mqa_decomp_usm_gpu ..................   Passed    0.01 sec
259/322 Test #259: test_graph_unit_dnnl_sdp_decomp_usm_gpu ..................   Passed    0.01 sec
263/322 Test #263: test_graph_unit_dnnl_typecast_usm_gpu ....................   Passed    0.01 sec
@nwnk added the "sighting" label (Suspicious library behavior. Should be promoted to a bug when confirmed) on Jun 13, 2024
@shu1chen added the "platform:gpu-intel" label (Codeowner: @oneapi-src/onednn-gpu-intel) on Jun 14, 2024
@vpirogov
Member

vpirogov commented Jun 20, 2024

@nwnk, this behavior is expected. Since there's no guarantee that a GPU is present, GPU tests report a pass if no devices are available.

@vpirogov added the "question" label and removed the "sighting" label (Suspicious library behavior. Should be promoted to a bug when confirmed) on Jun 20, 2024
@vpirogov self-assigned this on Jun 20, 2024
@nwnk
Contributor Author

nwnk commented Jun 21, 2024

> @nwnk, this behavior is expected. Since there's no guarantee that a GPU is present, GPU tests report a pass if no devices are available.

If that were really expected, I would expect it to be consistent. In that OCL build, 7 gpu tests passed, but 156 failed. I don't understand why those seven ought to be different.
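A count like the one above can be reproduced from the ctest log with a short script. The following is a minimal sketch, assuming the log uses the result-line layout shown in the transcripts earlier in this issue and that failures are marked with a `***` prefix as ctest normally prints them. The function name `tally_gpu_results` and the regular expression are illustrative and not part of oneDNN or ctest.

```python
import re
from collections import Counter

# Matches ctest result lines such as
# "  2/453 Test   #2: gpu-getting-started-cpp ....   Passed    0.02 sec".
# Assumption: non-passing results look like "...***Failed    0.01 sec".
RESULT_RE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(?P<name>\S+)\s+\.*\s*"
    r"(?:\*\*\*)?(?P<status>[A-Za-z]+)"
)


def tally_gpu_results(log_text):
    """Count ctest outcomes for tests whose name contains 'gpu'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match and "gpu" in match["name"]:
            # Only the first word of the status is kept, e.g. "Passed" or "Failed".
            counts[match["status"]] += 1
    return counts
```

Calling `tally_gpu_results(open("test-broken-gpu.log").read())` would give the passed and failed totals for each build, which makes it easier to compare the SYCL and OCL runs.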

@vpirogov
Member

Good point. I missed the fact that some tests still fail. Let me try to reproduce it.

Do you see anything useful in the output of the failed tests?

@densamoilov
Contributor

@nwnk,

> If that were really expected

It is indeed expected for the examples, but not for the tests (gtests and benchdnn). Only the examples are included in the binary releases (read: oneAPI releases), so we don't want them to fail on systems that don't have GPUs.

There is one example that fails (gpu_opencl_interop), but that is a bug in the error handling mechanism.
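To separate the expected passes (examples) from the unexpected ones (tests), the log can be filtered by test name. This is a rough sketch, assuming that examples are the entries named `gpu-...-cpp` and that everything else containing `gpu` is a gtest, which is only inferred from the names in the logs above. The helpers `is_example` and `unexpected_gpu_passes` are hypothetical names and not part of oneDNN.

```python
import re

# Matches ctest result lines such as
# "233/453 Test #233: test_rnn_forward_gpu ....   Passed    0.02 sec".
RESULT_RE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(?P<name>\S+)\s+\.*\s*"
    r"(?:\*\*\*)?(?P<status>[A-Za-z]+)"
)


def is_example(name):
    """Assumption: example binaries are registered as 'gpu-<name>-cpp'."""
    return name.startswith("gpu-") and name.endswith("-cpp")


def unexpected_gpu_passes(log_text):
    """Return passed GPU tests that are not examples, in log order."""
    names = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if not match:
            continue
        name = match["name"]
        if match["status"] == "Passed" and "gpu" in name and not is_example(name):
            names.append(name)
    return names
```

Applied to the SYCL log from the report, a filter like this should leave only the `test_*_gpu` entries, which are the passes that are not explained by the examples policy.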

@vpirogov added the "bug" (A confirmed library bug) and "help wanted" labels and removed the "question" label on Jul 16, 2024