GPU tests pass when they probably shouldn't #1961

nwnk · 2024-06-13T21:39:24Z

Using the oneAPI 2024.1 release, build the SYCL CPU and GPU backends. Ensure that no SYCL devices are available on the system. Then run ctest:

```
% export OCL_ICD_VENDORS=/dev/null
% sudo dnf -y remove oneapi-level-zero >& /dev/null
% sycl-ls | wc -l
0
% ctest >& test-broken-gpu.log
% grep gpu.*Passed test-broken-gpu.log
  2/453 Test   #2: gpu-bnorm-u8-via-binary-postops-cpp ......................   Passed    0.02 sec
  4/453 Test   #4: gpu-cnn-inference-f32-cpp ................................   Passed    0.02 sec
  6/453 Test   #6: gpu-cnn-inference-int8-cpp ...............................   Passed    0.02 sec
  8/453 Test   #8: gpu-cnn-training-bf16-cpp ................................   Passed    0.02 sec
 10/453 Test  #10: gpu-cnn-training-f32-cpp .................................   Passed    0.02 sec
 13/453 Test  #13: gpu-getting-started-cpp ..................................   Passed    0.02 sec
 15/453 Test  #15: gpu-graph-sycl-getting-started-cpp .......................   Passed    0.02 sec
 17/453 Test  #17: gpu-graph-sycl-single-op-partition-cpp ...................   Passed    0.02 sec
 19/453 Test  #19: gpu-matmul-perf-cpp ......................................   Passed    0.02 sec
 21/453 Test  #21: gpu-memory-format-propagation-cpp ........................   Passed    0.02 sec
 23/453 Test  #23: gpu-performance-profiling-cpp ............................   Passed    0.02 sec
 25/453 Test  #25: gpu-primitives-augru-cpp .................................   Passed    0.02 sec
 27/453 Test  #27: gpu-primitives-batch-normalization-cpp ...................   Passed    0.02 sec
 29/453 Test  #29: gpu-primitives-binary-cpp ................................   Passed    0.02 sec
 31/453 Test  #31: gpu-primitives-concat-cpp ................................   Passed    0.02 sec
 33/453 Test  #33: gpu-primitives-convolution-cpp ...........................   Passed    0.02 sec
 35/453 Test  #35: gpu-primitives-eltwise-cpp ...............................   Passed    0.02 sec
 37/453 Test  #37: gpu-primitives-group-normalization-cpp ...................   Passed    0.02 sec
 39/453 Test  #39: gpu-primitives-inner-product-cpp .........................   Passed    0.02 sec
 41/453 Test  #41: gpu-primitives-layer-normalization-cpp ...................   Passed    0.02 sec
 43/453 Test  #43: gpu-primitives-lbr-gru-cpp ...............................   Passed    0.02 sec
 45/453 Test  #45: gpu-primitives-lrn-cpp ...................................   Passed    0.02 sec
 47/453 Test  #47: gpu-primitives-lstm-cpp ..................................   Passed    0.02 sec
 49/453 Test  #49: gpu-primitives-matmul-cpp ................................   Passed    0.02 sec
 51/453 Test  #51: gpu-primitives-pooling-cpp ...............................   Passed    0.02 sec
 53/453 Test  #53: gpu-primitives-prelu-cpp .................................   Passed    0.02 sec
 55/453 Test  #55: gpu-primitives-reduction-cpp .............................   Passed    0.02 sec
 57/453 Test  #57: gpu-primitives-reorder-cpp ...............................   Passed    0.02 sec
 59/453 Test  #59: gpu-primitives-resampling-cpp ............................   Passed    0.02 sec
 61/453 Test  #61: gpu-primitives-shuffle-cpp ...............................   Passed    0.02 sec
 63/453 Test  #63: gpu-primitives-softmax-cpp ...............................   Passed    0.02 sec
 65/453 Test  #65: gpu-primitives-sum-cpp ...................................   Passed    0.02 sec
 67/453 Test  #67: gpu-primitives-vanilla-rnn-cpp ...........................   Passed    0.02 sec
 69/453 Test  #69: gpu-rnn-training-f32-cpp .................................   Passed    0.02 sec
 71/453 Test  #71: gpu-sycl-interop-buffer-cpp ..............................   Passed    0.02 sec
 73/453 Test  #73: gpu-sycl-interop-usm-cpp .................................   Passed    0.02 sec
 75/453 Test  #75: gpu-tutorials-matmul-inference-int8-matmul-cpp ...........   Passed    0.02 sec
 77/453 Test  #77: gpu-tutorials-matmul-weights-decompression-matmul-cpp ....   Passed    0.02 sec
233/453 Test #233: test_rnn_forward_gpu .....................................   Passed    0.02 sec
235/453 Test #235: test_rnn_forward_buffer_gpu ..............................   Passed    0.02 sec
249/453 Test #249: test_convolution_format_any_gpu ..........................   Passed    0.02 sec
251/453 Test #251: test_convolution_format_any_buffer_gpu ...................   Passed    0.02 sec
379/453 Test #379: test_graph_unit_dnnl_mqa_decomp_usm_gpu ..................   Passed    0.02 sec
391/453 Test #391: test_graph_unit_dnnl_sdp_decomp_usm_gpu ..................   Passed    0.02 sec
395/453 Test #395: test_graph_unit_dnnl_typecast_usm_gpu ....................   Passed    0.02 sec
```

If you try the same trick with the OCL backend, more tests fail as they should, but a few still unexpectedly pass (xpass):

```
% grep gpu.*Passed test-broken-gpu.log
120/322 Test #120: test_rnn_forward_gpu .....................................   Passed    0.01 sec
121/322 Test #121: test_rnn_forward_buffer_gpu ..............................   Passed    0.01 sec
132/322 Test #132: test_convolution_format_any_gpu ..........................   Passed    0.01 sec
133/322 Test #133: test_convolution_format_any_buffer_gpu ...................   Passed    0.01 sec
247/322 Test #247: test_graph_unit_dnnl_mqa_decomp_usm_gpu ..................   Passed    0.01 sec
259/322 Test #259: test_graph_unit_dnnl_sdp_decomp_usm_gpu ..................   Passed    0.01 sec
263/322 Test #263: test_graph_unit_dnnl_typecast_usm_gpu ....................   Passed    0.01 sec
```


vpirogov · 2024-06-20T19:05:27Z

@nwnk, this behavior is expected. Since there is no guarantee that a GPU is present, GPU tests report a pass if no devices are available.

nwnk · 2024-06-21T19:50:58Z

> @nwnk, this behavior is expected. As there's no guarantee that GPU is present GPU tests report pass if no devices are available.

If that were really expected, I would expect it to be consistent. In that OCL build, 7 GPU tests passed but 156 failed. I don't understand why those seven should behave differently.
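A pass/fail tally like the one above can be reproduced from the saved ctest log. The sketch below is a minimal illustration, not part of the oneDNN tooling: it assumes ctest's default one-line result per test, as in the logs above, and simply treats any test whose name contains `gpu` as a GPU test. The function name `count_gpu_results` is hypothetical.

```shell
#!/usr/bin/env bash
# count_gpu_results (hypothetical helper): tally GPU test outcomes in a ctest log.
# Assumes ctest's default per-test result lines, e.g.
#   233/453 Test #233: test_rnn_forward_gpu ......   Passed    0.02 sec
# Field 2 is "Test", field 4 is the test name, and the third-from-last
# field is exactly "Passed" only for passing tests (failures print as
# "....***Failed", so they land in not_passed, as do "Not Run" entries).
count_gpu_results() {
    awk '$2 == "Test" && $4 ~ /gpu/ {
             if ($(NF-2) == "Passed") passed++; else not_passed++
         }
         END { printf "passed=%d not_passed=%d\n", passed, not_passed }' "$1"
}
```

For example, `count_gpu_results test-broken-gpu.log` prints a single `passed=N not_passed=M` line for that log.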

vpirogov · 2024-06-21T21:13:20Z

Good point. I missed the fact that some tests still fail. Let me try to reproduce it.

Do you see anything useful in the output of the failed tests?

densamoilov · 2024-07-03T06:45:20Z

@nwnk,

> If that were really expected

It is really expected for the examples, but not for the tests (gtests and benchdnn). Only the examples are included in the binary releases (i.e., oneAPI releases), so we don't want them to fail on systems that don't have GPUs.

One example does fail (gpu_opencl_interop), but that is caused by a bug in the error handling mechanism.
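Given this distinction, the GPU entries that pass in the logs above split into examples (named `gpu-*-cpp`, allowed to pass without a GPU) and gtests (named `test_*_gpu`, which should not). A minimal sketch that lists only the gtests ctest reported as passing, assuming the same ctest output format and the naming pattern observed in those logs; the function name `list_passed_gpu_gtests` is hypothetical.

```shell
#!/usr/bin/env bash
# list_passed_gpu_gtests (hypothetical helper): print GPU gtests that ctest
# reported as Passed. Assumes the naming seen in the logs above: examples are
# gpu-*-cpp and gtests are test_*_gpu, so only names starting with "test_"
# and ending in "_gpu" are kept; examples are filtered out.
list_passed_gpu_gtests() {
    awk '$2 == "Test" && $4 ~ /^test_.*_gpu$/ && $(NF-2) == "Passed" { print $4 }' "$1"
}
```

Run against a log from a machine with no GPU devices (for example, `list_passed_gpu_gtests test-broken-gpu.log`), every name it prints is a gtest that passed without a device to run on.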

nwnk added the `sighting` label (Suspicious library behavior. Should be promoted to a bug when confirmed) on Jun 13, 2024

shu1chen added the `platform:gpu-intel` label (Codeowner: @oneapi-src/onednn-gpu-intel) on Jun 14, 2024

vpirogov added the `question` label and removed the `sighting` label on Jun 20, 2024

vpirogov self-assigned this Jun 20, 2024

vpirogov assigned densamoilov Jul 3, 2024

vpirogov added the `bug` (A confirmed library bug) and `help wanted` labels and removed the `question` label on Jul 16, 2024

vpirogov unassigned densamoilov and vpirogov Jul 16, 2024
