[CUDA] Test 'Assert/assert_in_multiple_tus.cpp' CI Failure #8832

Open
@andylshort

Describe the bug
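The CUDA E2E test Assert/assert_in_multiple_tus.cpp fails in post-commit CI. The GPU run aborts with CUDA_ERROR_ASSERT reported from build_program, so the captured output never contains the expected assert message and the FileCheck step fails: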

FAIL: SYCL :: Assert/assert_in_multiple_tus.cpp (20 of 1270)
******************** TEST 'SYCL :: Assert/assert_in_multiple_tus.cpp' FAILED ********************
Script:
--
: 'RUN: at line 6';    /__w/llvm/llvm/toolchain/bin/clang++   -DSYCL_FALLBACK_ASSERT=1 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -I /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/Inputs /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/Inputs/kernels_in_file2.cpp -o /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out
: 'RUN: at line 7';   true /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out &> /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.cpu.txt || true
: 'RUN: at line 8';   true FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp --input-file /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.cpu.txt
: 'RUN: at line 9';    env ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1  /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out &> /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt || true
: 'RUN: at line 10';    env ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1  FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp --input-file /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt
: 'RUN: at line 12';   true /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out &> /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.acc.txt
: 'RUN: at line 13';   true FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp --check-prefix=CHECK-ACC --input-file /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.acc.txt
--
Exit Code: 1

Command Output (stdout):
--
$ ":" "RUN: at line 6"
note: command had no output on stdout or stderr
$ "/__w/llvm/llvm/toolchain/bin/clang++" "-DSYCL_FALLBACK_ASSERT=1" "-fsycl" "-fsycl-targets=nvptx64-nvidia-cuda" "-I" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/Inputs" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/Inputs/kernels_in_file2.cpp" "-o" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out"
# command stderr:
clang++: warning: CUDA version 11.7 is only partially supported [-Wunknown-cuda-version]

$ ":" "RUN: at line 7"
note: command had no output on stdout or stderr
$ "true" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out"
note: command had no output on stdout or stderr
$ ":" "RUN: at line 8"
note: command had no output on stdout or stderr
$ "true" "FileCheck" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp" "--input-file" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.cpu.txt"
note: command had no output on stdout or stderr
$ ":" "RUN: at line 9"
note: command had no output on stdout or stderr
$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu" "SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.out"
# redirected output from '/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt':

PI CUDA ERROR:
	Value:           710
	Name:            CUDA_ERROR_ASSERT
	Description:     device-side assert triggered
	Function:        build_program
	Source Location: /__w/llvm/llvm/src/sycl/plugins/cuda/pi_cuda.cpp:776


PI CUDA ERROR:
	Value:           400
	Name:            CUDA_ERROR_INVALID_HANDLE
	Description:     invalid resource handle
	Function:        cuda_piProgramRelease
	Source Location: /__w/llvm/llvm/src/sycl/plugins/cuda/pi_cuda.cpp:3600

terminate called after throwing an instance of 'sycl::_V1::compile_program_error'
  what():  The program was built for 1 devices
Build program log for 'NVIDIA A10G':
 -999 (Unknown PI error)

note: command had no output on stdout or stderr
error: command failed with exit status: -6
$ "true"
note: command had no output on stdout or stderr
$ ":" "RUN: at line 10"
note: command had no output on stdout or stderr
$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu" "SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1" "FileCheck" "/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp" "--input-file" "/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt"
# command stderr:
/__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp:17:11: error: CHECK: expected string not found in input
// CHECK: {{.*}}kernels_in_file2.cpp:15: int calculus(int): {{global id: \[5|block: \[1}},0,0], {{local id|thread}}: [1,0,0]
          ^
/__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt:1:1: note: scanning from here

^

Input file: /__w/llvm/llvm/build-e2e/Assert/Output/assert_in_multiple_tus.cpp.tmp.gpu.txt
Check file: /__w/llvm/llvm/llvm/sycl/test-e2e/Assert/assert_in_multiple_tus.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1:  
check:17     X error: no match found
          2: PI CUDA ERROR: 
check:17     ~~~~~~~~~~~~~~~
          3:  Value: 710 
check:17     ~~~~~~~~~~~~
          4:  Name: CUDA_ERROR_ASSERT 
check:17     ~~~~~~~~~~~~~~~~~~~~~~~~~
          5:  Description: device-side assert triggered 
check:17     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          6:  Function: build_program 
check:17     ~~~~~~~~~~~~~~~~~~~~~~~~~
          .
          .
          .
>>>>>>

error: command failed with exit status: 1
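
The FileCheck failure in the log follows from the earlier abort: the captured GPU output holds only the PI CUDA ERROR blocks, so the CHECK line has nothing to match. As a rough illustration, the sketch below converts that CHECK pattern into a Python regex and tries it against an excerpt of the captured output. `filecheck_to_regex` is a hypothetical helper, not part of FileCheck; it handles only `{{...}}` regex blocks, and the two "expected" sample lines are invented to fit the pattern, not copied from a passing run.

```python
import re

# CHECK pattern as it appears in the FileCheck error message.
CHECK_LINE = (
    r"{{.*}}kernels_in_file2.cpp:15: int calculus(int): "
    r"{{global id: \[5|block: \[1}},0,0], {{local id|thread}}: [1,0,0]"
)

# Excerpt of the captured GPU output from the log.
CAPTURED = """
PI CUDA ERROR:
    Value:           710
    Name:            CUDA_ERROR_ASSERT
    Description:     device-side assert triggered
    Function:        build_program
"""

# Hypothetical lines that a passing run might print (assumption, not from the log).
SAMPLES = [
    "kernels_in_file2.cpp:15: int calculus(int): block: [1,0,0], thread: [1,0,0]",
    "kernels_in_file2.cpp:15: int calculus(int): global id: [5,0,0], local id: [1,0,0]",
]


def filecheck_to_regex(pattern):
    """Simplified translation: {{...}} is a regex, everything else is literal.

    Ignores [[VAR]] substitutions and FileCheck's whitespace canonicalization.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\{\{(.*?)\}\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        parts.append("(?:" + m.group(1) + ")")           # embedded regex
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


rx = filecheck_to_regex(CHECK_LINE)
sample_hits = [bool(rx.search(line)) for line in SAMPLES]
captured_hits = [bool(rx.search(line)) for line in CAPTURED.splitlines()]
print("samples match:", all(sample_hits))
print("captured output matches:", any(captured_hits))
```

Under these assumptions, the pattern accepts either the CUDA-style (`block`/`thread`) or the generic (`global id`/`local id`) form of the assert message, and none of the captured lines has that form.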

To Reproduce
Run the pre-merge checks on any current PR. The test fails in one of my PRs; the run history and log files are available here: https://github.com/intel/llvm/actions/runs/4542034482/jobs/8005455354?pr=8825
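
For a local reproduction outside CI, the compile, GPU run, and FileCheck steps from the log can be replayed by hand. The following is a minimal sketch, assuming a DPC++ `clang++` built with CUDA support, a CUDA GPU, and `FileCheck` on `PATH`. `build_steps` and `reproduce` are hypothetical helper names, and the output file names are arbitrary; only the flags and environment variables are taken from the log.

```python
import os
import subprocess
from pathlib import Path


def build_steps(clangxx, assert_test_dir, out_dir):
    """Return the commands from RUN lines 6, 9 and 10 of the log.

    assert_test_dir is the local sycl/test-e2e/Assert directory (assumed layout).
    """
    test_dir, out_dir = Path(assert_test_dir), Path(out_dir)
    src = test_dir / "assert_in_multiple_tus.cpp"
    exe = out_dir / "assert_in_multiple_tus.out"
    log = out_dir / "assert_in_multiple_tus.gpu.txt"
    return {
        "compile": [
            str(clangxx), "-DSYCL_FALLBACK_ASSERT=1", "-fsycl",
            "-fsycl-targets=nvptx64-nvidia-cuda",
            "-I", str(test_dir / "Inputs"),
            str(src), str(test_dir / "Inputs" / "kernels_in_file2.cpp"),
            "-o", str(exe),
        ],
        "run": [str(exe)],
        "check": ["FileCheck", str(src), "--input-file", str(log)],
        "env": {
            "ONEAPI_DEVICE_SELECTOR": "ext_oneapi_cuda:gpu",
            "SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT": "1",
        },
        "log": log,
    }


def reproduce(clangxx, assert_test_dir, out_dir):
    """Compile, run on the GPU, then run FileCheck; returns FileCheck's exit code."""
    steps = build_steps(clangxx, assert_test_dir, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(steps["compile"], check=True)
    with open(steps["log"], "wb") as f:
        # Mirrors `&> file || true`: capture stdout and stderr, ignore the status.
        run = subprocess.run(steps["run"], env={**os.environ, **steps["env"]},
                             stdout=f, stderr=subprocess.STDOUT)
    # On POSIX, a negative code -N means the process died from signal N;
    # the -6 in the log corresponds to SIGABRT after the uncaught exception.
    print("test binary exit status:", run.returncode)
    return subprocess.run(steps["check"]).returncode
```

Calling `reproduce("clang++", "llvm/sycl/test-e2e/Assert", "/tmp/assert-out")` with paths adjusted to the local checkout should show whether the failure depends on the CI machine.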

Environment:

  • OS: Linux
  • Target device and vendor: NVIDIA A10G (AWS node)
  • DPC++ version: Latest

Additional context
N/A.

Labels: bug (Something isn't working), cuda (CUDA back-end)
