Memory access fault by GPU node-1 ... Reason: Page not present or supervisor privilege. #10460

Open
@bader

Describe the bug
Seen in a pre-commit run for a non-functional change: https://github.com/intel/llvm/actions/runs/5592747110/jobs/10226580480

FAIL: SYCL :: Basic/span.cpp (243 of 1446)
******************** TEST 'SYCL :: Basic/span.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';    /__w/llvm/llvm/toolchain/bin//clang++  -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1031 -fsycl -fsycl-targets=amdgcn-amd-amdhsa /__w/llvm/llvm/llvm/sycl/test-e2e/Basic/span.cpp -o /__w/llvm/llvm/build-e2e/Basic/Output/span.cpp.tmp.out
: 'RUN: at line 2';   env ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu  /__w/llvm/llvm/build-e2e/Basic/Output/span.cpp.tmp.out
--
Exit Code: -6

Command Output (stdout):
--
$ ":" "RUN: at line 1"
note: command had no output on stdout or stderr
$ "/__w/llvm/llvm/toolchain/bin//clang++" "-Xsycl-target-backend=amdgcn-amd-amdhsa" "--offload-arch=gfx1031" "-fsycl" "-fsycl-targets=amdgcn-amd-amdhsa" "/__w/llvm/llvm/llvm/sycl/test-e2e/Basic/span.cpp" "-o" "/__w/llvm/llvm/build-e2e/Basic/Output/span.cpp.tmp.out"
note: command had no output on stdout or stderr
$ ":" "RUN: at line 2"
note: command had no output on stdout or stderr
$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/Basic/Output/span.cpp.tmp.out"
# command stderr:
Memory access fault by GPU node-1 (Agent handle: 0x85f9e0) on address 0x7f5be2522000. Reason: Page not present or supervisor privilege.

error: command failed with exit status: -6

--
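A note on `Exit Code: -6` in the log above: a negative value is consistent with the Python `subprocess` convention, where `-N` means the process was terminated by signal `N`. On Linux, signal 6 is `SIGABRT`. The helper below is a minimal sketch for decoding such codes; `describe_exit_code` is a hypothetical name, not part of lit or the SYCL test suite.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a subprocess-style exit code.

    Negative values follow the Python subprocess convention:
    -N means the process was terminated by signal N.
    """
    if code >= 0:
        return f"exited with status {code}"
    try:
        # Map the signal number to its symbolic name on this platform.
        name = signal.Signals(-code).name
    except ValueError:
        # Unknown signal number on this platform; report it numerically.
        name = f"signal {-code}"
    return f"terminated by {name}"


# The exit code reported for Basic/span.cpp in the log.
print(describe_exit_code(-6))
```

Read this way, the test did not return an error status of its own; the process aborted after the memory access fault was reported on stderr.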

To Reproduce
Unfortunately, I have no idea how to reproduce this. Get access to the CI machine and rerun the pre-commit checks until it happens again?
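Since the failure looks intermittent, one option is to rerun the already-built test binary in a loop rather than the whole pre-commit job. The sketch below is only an assumption about how that could be done: `run_until_fault` and `FAULT_MARKER` are hypothetical names, and the marker string is taken from the stderr line in the log. There is no guarantee the fault reproduces outside CI.

```python
import os
import subprocess

# Substring of the stderr message seen in the failing CI run.
FAULT_MARKER = "Memory access fault by GPU"


def run_until_fault(cmd, max_runs=1000, extra_env=None):
    """Rerun cmd until it exits non-zero or prints the fault marker.

    Returns (attempt, returncode, stderr) for the first failing run,
    or None if all max_runs attempts pass.
    """
    # Start from the current environment and overlay any extra variables.
    env = dict(os.environ)
    env.update(extra_env or {})
    for attempt in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, errors="replace"
        )
        if proc.returncode != 0 or FAULT_MARKER in proc.stderr:
            return attempt, proc.returncode, proc.stderr
    return None
```

For this test it would be called with the binary from the second RUN line and the same device selector, for example `run_until_fault(["/__w/llvm/llvm/build-e2e/Basic/Output/span.cpp.tmp.out"], extra_env={"ONEAPI_DEVICE_SELECTOR": "ext_oneapi_hip:gpu"})`, with the path adjusted to the local build directory.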

Environment (please complete the following information):

[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 50631.6]

Platforms: 1
Platform [#1]:
    Version  : HIP 50631.6
    Name     : AMD HIP BACKEND
    Vendor   : AMD Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : gfx1031
        Name       : AMD Radeon RX 6700 XT
        Vendor     : AMD Corporation
        Driver     : HIP 50631.6
        Aspects    : gpu fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations ext_intel_pci_address usm_atomic_host_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image
        info::device::sub_group_sizes: 32
default_selector()      : gpu, AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 50631.6]
accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
gpu_selector()          : gpu, AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 50631.6]
custom_selector(gpu)    : gpu, AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 50631.6]
custom_selector(cpu)    : No device of requested type available. -1 (PI_ERRO...
custom_selector(acc)    : No device of requested type available. -1 (PI_ERRO...

Labels: bug (Something isn't working), hip (Issues related to execution on HIP backend)
