No XPU devices found #712

Closed
Bhargav230m opened this issue Sep 20, 2024 · 6 comments
Assignees
Labels
Crash Execution crashes iGPU

Comments

@Bhargav230m

Describe the bug

import torch
import intel_extension_for_pytorch as ipex

device = "xpu:0"
tensor = torch.randn(3, 3).to(device)

print(tensor)
print(f"Tensor on device: {tensor.device}")

Error:

(ml) techpowerb@ruby:~$ python test_xpu.py
/home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/xpu/lazy_init.py:80: UserWarning: XPU Device count is zero! (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:127.)
  _C._initExtension()
terminate called after throwing an instance of 'c10::Error'
  what():  dpcppSetDevice: device_id is out of range
Exception raised from dpcppSetDevice at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:167 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x99 (0x7f5f75ea4a89 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x6a (0x7f5f75e5e2e8 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: xpu::dpcpp::dpcppSetDevice(signed char) + 0x114 (0x7f5ecc24f674 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #3: xpu::dpcpp::set_device(signed char) + 0x20 (0x7f5ecc1c46d0 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #4: xpu::dpcpp::impl::DPCPPGuardImpl::uncheckedSetDevice(c10::Device) const + 0xd (0x7f5ecc1c856d in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #5: at::AtenIpexTypeXPU::resize_impl(c10::TensorImpl*, c10::ArrayRef<long>, c10::optional<c10::ArrayRef<long> >, bool) + 0xc2f (0x7f5ecc24041f in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #6: at::AtenIpexTypeXPU::impl::empty_strided_dpcpp(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0xc6 (0x7f5ed7e336d6 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #7: at::AtenIpexTypeXPU::empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0xec (0x7f5ed7e412fc in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #8: <unknown function> + 0x3c475cc (0x7f5ecc2c15cc in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so)
frame #9: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0xf8 (0x7f5f78107e58 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x255e1ed (0x7f5f784551ed in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::empty_strided::call(c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x1a6 (0x7f5f7814fd46 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x1768fc0 (0x7f5f7765ffc0 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x1484 (0x7f5f77967c34 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x26f2cdd (0x7f5f785e9cdd in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #15: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0xf8 (0x7f5f77ddccb8 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x255b501 (0x7f5f78452501 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #17: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0xf8 (0x7f5f77ddccb8 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #18: <unknown function> + 0x3a4be35 (0x7f5f79942e35 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0x3a4c350 (0x7f5f79943350 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #20: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x1e5 (0x7f5f77e7c855 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #21: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x104 (0x7f5f7795c0e4 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #22: <unknown function> + 0x2876243 (0x7f5f7876d243 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #23: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x1fa (0x7f5f77ff39ea in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #24: <unknown function> + 0x3ddac9 (0x7f5f8a8c1ac9 in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #25: <unknown function> + 0x3fc81c (0x7f5f8a8e081c in /home/techpowerb/miniconda3/envs/ml/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #26: python() [0x4f9b46]
<omitting python frames>
frame #28: python() [0x4e69da]
frame #32: python() [0x5c1157]
frame #33: python() [0x5bd170]
frame #34: python() [0x456423]
frame #38: <unknown function> + 0x29d90 (0x7f5f8bb20d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: __libc_start_main + 0x80 (0x7f5f8bb20e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #40: python() [0x58784e]

Aborted

Why does it report no XPU devices? I have Iris Xe Graphics with an i5-1135G7 CPU.

I have followed all the installation steps here: https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.1.40%2bxpu&os=linux%2fwsl2&package=pip

Versions

(ml) techpowerb@ruby:~$ python collect_env.py
Collecting environment information...
PyTorch version: 2.1.0.post3+cxx11.abi
PyTorch CXX11 ABI: Yes
IPEX version: 2.1.40+xpu
IPEX commit: 80ed476
Build type: Release

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: N/A
IGC version: 2024.2.1 (2024.2.1.20240711)
CMake version: N/A
Libc version: glibc-2.35

Python version: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is XPU available: False
DPCPP runtime version: 2024.2
MKL version: 2024.2
GPU models and configuration:

Intel OpenCL ICD version: 23.17.26241.33-647~22.04
Level Zero version: 1.3.26241.33-647~22.04

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
CPU family: 6
Model: 140
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
BogoMIPS: 4838.39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 5 MiB (4 instances)
L3 cache: 8 MiB (1 instance)
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.1.40+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0.post3+cxx11.abi
[pip3] torchaudio==2.1.0.post3+cxx11.abi
[pip3] torchvision==0.16.0.post3+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.40+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0.post3+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0.post3+cxx11.abi pypi_0 pypi
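
The `Is XPU available: False` line in the environment report above means the runtime sees no device, so `.to("xpu:0")` has nothing to select. Below is a minimal sketch of a guard that checks availability first and falls back to the CPU. `pick_device` is a hypothetical helper name, not part of PyTorch or IPEX, and the sketch assumes that `torch.xpu.is_available()` and `torch.xpu.device_count()` are exposed once IPEX is imported, as with the versions listed in this report.

```python
import importlib


def pick_device(preferred: str = "xpu") -> str:
    """Return `preferred` only if the runtime reports such a device, else "cpu".

    Hypothetical helper for illustration; not an IPEX or PyTorch API.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return "cpu"  # PyTorch is missing or failed to load

    try:
        # Assumption: with torch 2.1 + IPEX 2.1, importing IPEX is what
        # makes the torch.xpu namespace usable.
        importlib.import_module("intel_extension_for_pytorch")
    except (ImportError, OSError):
        pass  # IPEX not installed; torch may still expose the backend itself

    backend = getattr(torch, preferred, None)
    if backend is None or not hasattr(backend, "is_available"):
        return "cpu"
    if backend.is_available() and backend.device_count() > 0:
        return preferred
    return "cpu"
```

With this in place, `torch.randn(3, 3).to(pick_device())` would stay on the CPU in the environment shown here rather than aborting inside `dpcppSetDevice`. It does not make the GPU visible; it only avoids selecting a device index that does not exist.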

@Bhargav230m
Author

I fixed the original issue by switching to Windows.
Before this, I tried WSL2 and openSUSE Tumbleweed.

But it still doesn't work properly. The code below:

import torch
import intel_extension_for_pytorch as ipex

device = "xpu"

tensor = torch.randn(3, 3)
tensor = tensor.to(device)

print(tensor)
print(f"Tensor on device: {tensor.device}")

The error thrown is:

Traceback (most recent call last):
  File "C:\Users\techn\OneDrive\Desktop\maria\test.py", line 9, in <module>
    print(tensor)
  File "C:\Users\techn\miniconda3\envs\ml-xpu\lib\site-packages\torch\_tensor.py", line 431, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "C:\Users\techn\miniconda3\envs\ml-xpu\lib\site-packages\torch\_tensor_str.py", line 664, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "C:\Users\techn\miniconda3\envs\ml-xpu\lib\site-packages\torch\_tensor_str.py", line 595, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\techn\miniconda3\envs\ml-xpu\lib\site-packages\torch\_tensor_str.py", line 347, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\techn\miniconda3\envs\ml-xpu\lib\site-packages\torch\_tensor_str.py", line 138, in __init__
    tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0)
RuntimeError: The program was built for 1 devices
Build program log for 'Intel(R) Iris(R) Xe Graphics':
 -11 (PI_ERROR_BUILD_PROGRAM_FAILURE)

I think it is able to move the tensor to the XPU but fails when I try to retrieve it. I hope someone can help with this soon.
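
One way to test that guess is to run each stage separately and see which one raises. The traceback above ends inside `_tensor_str.py` while evaluating `torch.isfinite(...) & tensor_view.ne(0)`, which suggests the failure may be in running a kernel on the device rather than in the copy itself. The sketch below is only an illustration: `first_failing_step` and `xpu_steps` are hypothetical names, and because device work can be queued asynchronously, the step that reports the error may not be exactly the step that caused it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Step = Tuple[str, Callable[[], object]]


def first_failing_step(steps: Sequence[Step]) -> Optional[Tuple[str, Exception]]:
    """Run steps in order; return (name, exception) for the first that raises."""
    for name, fn in steps:
        try:
            fn()
        except Exception as exc:  # report the Python-level error, do not hide it
            return name, exc
    return None


def xpu_steps(device: str = "xpu") -> List[Step]:
    """Build the stages of the script above as separate steps (needs torch + IPEX)."""
    import torch
    import intel_extension_for_pytorch  # noqa: F401  (registers the XPU backend)

    state = {}

    def create() -> None:
        state["cpu"] = torch.randn(3, 3)

    def move() -> None:
        state["dev"] = state["cpu"].to(device)

    def copy_back() -> None:
        # Plain device-to-host copy, no compute kernel involved.
        state["dev"].cpu()

    def compute() -> None:
        # Same kind of elementwise op that print() triggers; .cpu() forces the result.
        state["dev"].ne(0).cpu()

    return [("create", create), ("move", move),
            ("copy_back", copy_back), ("compute", compute)]
```

Calling `first_failing_step(xpu_steps())` on the Windows setup would show whether `copy_back` succeeds while `compute` fails. This only catches Python exceptions such as the `RuntimeError` above; it cannot catch a process abort like the one in the original WSL2 log.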

@Bhargav230m
Author

I also tried training a dummy linear model and I get the same error:

import torch
import intel_extension_for_pytorch as ipex
import torch.nn as nn
import torch.optim as optim

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(100, 500)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel().to("xpu:0")

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

input_data = torch.randn(64, 100).to("xpu:0")
target_data = torch.randn(64, 500).to("xpu:0") 

for epoch in range(10):
    model.train()

    outputs = model(input_data)
    loss = criterion(outputs, target_data)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print(f'Epoch [{epoch+1}/10], Loss: {loss.item():.4f}')

@alexsin368

alexsin368 commented Sep 24, 2024

@Bhargav230m unfortunately, IPEX does not support Iris Xe Graphics. The only consumer graphics card supported is Arc:
image
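
A quick way to see what the runtime thinks it is talking to is to read the reported device name and check whether it mentions Arc. This is a rough sketch under stated assumptions: `mentions_arc` and `reported_device_name` are hypothetical helpers, the name match is a heuristic rather than an official support check, and `torch.xpu.get_device_name` is assumed to be available once IPEX is imported.

```python
import re


def mentions_arc(device_name: str) -> bool:
    """Heuristic only: True if the reported name contains the word "Arc"."""
    return re.search(r"\bArc\b", device_name) is not None


def reported_device_name(index: int = 0) -> str:
    """Ask the XPU runtime for the device name (needs torch + IPEX and a visible device)."""
    import torch
    import intel_extension_for_pytorch  # noqa: F401  (registers the XPU backend)

    return torch.xpu.get_device_name(index)
```

For the name in the build log above, `mentions_arc("Intel(R) Iris(R) Xe Graphics")` is `False`. A `False` result only means the name is not an Arc name; it says nothing about data center GPUs, so the supported-hardware list remains the authority.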

@alexsin368 alexsin368 self-assigned this Sep 24, 2024
@Bhargav230m
Author

@alexsin368 I want to train a model, and OpenVINO is mainly for inference.

@alexsin368

alexsin368 commented Sep 24, 2024

@Bhargav230m if you would like to train a model using Intel hardware, I recommend going to the Intel® Tiber™ Developer Cloud at cloud.intel.com to get access to our data center GPUs and Gaudi AI accelerators.

@jingxu10 jingxu10 added iGPU Crash Execution crashes labels Oct 15, 2024
@alexsin368

No action taken or needed at this time; closing ticket.
