Add new prop to _XpuDevicePropertie for triton gemm optimization #131738
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131738
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit bab9db0 with merge base 8ea5b57:
BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
ghstack-source-id: 00b182800df11861a4c413063e910597650266b3 Pull Request resolved: #131738
c10/xpu/XPUFunctions.cpp
  device_prop->has_##member = \
      raw_device.ext_oneapi_supports_cl_extension("cl_intel_" #member);
Suggested change:
-  device_prop->has_##member = \
-      raw_device.ext_oneapi_supports_cl_extension("cl_intel_" #member);
+  sycl::ext::oneapi::experimental::cl_version version_##member; \
+  device_prop->has_##member = raw_device.ext_oneapi_supports_cl_extension( \
+      "cl_intel_" #member, &version_##member);
Compilation does not fail, but without the second argument I get a core dump (PTDB 0.5.3.27, Python 3.9):
$ python -c "import torch; print(torch.xpu.get_device_capability())"
[1] 130676 segmentation fault (core dumped) python -c "import torch; print(torch.xpu.get_device_capability())"
This is the backtrace of the problem:
Thread 1.1 "pt_main_thread" received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 0x00007fffd185d78d in sycl::_V1::ext::oneapi::experimental::detail::OpenCLC_Supports_Extension(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, sycl::_V1::ext::oneapi::experimental::cl_version*, unsigned int) () from /opt/intel/oneapi/pytorch-gpu-dev-0.5/lib/libsycl.so.7
#1 0x00007fffd1946fd7 in sycl::_V1::device::ext_oneapi_supports_cl_extension(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, sycl::_V1::ext::oneapi::experimental::cl_version*) const () from /opt/intel/oneapi/pytorch-gpu-dev-0.5/lib/libsycl.so.7
#2 0x00007ffff72f5bdd in c10::xpu::(anonymous namespace)::initDeviceProperties (device_prop=0x9c476d0, device=0) at /home/yevhenii/Projects/pytorch/c10/xpu/XPUFunctions.cpp:115
#3 0x00007ffff72f609b in c10::xpu::get_device_properties (device_prop=0x9c476d0, device=0 '\000') at /home/yevhenii/Projects/pytorch/c10/xpu/XPUFunctions.cpp:158
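For readers unfamiliar with the macro under review, here is a minimal sketch of how `has_##member` and `"cl_intel_" #member` expand. It does not use SYCL: `FakeDevice`, `FakeDeviceProp`, `ASSIGN_EXT_PROP`, and `init_props` are hypothetical stand-ins, and the two extension names are assumptions chosen for illustration, since this thread does not list the properties the PR adds.

```cpp
#include <cassert>  // only needed if you assert on the result, as in the note below
#include <set>
#include <string>

// Hypothetical stand-in for sycl::device; it only mimics a name-based query.
struct FakeDevice {
  std::set<std::string> extensions;
  bool supports_cl_extension(const std::string& name) const {
    return extensions.count(name) != 0;
  }
};

// Hypothetical stand-in for the XPU device-property struct.
struct FakeDeviceProp {
  bool has_subgroup_matrix_multiply_accumulate = false;
  bool has_subgroup_2d_block_io = false;
};

// `has_##member` pastes tokens to form the field name.
// `"cl_intel_" #member` stringizes the argument; the two adjacent string
// literals are then concatenated into the full extension name.
#define ASSIGN_EXT_PROP(member) \
  prop.has_##member = device.supports_cl_extension("cl_intel_" #member);

inline FakeDeviceProp init_props(const FakeDevice& device) {
  FakeDeviceProp prop;
  ASSIGN_EXT_PROP(subgroup_matrix_multiply_accumulate)
  ASSIGN_EXT_PROP(subgroup_2d_block_io)
  return prop;
}
#undef ASSIGN_EXT_PROP
```

With a `FakeDevice` whose `extensions` set contains only `"cl_intel_subgroup_2d_block_io"`, `init_props` sets `has_subgroup_2d_block_io` and leaves the other flag false. The real macro queries `raw_device.ext_oneapi_supports_cl_extension` instead of this stand-in.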
Thanks for your suggestion. I will check it.
@ZzEeKkAa CI is green. CI uses the following version; could you try again on your local machine?
I'm using PTDB 0.5.2.18 on an updated Ubuntu 22.04 with a Python 3.9 venv installed from deadsnakes/ppa. I just reproduced the issue.
$ icpx -v
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.3 (2024.1.3.20240604)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
OK, thanks. The compiler team has confirmed this is a bug.
I will merge this PR once our CI upgrades to the appropriate compiler.
Thank you!
No problem!
ghstack-source-id: 2fa92b8730025188ec146931a4c32d2e5d66c51e Pull Request resolved: #131738
Update property names so they are similar to pytorch/pytorch#131738, both in IPEX and here.
Targets: intel/intel-xpu-backend-for-triton#1787
ghstack-source-id: 4e257cbcc75b2ba7dcec6e4d09d0b7840d8f383c Pull Request resolved: #131738
```
Fallback to CPU for XPU FP64 - pytorch/pytorch#126516
Elapsed time - pytorch/pytorch#126456
Device properties - pytorch/pytorch#131738
```
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
ghstack-source-id: 8073b86deb4e24aab8387852b3506fffbb9d2b5c Pull Request resolved: #131738
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here
Successfully rebased
ghstack-source-id: 0edf71430016cad5e7bf0443502eb9ab5b2d9b5c Pull Request resolved: #131738
@pytorchbot merge
@ZzEeKkAa bundle 0.5.3 has been released. Let us merge this PR.
Merge failed
Reason: This PR needs a If not, please add the To add a label, you can comment to pytorchbot, for example For more information, see Details for Dev Infra team
Raised by workflow job
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
…orch#131738)
# Motivation
This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization.
# Additional Context
`ext_oneapi_supports_cl_extension` is not a ABI-neutral API. It depends on compiler 2025.0. For more details, see intel/llvm#13212
Pull Request resolved: pytorch#131738
Approved by: https://github.com/gujinghui
Stack from ghstack (oldest at bottom):
# Motivation
This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization.
# Additional Context
`ext_oneapi_supports_cl_extension` is not an ABI-neutral API. It depends on compiler 2025.0. For more details, see intel/llvm#13212
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
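As background on the "ABI-neutral" remark: the backtrace in the review thread shows `std::__cxx11::basic_string` in the exported `libsycl.so` signature, so the caller and the library have to agree on the `std::string` layout. The sketch below shows one common way a library can avoid that dependency in general. It is an illustration under assumptions, not the approach taken in intel/llvm#13212; the function names and the extension name are made up.

```cpp
#include <cassert>  // only needed if you assert on the result, as in the note below
#include <cstddef>
#include <cstring>
#include <string>

// Library side (hypothetical): the exported entry point takes only plain
// C types, so its symbol and argument layout do not depend on which
// std::string ABI the caller was built with.
bool supports_extension_impl(const char* name, std::size_t len) {
  static const char kKnown[] = "cl_intel_example_extension";  // made-up name
  return len == sizeof(kKnown) - 1 && std::memcmp(name, kKnown, len) == 0;
}

// Header side (hypothetical): the std::string convenience overload is inline,
// so it is compiled into the caller with the caller's own std::string layout
// and never crosses the shared-library boundary.
inline bool supports_extension(const std::string& name) {
  return supports_extension_impl(name.data(), name.size());
}
```

Here `supports_extension("cl_intel_example_extension")` returns true and any other name returns false. Only a pointer and a length cross the boundary.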