[SYCL][CUDA][HIP] Add support for sycl::aspect::ext_intel_pci_address #9624

Merged: 2 commits into intel:sycl from the cuda-hip-pcie-address branch on May 30, 2023

al42and (Contributor) commented May 26, 2023

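This is implemented with the cuDeviceGetPCIBusId / hipDeviceGetPCIBusId API calls, which have been available since at least CUDA 8.0 / ROCm 2.8.

Unlike the Level Zero (L0) backend, the CUDA and HIP backends do not require setting the SYCL_ENABLE_PCI=1 environment variable.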

@al42and al42and requested review from a team as code owners May 26, 2023 15:32
@al42and al42and requested a review from steffenlarsen May 26, 2023 15:32
al42and (Contributor, Author) commented May 26, 2023

$ cat test_pci.cpp 
#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  auto devices = sycl::device::get_devices(sycl::info::device_type::gpu);

  for (const auto &device : devices) {
    try {
      std::cout << "Device Name: "
                << device.get_info<sycl::info::device::name>() << std::endl;
      std::cout << "Device has ext_intel_pci_address: "
                << device.has(sycl::aspect::ext_intel_pci_address) << std::endl;
      std::cout
          << "PCI Bus ID: "
          << device.get_info<sycl::ext::intel::info::device::pci_address>()
          << std::endl;
    } catch (const sycl::exception &e) {
      std::cerr << "SYCL exception caught: " << e.what() << std::endl;
    }
    std::cout << std::endl;
  }

  return 0;
}

$ clang++ -fsycl test_pci.cpp -o test_pci && SYCL_ENABLE_PCI=1 ./test_pci
Device Name: Intel(R) Arc(TM) A770 Graphics
Device has ext_intel_pci_address: 0
PCI Bus ID: SYCL exception caught: Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)

Device Name: Intel(R) UHD Graphics 770
Device has ext_intel_pci_address: 0
PCI Bus ID: SYCL exception caught: Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)

Device Name: Intel(R) Arc(TM) A770 Graphics
Device has ext_intel_pci_address: 1
PCI Bus ID: 0000:0b:00.0

Device Name: Intel(R) UHD Graphics 770
Device has ext_intel_pci_address: 1
PCI Bus ID: 0000:00:02.0

Device Name: NVIDIA GeForce RTX 3060
Device has ext_intel_pci_address: 1
PCI Bus ID: 0000:01:00.0

Device Name: AMD Radeon RX 6400
Device has ext_intel_pci_address: 1
PCI Bus ID: 0000:07:00.0

jinz2014 (Contributor) commented

Could you please explain the values "0" and "1" for the same device? Thanks.

Device Name: Intel(R) Arc(TM) A770 Graphics
Device has ext_intel_pci_address: 0
PCI Bus ID: SYCL exception caught: Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)

Device Name: Intel(R) Arc(TM) A770 Graphics
Device has ext_intel_pci_address: 1
PCI Bus ID: 0000:0b:00.0

al42and (Contributor, Author) commented May 27, 2023

Could you please explain the values "0" and "1" for the same device? Thanks.

The first entries (without the aspect) come from the OpenCL backend; the second ones (with the aspect) come from Level Zero. This is a known limitation of the OpenCL backend (and, until this PR, of CUDA and HIP too).

The Intel devices are not directly related to this PR, but they demonstrate that the output format is consistent across backends.
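Since every backend in the output above reports the address in the same `domain:bus:device.function` hexadecimal form (for example `0000:0b:00.0`), the string can be split into its fields in a backend-independent way. The following is only a minimal sketch in plain C++: `PciAddress`, `parse_pci_address`, and `format_pci_address` are hypothetical helper names and are not part of SYCL, CUDA, or HIP. The range limits (8-bit bus, 5-bit device, 3-bit function) are the conventional PCI ones and are an assumption, not something this PR checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper type; not part of SYCL, CUDA, or HIP.
struct PciAddress {
  unsigned domain;
  unsigned bus;
  unsigned device;
  unsigned function;
};

// Parses a string such as "0000:0b:00.0" (all fields hexadecimal).
// Returns std::nullopt if the text does not match or a field is out of the
// conventional PCI range. Note that sscanf's %x is lenient about leading
// whitespace and an optional "0x" prefix.
std::optional<PciAddress> parse_pci_address(const std::string &text) {
  unsigned domain = 0, bus = 0, device = 0, function = 0;
  int consumed = 0;
  // %n stores the number of characters consumed, so trailing junk is rejected.
  const int matched = std::sscanf(text.c_str(), "%x:%x:%x.%x%n", &domain, &bus,
                                  &device, &function, &consumed);
  if (matched != 4)
    return std::nullopt;
  if (static_cast<std::size_t>(consumed) != text.size())
    return std::nullopt;
  if (bus > 0xff || device > 0x1f || function > 0x7)
    return std::nullopt;
  return PciAddress{domain, bus, device, function};
}

// Formats the fields back into the same textual form seen in the output.
std::string format_pci_address(const PciAddress &addr) {
  assert(addr.bus <= 0xff && addr.device <= 0x1f && addr.function <= 0x7);
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%04x:%02x:%02x.%x", addr.domain,
                addr.bus, addr.device, addr.function);
  return buffer;
}
```

A caller could pass the string returned by `device.get_info<sycl::ext::intel::info::device::pci_address>()` to `parse_pci_address` and, for instance, compare the bus numbers of two devices without doing string comparisons.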

jchlanda (Contributor) left a comment

Would you be able to convert your test_pci.cpp sample into an e2e test?

al42and (Contributor, Author) commented May 29, 2023

Would you be able to convert your test_pci.cpp sample into an e2e test?

There is https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/intel-ext-device.cpp. I'm not sure how its output is verified in CI, but it works fine for me on NVIDIA and AMD devices:

Platform #7:
Device #1: NVIDIA GeForce RTX 3060:
Backend: CUDA
PCI address = 0000:01:00.0
Device UUID = 115971881914139240342461923712712616593242
Device ID = 0

Platform #8:
Device #1: AMD Radeon RX 6400:
Backend: Unknown
PCI address = 0000:07:00.0
Device UUID = 888800000000000000
Device ID = 0

(The UUID for the HIP device looks suspicious; should I open a separate issue? The test also fails the "Expect total_memory >= free_memory" assertion on the Arc A770, because total memory is reported as 0 while free memory is non-zero. The PCI addresses, however, are reported correctly.)

jchlanda (Contributor) commented

Sorry, wasn't aware of that test.

Opening a ticket wouldn't hurt, thanks.

steffenlarsen (Contributor) left a comment

LGTM! Thank you!

@steffenlarsen steffenlarsen merged commit 4bf5423 into intel:sycl May 30, 2023
@al42and al42and deleted the cuda-hip-pcie-address branch May 30, 2023 15:18

5 participants