Skip to content

Comments

Check driver version for PCI ID support#2374

Open
Menkib64 wants to merge 3 commits intoLeelaChessZero:masterfrom
Menkib64:onnx-trt-driver-version-check
Open

Check driver version for PCI ID support#2374
Menkib64 wants to merge 3 commits intoLeelaChessZero:masterfrom
Menkib64:onnx-trt-driver-version-check

Conversation

@Menkib64
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings January 11, 2026 13:55
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request updates the CUDA version checking logic to properly validate driver support for PCI device ID attributes. The change ensures that the PCI device ID retrieval is only attempted when the CUDA driver (not just the runtime) supports version 12.080 or later.

Changes:

  • Adds a lambda function for formatting NVIDIA version numbers
  • Moves CUDA version retrieval earlier in the initialization code and adds logging for both runtime and driver versions
  • Changes the version check for PCI ID support from runtime version to driver version

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 792 to 795
auto NVVersion = [](int v) {
return std::to_string(v / 1000) + "." + std::to_string((v % 1000) / 10) +
"." + std::to_string(v % 10);
};
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda function name NVVersion uses PascalCase, which is inconsistent with typical C++ naming conventions for local variables and lambda functions. Consider renaming it to follow lowercase naming conventions (e.g., nv_version or format_nv_version) to match the style used throughout the codebase.

Copilot uses AI. Check for mistakes.
int runtime_version;
ReportCUDAErrors(cudaRuntimeGetVersion(&runtime_version));
if (runtime_version >= 12080) {
if (driver_version >= 12080) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a check that driver_version > runtime_version earlier, which should probably be a hard error. Then just the runtime_version check here is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I remember that cuda 11+ had minor version compatibility which would allow mixing different runtime and driver versions. There are some limitations to how newer runtime works with older runtimes. It looks like it might be possible for cuda backend.

TensorRT packages have strict version dependency to cuda version. It is likely to use some features which don't work with an old driver.

https://docs.nvidia.com/deploy/cuda-compatibility/minor-version-compatibility.html

int driver_version;
ReportCUDAErrors(cudaDriverGetVersion(&driver_version));
CERR << "CUDA runtime version: " << nv_version(runtime_version);
CERR << "CUDA driver version: " << nv_version(driver_version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cuda backend says "Latest version of CUDA supported by the driver", which seems better.

Copy link
Member

@borg323 borg323 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If people complain about the exception we can make it a warning and accept that cudaDeviceGetAttribute() may break.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants