Check driver version for PCI ID support#2374
Check driver version for PCI ID support#2374Menkib64 wants to merge 3 commits intoLeelaChessZero:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request updates the CUDA version checking logic to properly validate driver support for PCI device ID attributes. The change ensures that the PCI device ID retrieval is only attempted when the CUDA driver (not just the runtime) supports version 12.080 or later.
Changes:
- Adds a lambda function for formatting NVIDIA version numbers
- Moves CUDA version retrieval earlier in the initialization code and adds logging for both runtime and driver versions
- Changes the version check for PCI ID support from runtime version to driver version
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto NVVersion = [](int v) { | ||
| return std::to_string(v / 1000) + "." + std::to_string((v % 1000) / 10) + | ||
| "." + std::to_string(v % 10); | ||
| }; |
There was a problem hiding this comment.
The lambda function name NVVersion uses PascalCase, which is inconsistent with typical C++ naming conventions for local variables and lambda functions. Consider renaming it to follow lowercase naming conventions (e.g., nv_version or format_nv_version) to match the style used throughout the codebase.
| int runtime_version; | ||
| ReportCUDAErrors(cudaRuntimeGetVersion(&runtime_version)); | ||
| if (runtime_version >= 12080) { | ||
| if (driver_version >= 12080) { |
There was a problem hiding this comment.
I think we need a check that driver_version > runtime_version earlier, which should probably be a hard error. Then just the runtime_version check here is fine.
There was a problem hiding this comment.
Good point. I remember that cuda 11+ had minor version compatibility which would allow mixing different runtime and driver versions. There are some limitations to how newer runtime works with older runtimes. It looks like it might be possible for cuda backend.
TensorRT packages have strict version dependency to cuda version. It is likely to use some features which don't work with an old driver.
https://docs.nvidia.com/deploy/cuda-compatibility/minor-version-compatibility.html
| int driver_version; | ||
| ReportCUDAErrors(cudaDriverGetVersion(&driver_version)); | ||
| CERR << "CUDA runtime version: " << nv_version(runtime_version); | ||
| CERR << "CUDA driver version: " << nv_version(driver_version); |
There was a problem hiding this comment.
The cuda backend says "Latest version of CUDA supported by the driver", which seems better.
borg323
left a comment
There was a problem hiding this comment.
If people complain about the exception we can make it a warning and accept that cudaDeviceGetAttribute() may break.
No description provided.