Feature/rocm gpu detection #140

Merged

solderzzc merged 7 commits into develop from feature/rocm-gpu-detection on Mar 9, 2026

Conversation

@solderzzc
Member

No description provided.

- load_optimized() now catches device='cuda' failures on ROCm systems
  where PyTorch-ROCm is not installed and degrades to CPU gracefully
- deploy.sh removes the CPU-only onnxruntime before installing
  onnxruntime-rocm to prevent the shadowing bug
- _try_rocm() checks torch.cuda.is_available() before setting device='cuda';
  if PyTorch-ROCm is not installed, the device stays 'cpu' from the start
- load_optimized()'s fallback pre-checks torch.cuda instead of reactively
  catching NVIDIA driver exceptions (cleaner logs, no crash)
- Added test: without PyTorch-ROCm, detection falls back to the cpu device
  (15 tests total)
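The pre-check described above can be sketched as follows. This is an illustrative stand-in for the project's _try_rocm()/load_optimized() wiring, not the actual code; the `cuda_available` parameter is a hypothetical injection point added here so the device choice is testable without a GPU.

```python
def pick_device(cuda_available=None):
    """Return 'cuda' only when a ROCm/HIP-enabled torch reports a GPU.

    On ROCm, PyTorch exposes the HIP device through the torch.cuda API,
    so torch.cuda.is_available() is the right pre-check. When PyTorch-ROCm
    is missing entirely, the import fails and we stay on 'cpu' from the
    start instead of crashing later with an NVIDIA driver exception.
    """
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return "cuda" if cuda_available else "cpu"
```

Pre-checking like this keeps the logs clean: no stack trace from a reactive `except` path, just a CPU device string when the GPU stack is absent.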
Root cause: ultralytics' AutoUpdate detects onnx/onnxslim/onnxruntime
as missing during ONNX export and auto-installs the CPU onnxruntime,
re-shadowing onnxruntime-rocm.

Three-layer defense:
- requirements_rocm.txt: pre-install onnx + onnxslim so ultralytics
  doesn't trigger AutoUpdate for ONNX export deps
- deploy.sh: set YOLO_AUTOINSTALL=0 during export step
- deploy.sh: post-export cleanup removes CPU onnxruntime if present
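The YOLO_AUTOINSTALL=0 layer can be mirrored in Python rather than shell. This is a sketch, not the deploy.sh implementation: `cmd` stands in for whatever export command the script runs, and the only assumption about ultralytics is that it honors the YOLO_AUTOINSTALL environment variable, as the commits above state.

```python
import os
import subprocess

def run_export_without_autoupdate(cmd):
    """Run the ONNX export step with ultralytics auto-install disabled.

    A copy of the environment is used so the setting does not leak into
    the parent process; YOLO_AUTOINSTALL=0 stops AutoUpdate from pulling
    in the CPU onnxruntime mid-export.
    """
    env = dict(os.environ, YOLO_AUTOINSTALL="0")
    return subprocess.run(cmd, env=env, check=True)
```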
Instead of installing the wrong packages and then cleaning them up, install in two phases:
- Phase 1: PyTorch from the ROCm --index-url (forces the ROCm build, not CUDA)
- Phase 2: remaining packages, including onnxruntime-rocm, onnx, onnxslim
- YOLO_AUTOINSTALL=0 prevents ultralytics from auto-installing CPU onnxruntime

Removed: the pre-install onnxruntime cleanup and the post-export onnxruntime
cleanup (no longer needed now that packages are installed correctly)
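The two-phase install lives in deploy.sh; a Python sketch of the pip invocations it builds might look like the following. The `rocm6.2` index URL is one example version (the commits below make it dynamic), and the exact package lists here are illustrative assumptions.

```python
import sys

# Example ROCm wheel index; the real script derives the version dynamically.
ROCM_INDEX = "https://download.pytorch.org/whl/rocm6.2"

def phase1_cmd(index_url=ROCM_INDEX):
    # Phase 1: torch pinned to the ROCm index so pip cannot resolve a
    # CUDA build from PyPI.
    return [sys.executable, "-m", "pip", "install",
            "--index-url", index_url, "torch", "torchvision"]

def phase2_cmd():
    # Phase 2: everything else, including the ROCm ONNX runtime and the
    # export deps (onnx, onnxslim) whose absence would otherwise trigger
    # ultralytics AutoUpdate.
    return [sys.executable, "-m", "pip", "install",
            "onnxruntime-rocm", "onnx", "onnxslim"]
```

Because phase 1 completes before phase 2 starts, nothing ever installs the CPU onnxruntime in the first place, which is why the cleanup steps could be removed.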
deploy.sh now reads ROCm version from /opt/rocm/.info/version,
amd-smi, or rocminfo and constructs the PyTorch index URL dynamically
(e.g. rocm7.2 instead of hardcoded rocm6.2). Falls back to 6.2 only
if version detection fails.
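The version probe is shell code in deploy.sh; this Python sketch follows the same order (version file, then amd-smi, then rocminfo, then the 6.2 default). The tool invocations and the output parsing are assumptions about those CLIs, kept deliberately loose: any major.minor number in the output is accepted.

```python
import re
import shutil
import subprocess
from pathlib import Path

def detect_rocm_version(info_file="/opt/rocm/.info/version", default="6.2"):
    """Return the installed ROCm major.minor version, or `default`."""
    try:
        text = Path(info_file).read_text()
    except OSError:
        text = ""
    if not text:
        for tool, args in (("amd-smi", ["amd-smi", "version"]),
                           ("rocminfo", ["rocminfo"])):
            if shutil.which(tool):
                try:
                    text = subprocess.run(args, capture_output=True,
                                          text=True).stdout
                except OSError:
                    continue
                if text:
                    break
    match = re.search(r"(\d+)\.(\d+)", text)
    return f"{match.group(1)}.{match.group(2)}" if match else default

def rocm_index_url(version):
    # e.g. "7.2" -> ".../whl/rocm7.2", replacing the old hardcoded rocm6.2
    return f"https://download.pytorch.org/whl/rocm{version}"
```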
PyTorch only publishes wheels for specific ROCm versions (e.g. 6.2,
7.0, 7.1) — not every point release. For ROCm 7.2, deploy now tries:
7.2 → 7.1 → 7.0 → 6.4 → 6.3 → 6.2 → 6.1 → 6.0
Stops at first successful install. Falls back to PyPI CPU torch if
no ROCm wheels found at all.
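The fallback chain above can be sketched as a walk down the candidate list starting at the detected version. `try_install` is a hypothetical callback standing in for the script's pip attempt against each rocm<version> index; injecting it keeps the chain logic testable without network access.

```python
FALLBACK = ["7.2", "7.1", "7.0", "6.4", "6.3", "6.2", "6.1", "6.0"]

def install_torch_with_fallback(detected, try_install, candidates=FALLBACK):
    """Try ROCm wheel versions newest-compatible first.

    `try_install(version)` should return True when pip succeeds against
    https://download.pytorch.org/whl/rocm<version>. Returns the version
    that installed, or None, meaning the caller should fall back to the
    PyPI CPU torch build.
    """
    start = candidates.index(detected) if detected in candidates else 0
    for version in candidates[start:]:
        if try_install(version):
            return version  # stop at the first successful install
    return None
```

Starting at the detected version rather than the top of the list means a ROCm 6.3 host never wastes attempts on 7.x indexes.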
Ultralytics' ONNX loader only supports CUDAExecutionProvider (NVIDIA).
On ROCm, it falls back to CPU even though ROCMExecutionProvider is
available. PyTorch + HIP runs natively on AMD GPUs via device='cuda'.

- Change ROCm BackendSpec: onnx → pytorch (skip ONNX export entirely)
- Set YOLO_AUTOINSTALL=0 in detect.py to prevent ultralytics from
  auto-installing onnxruntime-gpu (NVIDIA) at runtime
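The BackendSpec decision reduces to the logic below. This is an illustrative reconstruction, not the project's code: `providers` would come from onnxruntime.get_available_providers(), and the premise (from the commits above) is that ultralytics' ONNX path only recognizes CUDAExecutionProvider, so a ROCm-only provider list cannot be used through it.

```python
def choose_backend(providers, hip_torch_available):
    """Pick the inference backend for an AMD ROCm host.

    Prefer the PyTorch backend whenever HIP-enabled torch sees the GPU:
    device='cuda' maps to HIP there, skipping ONNX export entirely.
    ROCMExecutionProvider in `providers` does not help, because the
    ultralytics ONNX loader never selects it.
    """
    if hip_torch_available:
        return "pytorch"
    if "CUDAExecutionProvider" in providers:
        return "onnx"   # the NVIDIA path ultralytics does support
    return "cpu"
```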
solderzzc merged commit 385e692 into develop on Mar 9, 2026
1 check passed
solderzzc deleted the feature/rocm-gpu-detection branch on March 9, 2026 05:07