- load_optimized() now catches device='cuda' failures on ROCm systems where PyTorch-ROCm is not installed and degrades gracefully to CPU.
- deploy.sh removes the CPU-only onnxruntime before installing onnxruntime-rocm, preventing the shadowing bug.
- _try_rocm() checks torch.cuda.is_available() before setting device='cuda'; if PyTorch-ROCm is not installed, the device stays 'cpu' from the start.
- load_optimized()'s fallback pre-checks torch.cuda instead of reactively catching NVIDIA driver exceptions (cleaner logs, no crash).
- Added test: without PyTorch-ROCm, detection falls back to the cpu device (15 tests total).
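The pre-check above can be probed from the deploy side as well. A minimal shell sketch (pick_device is a hypothetical helper, not part of deploy.sh; _try_rocm itself lives in the Python app):

```shell
# Hedged sketch: ask the installed PyTorch build whether a HIP/ROCm device is
# visible. Mirrors the _try_rocm() pre-check: if torch is missing or
# torch.cuda.is_available() is false, stay on CPU instead of crashing later.
pick_device() {
  python3 - <<'PY' 2>/dev/null || echo cpu
import torch  # PyTorch-ROCm exposes HIP devices through the CUDA API
print("cuda" if torch.cuda.is_available() else "cpu")
PY
}

DEVICE=$(pick_device)
echo "selected device: ${DEVICE}"
```

If the python3 probe fails for any reason (no PyTorch at all, broken install), the `|| echo cpu` branch keeps the answer at 'cpu', matching the graceful-degradation behavior described above.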
Root cause: ultralytics AutoUpdate detects onnx/onnxslim/onnxruntime as missing during ONNX export and auto-installs the CPU onnxruntime, re-shadowing onnxruntime-rocm. Three-layer defense:
- requirements_rocm.txt: pre-install onnx + onnxslim so ultralytics doesn't trigger AutoUpdate for ONNX export deps.
- deploy.sh: set YOLO_AUTOINSTALL=0 during the export step.
- deploy.sh: post-export cleanup removes the CPU onnxruntime if present.
Instead of installing the wrong packages and then cleaning up:
- Phase 1: PyTorch from the ROCm --index-url (forces the ROCm build, not CUDA).
- Phase 2: remaining packages, incl. onnxruntime-rocm, onnx, and onnxslim.
- YOLO_AUTOINSTALL=0 prevents ultralytics from auto-installing the CPU onnxruntime.
Removed: the pre-install onnxruntime cleanup and the post-export onnxruntime cleanup (no longer needed once the packages are installed correctly).
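The two-phase install can be sketched as below. This is a dry-run illustration that prints the commands rather than running them (the index URL and package set are assumptions about deploy.sh, not its exact contents):

```shell
# Hedged sketch of the two-phase install order. Printing instead of executing
# keeps the example self-contained; deploy.sh would run the pip commands.
ROCM_INDEX="https://download.pytorch.org/whl/rocm6.2"

# Phase 1: PyTorch from the ROCm wheel index, so pip cannot resolve a CUDA build.
phase1() { echo pip install torch torchvision --index-url "$ROCM_INDEX"; }

# Phase 2: everything else, including the ROCm execution provider packages.
phase2() { echo pip install onnxruntime-rocm onnx onnxslim; }

export YOLO_AUTOINSTALL=0   # keep ultralytics from re-installing CPU onnxruntime
phase1
phase2
```

Ordering matters: resolving torch against the ROCm index first means later packages that depend on torch see it as already satisfied and cannot drag in a CUDA or CPU build.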
deploy.sh now reads the ROCm version from /opt/rocm/.info/version, amd-smi, or rocminfo and constructs the PyTorch index URL dynamically (e.g. rocm7.2 instead of the hardcoded rocm6.2). Falls back to 6.2 only if version detection fails.
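The detection chain can be sketched as follows. The exact parsing in deploy.sh may differ; the amd-smi/rocminfo output patterns here are assumptions:

```shell
# Hedged sketch of ROCm version detection: version file, then amd-smi, then
# rocminfo, then the 6.2 fallback described above.
detect_rocm_version() {
  if [ -r /opt/rocm/.info/version ]; then
    cut -d- -f1 /opt/rocm/.info/version            # e.g. "7.2.0-1234" -> "7.2.0"
  elif command -v amd-smi >/dev/null 2>&1; then
    amd-smi version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+' | head -n1
  elif command -v rocminfo >/dev/null 2>&1; then
    rocminfo 2>/dev/null | grep -oE '[0-9]+\.[0-9]+' | head -n1
  else
    echo 6.2                                        # fallback when nothing is found
  fi
}

ver=$(detect_rocm_version)
major_minor=$(echo "$ver" | grep -oE '^[0-9]+\.[0-9]+')
INDEX_URL="https://download.pytorch.org/whl/rocm${major_minor}"
echo "$INDEX_URL"
```

Trimming to major.minor matters because the PyTorch index paths use two-component versions (rocm6.2, rocm7.2), while the version file carries a full patch-level string.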
PyTorch only publishes wheels for specific ROCm versions (e.g. 6.2, 7.0, 7.1), not every point release. For ROCm 7.2, deploy now tries 7.2 → 7.1 → 7.0 → 6.4 → 6.3 → 6.2 → 6.1 → 6.0, stopping at the first successful install. Falls back to PyPI CPU torch if no ROCm wheels are found at all.
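The fallback walk can be sketched as a loop. Here try_install is a stub standing in for the real pip install against the versioned index URL, so the loop logic is demonstrable offline; the stub pretends only the 7.1 index has wheels:

```shell
# Hedged sketch of the version fallback chain. CANDIDATES mirrors the order
# described above; try_install is a stand-in for:
#   pip install torch --index-url "https://download.pytorch.org/whl/rocm$1"
CANDIDATES="7.2 7.1 7.0 6.4 6.3 6.2 6.1 6.0"

try_install() {
  [ "$1" = "7.1" ]   # stub: pretend only the rocm7.1 wheels exist
}

FOUND=""
for v in $CANDIDATES; do
  if try_install "$v"; then
    echo "installed torch from rocm${v}"
    FOUND=$v
    break              # stop at the first successful install
  fi
done

if [ -z "$FOUND" ]; then
  echo "no ROCm wheels found; falling back to CPU torch from PyPI"
fi
```

In the real script the success signal is pip's exit status, which is nonzero when the index URL for that version returns no matching wheel.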
Ultralytics' ONNX loader only supports CUDAExecutionProvider (NVIDIA); on ROCm it falls back to CPU even though ROCMExecutionProvider is available. PyTorch + HIP runs natively on AMD GPUs via device='cuda'.
- Change the ROCm BackendSpec: onnx → pytorch (skip ONNX export entirely).
- Set YOLO_AUTOINSTALL=0 in detect.py to prevent ultralytics from auto-installing onnxruntime-gpu (NVIDIA) at runtime.
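The same guard can also be applied from the caller's side by exporting the flag before detect.py imports ultralytics. YOLO_AUTOINSTALL is the real ultralytics setting; the commented invocation is illustrative, not the actual CLI:

```shell
# Hedged sketch: disable ultralytics AutoUpdate for any child process, so the
# runtime path cannot pull in onnxruntime-gpu (NVIDIA) on a ROCm box.
export YOLO_AUTOINSTALL=0
echo "YOLO_AUTOINSTALL=${YOLO_AUTOINSTALL}"

# python3 detect.py --device cuda   # on ROCm, device='cuda' maps to the AMD GPU via HIP
```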