Environment
Device: Olares ONE
GPU: NVIDIA GeForce RTX 5090 Laptop GPU (Blackwell / sm_120), 24 GB
NVIDIA driver: 590.44.01
App: ComfyUI-Shared (comfyui 0.21.0, launcher 0.2.39); namespace comfyuisharev2server-shared
GPU stack: HAMI (hami-device-plugin, hami-scheduler, hami-core libvgpu.so dated 2026-03)
Issue 1 — HAMI vGPU memory accounting fails → false CUDA OOM during generation
KSampler aborts with:
CUDA out of memory. Tried to allocate 58.00 MiB. GPU 0 has a total capacity of 23.42 GiB of which 21.49 GiB is free
That is, a 58 MiB allocation fails while about 21 GiB is reported free. The container is allocated the full card (hami.io/vgpu-devices-allocated: ...,24463,0,0, CUDA_DEVICE_MEMORY_LIMIT_0=24463m). At startup, the HAMI-core logs show get_nvml_device_memory_total looping 1→15 (apparently failing to read total VRAM via NVML), followed by Kick dead proc and Fail to ... shrreg segfaults. This suggests that HAMI-core cannot query device memory on this GPU/driver combination, which corrupts its memory accounting.
Ruled out (none of these changed the false OOM): reinstalling torch to move from cu128 to cu130; switching the app GPU mode between exclusive / shared / sliced; and stripping the ComfyUI launch flags down to a minimal --normalvram.
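For anyone triaging similar logs, here is a minimal sketch of how the OOM message quoted above can be checked mechanically. It is not part of ComfyUI, PyTorch, or HAMI. The names parse_oom and looks_like_false_oom and the headroom threshold are hypothetical, and the regex assumes the message wording shown in this report.

```python
import re

# Hypothetical helper, not part of any library: parses an OOM message of the
# form quoted in this report and flags requests that are far smaller than the
# memory reported free.
UNITS_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

OOM_RE = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) (?P<req_u>[KMG]iB)\. "
    r"GPU \d+ has a total capacity of (?P<tot>[\d.]+) (?P<tot_u>[KMG]iB) "
    r"of which (?P<free>[\d.]+) (?P<free_u>[KMG]iB) is free"
)


def parse_oom(message):
    """Return requested/total/free sizes in MiB, or None if the text does not match."""
    m = OOM_RE.search(message)
    if m is None:
        return None
    return {
        "requested_mib": float(m["req"]) * UNITS_TO_MIB[m["req_u"]],
        "total_mib": float(m["tot"]) * UNITS_TO_MIB[m["tot_u"]],
        "free_mib": float(m["free"]) * UNITS_TO_MIB[m["free_u"]],
    }


def looks_like_false_oom(info, headroom=2.0):
    """Assumed heuristic: suspicious if free memory is at least `headroom` x the request."""
    return info["free_mib"] >= headroom * info["requested_mib"]


MESSAGE = (
    "CUDA out of memory. Tried to allocate 58.00 MiB. GPU 0 has a total "
    "capacity of 23.42 GiB of which 21.49 GiB is free"
)
info = parse_oom(MESSAGE)
print(info, looks_like_false_oom(info))
```

With the message from this report, the request is 58 MiB against roughly 22,000 MiB reported free, so the heuristic flags it. A real OOM caused by fragmentation could also trip this check, so it is only a triage hint and proves nothing about HAMI.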
Issue 2 — comfy_aimdo startup segfault (server won't boot on minimal config)
With only --normalvram set, startup dies with: Fatal Python error: Segmentation fault at comfy_aimdo/control.py line 128 in set_log_info, called from main.py line 231; the entrypoint exits with code 139. No CLI flag exists to disable comfy_aimdo.
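For reference, exit code 139 is consistent with the segfault message: by shell convention, a status above 128 means the process was killed by signal (status - 128), and 139 - 128 = 11, which is SIGSEGV. A small sketch of that decoding follows; describe_exit_code is a hypothetical helper name, not something from the launcher.

```python
import signal


def describe_exit_code(code):
    """Decode a shell-style exit status; values above 128 mean 'killed by signal code - 128'."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not defined on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))  # killed by SIGSEGV
```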
Request
Could you ship/confirm a HAMI build with Blackwell (sm_120) + driver 590.x NVML support, and address the comfy_aimdo startup segfault (or add an opt-out)?
Happy to provide full logs.