
feat: fan overhaul (sensors, curve types, GPU fan) #97

Merged: Xveyn merged 17 commits into main from feat/fan-overhaul on May 24, 2026

Xveyn (Owner) commented May 24, 2026

Summary

Brings the Fan Editor close to the FanControl (Rem0o) feature set, adds GPU fan support, and fixes the silent CPU-sensor override that made case fans always display CPU temperature.

Backend

  • Unified TempSourceRegistry resolving namespaced sensor IDs (hwmon:, gpu:, disk:, mix:)
  • Custom sensor labels (DB-persisted) + composite sensors (max/min/avg of N) with cycle detection
  • Five curve types via fan_curve_eval: graph, flat, target, mix, sync
  • Advanced tuning per fan: start_pwm_percent, stop_below_temp_celsius, response_time_seconds (EMA), pwm_steps
  • AMD GPU fan recognition (amdgpu/nouveau hwmon tag) + EINVAL diagnostic captured in last_write_error
  • Opt-in AMD GPU manual-mode unlock (power_dpm_force_performance_level=manual, pwm1_enable=1) with restore on disable
  • Silent "auto-correct to CPU sensor" branch removed from _load_fan_configs — user-chosen sensors now survive restarts
  • Dev backend gains a simulated AMD GPU fan + GPU temperature channel
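The namespaced lookup and composite cycle detection described above could look roughly like the sketch below. Only the `TempSourceRegistry` name, the `get_temp` method, the ID prefixes and the max/min/avg functions come from this PR; `register`, `register_composite`, `_reaches` and the internal dictionaries are hypothetical and the real class may differ.

```python
from typing import Callable, Dict, List, Optional

Reader = Callable[[], Optional[float]]

AGGREGATORS = {
    "max": max,
    "min": min,
    "avg": lambda values: sum(values) / len(values),
}


class TempSourceRegistry:
    """Illustrative registry; method names other than get_temp are assumptions."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}
        self._members: Dict[str, List[str]] = {}  # mix:* id -> member ids
        self._fns: Dict[str, str] = {}            # mix:* id -> "max" | "min" | "avg"

    def register(self, sensor_id: str, reader: Reader) -> None:
        prefix = sensor_id.split(":", 1)[0]
        if prefix not in {"hwmon", "gpu", "disk"}:
            raise ValueError(f"unknown namespace in {sensor_id!r}")
        self._readers[sensor_id] = reader

    def register_composite(self, sensor_id: str, members: List[str], fn: str) -> None:
        if not sensor_id.startswith("mix:"):
            raise ValueError("composite ids must use the mix: namespace")
        if fn not in AGGREGATORS:
            raise ValueError(f"unknown function {fn!r}")
        # Reject the composite if any member already leads back to it.
        if self._reaches(members, sensor_id):
            raise ValueError(f"cycle detected for {sensor_id!r}")
        self._members[sensor_id] = list(members)
        self._fns[sensor_id] = fn

    def _reaches(self, start: List[str], target: str) -> bool:
        """Depth-first walk over composite members looking for target."""
        stack, seen = list(start), set()
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._members.get(current, []))
        return False

    def get_temp(self, sensor_id: str) -> Optional[float]:
        if sensor_id in self._members:
            readings = (self.get_temp(m) for m in self._members[sensor_id])
            values = [v for v in readings if v is not None]
            return AGGREGATORS[self._fns[sensor_id]](values) if values else None
        reader = self._readers.get(sensor_id)
        return reader() if reader else None
```

Because cycles are rejected at registration time, the recursive `get_temp` in this sketch cannot loop forever on nested `mix:` sensors.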

Frontend

  • New SensorsPanel on /fans: inline rename, kind badges (CPU/GPU/Disk/Mix), composite-sensor modal
  • CurveTypeSelector + 5 typed editors (Flat, Target, Mix, Sync, plus existing Graph)
  • AdvancedFanSettings collapsible (start PWM, stop-below-temp, response time, PWM steps)
  • GpuManualModeToggle (AMD GPU only) with warning banner
  • FanCard: GPU badge, "PWM-Fehler" (PWM error) chip on write errors, sensor name shown next to temperature, "Kein Sensor zugewiesen" (no sensor assigned) call-to-action when the sensor is null
  • i18n keys (de + en) for all new strings

Spec & plan

  • docs/superpowers/specs/2026-05-24-fan-overhaul-design.md
  • docs/superpowers/plans/2026-05-24-fan-overhaul.md

Migration

One Alembic migration: new temp_sensor_labels + composite_temp_sensors tables; 12 nullable/defaulted columns added to fan_configs. Existing curve_type defaults to "graph" — no breaking change.

Test plan

  • pytest -k fan — 154 passed
  • pytest tests/{api,services,database,middleware} — 1150 passed (4 unrelated failures in test_plugins_marketplace_routes.py, pre-existing)
  • npx tsc --noEmit — clean
  • npm run build — built in 6.92s
  • Manual dev-mode smoke: open /fans, rename sensor, create mix, switch curve types, toggle advanced
  • On BaluNode (real hardware): GPU fan badge visible; PWM diagnostic surfaces without manual mode; manual mode unlocks GPU PWM writes; disabling restores previous performance level
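For reviewers unfamiliar with the manual-mode unlock exercised in the last test-plan item, here is a minimal sketch of the save, write and restore sequence. The two sysfs file names and the written values are the ones listed in the Backend section; the `GpuManualMode` class, its constructor arguments and the decision to also restore `pwm1_enable` are assumptions, not the merged implementation.

```python
from pathlib import Path
from typing import Optional, Tuple


class GpuManualMode:
    """Hypothetical helper: remembers the previous values so disable() can restore them."""

    def __init__(self, device_dir: Path, hwmon_dir: Path) -> None:
        self._level_file = device_dir / "power_dpm_force_performance_level"
        self._enable_file = hwmon_dir / "pwm1_enable"
        self._saved: Optional[Tuple[str, str]] = None  # (level, pwm1_enable)

    def enable(self) -> None:
        if self._saved is not None:
            return  # already enabled; keep the originally saved values
        self._saved = (
            self._level_file.read_text().strip(),
            self._enable_file.read_text().strip(),
        )
        self._level_file.write_text("manual")
        self._enable_file.write_text("1")

    def disable(self) -> None:
        if self._saved is None:
            return  # nothing to restore
        level, pwm_enable = self._saved
        self._enable_file.write_text(pwm_enable)
        self._level_file.write_text(level)
        self._saved = None
```

Taking the directories as parameters keeps the sketch testable against a temporary directory instead of real hardware.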

🤖 Generated with Claude Code

Xveyn and others added 17 commits May 24, 2026 14:59
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…columns)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Service

- Add imports: TempSourceRegistry, HwmonTempSource, GpuTempSource, DiskTempSource,
  MixTempSource, evaluate_curve, time as _time
- Add _registry, _last_pwm_by_fan, _last_tick_ts to __init__
- Add _rebuild_registry() + helpers: _make_hwmon_reader, _make_gpu_reader,
  _make_disk_reader, _list_smart_devices, _register_composites_from_db,
  _load_sensor_labels
- Call _rebuild_registry() in start() after _initialize_backend(), before
  _load_fan_configs()
- Remove silent CPU-sensor auto-correction elif branch from _load_fan_configs
  (Step 3b): user-chosen sensors (including composite mix: IDs) now survive
  service restarts unchanged
- Replace _monitor_and_control_fans body: temperature lookup now routes through
  self._registry.get_temp(), curve evaluation uses evaluate_curve(), hysteresis
  applied only for graph curve_type
- Update TestDbAutoCorrection tests to verify the new no-correction behavior
- Add test_fan_sensor_assignment.py: 5 regression tests locking in that user-
  chosen sensors (non-CPU hwmon, mix:, gpu:, disk:) survive _load_fan_configs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
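To illustrate the control-loop change in the commit above, here is a rough sketch of curve evaluation with hysteresis applied only to graph curves. `evaluate_curve` and the curve type names appear in this PR, but the signature, the `eval_graph` helper, the `GraphHysteresis` class and the hysteresis rule itself are assumptions; the target, mix and sync types are left out because their exact semantics are not spelled out here.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (temp_celsius, pwm_percent)


def eval_graph(points: List[Point], temp: float) -> float:
    """Linear interpolation between curve points, clamped at both ends."""
    pts = sorted(points)
    if temp <= pts[0][0]:
        return pts[0][1]
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        if temp <= t1:
            return p0 + (p1 - p0) * (temp - t0) / (t1 - t0)
    return pts[-1][1]


def evaluate_curve(curve_type: str, temp: float, *,
                   points: Optional[List[Point]] = None,
                   flat_pwm: float = 0.0) -> float:
    """Assumed signature; only two of the five curve types are sketched."""
    if curve_type == "flat":
        return flat_pwm
    if curve_type == "graph":
        if not points:
            raise ValueError("graph curve needs at least one point")
        return eval_graph(points, temp)
    raise NotImplementedError(curve_type)


class GraphHysteresis:
    """Hold the previous PWM until the temperature has dropped by a full band."""

    def __init__(self, band_celsius: float) -> None:
        self.band = band_celsius
        self._last_temp: Optional[float] = None
        self._last_pwm: Optional[float] = None

    def apply(self, temp: float, pwm: float) -> float:
        if (self._last_pwm is not None and self._last_temp is not None
                and pwm < self._last_pwm
                and self._last_temp - temp < self.band):
            return self._last_pwm  # small dip: keep the fan where it is
        self._last_temp, self._last_pwm = temp, pwm
        return pwm
```

In a loop like `_monitor_and_control_fans`, the hysteresis step would be skipped for every curve type other than graph.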
- Extend TempSensorInfo schema with custom_label, kind, gpu_vendor fields
- Add TempSensorLabelUpdate, CompositeSensor{Create,Update,Info,ListResponse} schemas
- Replace list_temp_sensors body: reads from registry (all source kinds) instead
  of backend-only hwmon sensors
- Add PUT/DELETE /api/fans/sensors/{sensor_id}/label endpoints
- Add GET/POST/PUT/DELETE /api/fans/composite-sensors endpoints with 5-sensor cap
- Module-level MAX_COMPOSITES_PER_SYSTEM = 5 constant in routes/fans.py
- Delete composite unlinks FanConfig rows pointing at the composite
- Add sqlalchemy.select and func imports to routes/fans.py
- 18 new tests in test_fan_sensor_label_api.py and test_fan_composite_api.py
  using service-layer integration pattern (db_session fixture)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
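The composite-sensor cap and the delete-unlinks behavior from the commit above can be pictured with the in-memory sketch below. `MAX_COMPOSITES_PER_SYSTEM` and `FanConfig` are names from this PR; `CompositeStore`, `CompositeLimitError`, the `temp_sensor_id` field and the reading of "unlink" as setting that field to `None` are assumptions, and the real routes work against the database rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_COMPOSITES_PER_SYSTEM = 5


class CompositeLimitError(Exception):
    """Raised when creating a composite would exceed the cap (hypothetical)."""


@dataclass
class FanConfig:
    fan_id: str
    temp_sensor_id: Optional[str] = None  # field name is an assumption


@dataclass
class CompositeStore:
    composites: Dict[str, List[str]] = field(default_factory=dict)
    fans: List[FanConfig] = field(default_factory=list)

    def create(self, sensor_id: str, members: List[str]) -> None:
        is_new = sensor_id not in self.composites
        if is_new and len(self.composites) >= MAX_COMPOSITES_PER_SYSTEM:
            raise CompositeLimitError(
                f"at most {MAX_COMPOSITES_PER_SYSTEM} composite sensors allowed"
            )
        self.composites[sensor_id] = list(members)

    def delete(self, sensor_id: str) -> int:
        """Remove the composite and unlink every fan that pointed at it."""
        self.composites.pop(sensor_id, None)
        unlinked = 0
        for fan in self.fans:
            if fan.temp_sensor_id == sensor_id:
                fan.temp_sensor_id = None
                unlinked += 1
        return unlinked
```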
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add CurveTypeSelector (graph/flat/target/mix/sync tab strip)
- Add CurveEditorFlat (single PWM% slider)
- Add CurveEditorTarget (target temp + PWM sliders)
- Add CurveEditorMix (two profile selects + max/sum function toggle)
- Add CurveEditorSync (fan-to-fan sync selector)
- Integrate all editors into FanDetails with conditional rendering
- Add allFans prop to FanDetails; pass status.fans from FanControl
- Extend FanInfo and UpdateFanConfigRequest types with curve-type fields
- Export all 5 new components from index.ts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ensor label

- Add AdvancedFanSettings component (start_pwm, stop_below_temp, response_time, pwm_steps)
- Add GpuManualModeToggle component (AMD-only, local state, POST /gpu-manual-mode)
- FanCard: GPU badge, last_write_error chip, sensor label next to temperature, no-sensor amber notice
- FanDetails: wire AdvancedFanSettings + GpuManualModeToggle (AMD gate), handleAdvancedChange
- SensorsPanel: convert from self-fetching to prop-driven (sensors, composites, onReload)
- FanControl: lift sensors/composites fetch into page, pass sensors down to FanCard + SensorsPanel
- api/fan-control: add setGpuManualMode(), extend UpdateFanConfigRequest with Task 16 fields
- index.ts: export AdvancedFanSettings and GpuManualModeToggle

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
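For context on what the four `AdvancedFanSettings` fields are expected to do on the backend, here is one plausible reading as a sketch. The field names come from this PR; the `AdvancedTuning` dataclass, the `shape_pwm` function, the order of the steps and the exact EMA formula are assumptions and may not match `fan_curve_eval`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvancedTuning:
    start_pwm_percent: float = 0.0
    stop_below_temp_celsius: Optional[float] = None
    response_time_seconds: float = 0.0  # 0 disables smoothing
    pwm_steps: int = 0                  # 0 disables quantization


def shape_pwm(target: float, temp: float, last_pwm: Optional[float],
              dt_seconds: float, cfg: AdvancedTuning) -> float:
    """Post-process a curve output; the step order here is an assumption."""
    # 1. Stop the fan entirely below the configured temperature.
    if cfg.stop_below_temp_celsius is not None and temp < cfg.stop_below_temp_celsius:
        return 0.0
    pwm = target
    # 2. Exponential moving average with time constant response_time_seconds.
    if cfg.response_time_seconds > 0 and last_pwm is not None:
        alpha = 1.0 - math.exp(-dt_seconds / cfg.response_time_seconds)
        pwm = last_pwm + alpha * (target - last_pwm)
    # 3. Snap to one of pwm_steps evenly spaced levels.
    if cfg.pwm_steps > 0:
        step = 100.0 / cfg.pwm_steps
        pwm = round(pwm / step) * step
    # 4. A stopped fan needs at least start_pwm_percent to spin up.
    was_stopped = last_pwm is None or last_pwm == 0.0
    if was_stopped and 0.0 < pwm < cfg.start_pwm_percent:
        pwm = cfg.start_pwm_percent
    return max(0.0, min(100.0, pwm))
```

The `last_pwm` and `dt_seconds` arguments correspond to the kind of state the service keeps in `_last_pwm_by_fan` and `_last_tick_ts`.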
@Xveyn Xveyn merged commit 3a2834c into main May 24, 2026
4 checks passed
Xveyn added a commit that referenced this pull request May 24, 2026
## Summary

Fixes four issues with the fan SensorsPanel that surfaced in production
after the fan overhaul (#97):

- **GPU sources showed `—`.** `_make_gpu_reader` reads `data["gpu"]`
from `telemetry.json`, but the snapshot writer never published that key.
The GPU collector was persisting samples to the DB only.
- **Disk sources were missing entirely.** The per-worker SMART cache
(`_SMART_CACHE_DATA`) is empty in every web worker unless that worker
has personally hit `/api/system/smart/status`. With 4 Uvicorn workers,
this was non-deterministic and usually empty. `_rebuild_registry` only
ran at startup — when the cache is always empty.
- **amdgpu hwmon entries duplicated `gpu:*` sources.** The same physical
sensor appeared twice in the panel with different IDs.
- **0°C dead sensors cluttered the panel.** Disconnected motherboard
inputs (PCH_CHIP_TEMP, PCH_CPU_TEMP, …) always read 0.0°C on this board.

## Approach

| Commit | What |
|---|---|
| `dbd54c8b` | `_write_telemetry_snapshot` includes `data["gpu"]` from `orchestrator.get_gpu_current()`. fan_control reader unchanged. |
| `4d222fb1` | Monitoring worker publishes `smart_summary.json` every 60s (gated by the existing 120s smartctl cache). fan_control reads disk temps from there. New `_refresh_disk_sources()` reconciles `disk:*` registry entries on every `/api/fans/sensors` call. Adds `TempSourceRegistry.unregister()`. |
| `212d949e` | `_rebuild_registry` skips hwmon sources with `device_name in {"amdgpu", "nouveau"}`. Documented trade-off for hypothetical iGPU+dGPU systems. |
| `276a10d4` | `SensorsPanel`: 0°C sensors collapse into an "Inaktiv (N)" (inactive) toggle (hidden by default, 60% opacity when shown). Renamed sensors show their original kernel label in the subtitle ("edge · gpu:edge") so user-chosen names like "Composite" don't blend with actual mix sensors. |
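The reconciliation step from `4d222fb1` can be sketched as follows. `TempSourceRegistry.unregister()`, `smart_summary.json` and the `disk:` prefix are from the commit; the JSON layout (`{"disks": {"sda": {"temperature": 34}}}`), the `ids` helper and the `read_disk_temps` and `refresh_disk_sources` functions are hypothetical stand-ins for `_refresh_disk_sources()`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional, Set

Reader = Callable[[], Optional[float]]


class TempSourceRegistry:
    """Cut-down registry with just enough surface for this sketch."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, sensor_id: str, reader: Reader) -> None:
        self._readers[sensor_id] = reader

    def unregister(self, sensor_id: str) -> bool:
        return self._readers.pop(sensor_id, None) is not None

    def ids(self, prefix: str) -> Set[str]:
        return {sid for sid in self._readers if sid.startswith(prefix)}

    def get_temp(self, sensor_id: str) -> Optional[float]:
        reader = self._readers.get(sensor_id)
        return reader() if reader else None


def read_disk_temps(summary_path: Path) -> Dict[str, float]:
    """Parse the summary file; a missing or malformed file yields no disks."""
    try:
        data = json.loads(summary_path.read_text())
    except (OSError, ValueError):
        return {}
    disks = data.get("disks", {}) if isinstance(data, dict) else {}
    if not isinstance(disks, dict):
        return {}
    temps: Dict[str, float] = {}
    for name, entry in disks.items():
        value = entry.get("temperature") if isinstance(entry, dict) else None
        if isinstance(value, (int, float)):
            temps[name] = float(value)
    return temps


def refresh_disk_sources(registry: TempSourceRegistry, summary_path: Path) -> None:
    """Add newly reported disks and drop disk:* entries that disappeared."""
    wanted = {f"disk:{name}": name for name in read_disk_temps(summary_path)}
    existing = registry.ids("disk:")
    for stale in existing - wanted.keys():
        registry.unregister(stale)
    for sensor_id, name in wanted.items():
        if sensor_id not in existing:
            # The reader re-reads the file on every call, so values stay current.
            registry.register(
                sensor_id, lambda n=name: read_disk_temps(summary_path).get(n)
            )
```

Reading from a shared file on each request avoids depending on a per-worker cache, which is the failure mode described in the Summary.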

## Tests

- 4 new test files, 18 new tests:
  - `test_worker_service_telemetry_gpu.py`: GPU SHM publish behavior
  - `test_worker_service_smart_summary.py`: SMART summary SHM publish behavior
  - `test_fan_disk_sources_from_shm.py`: disk reader/list/refresh
  - `test_fan_gpu_hwmon_dedup.py`: amdgpu/nouveau hwmon suppression
- 300 existing fan + monitoring tests still pass

## Test plan

- [ ] After deploy, open `/fans` on prod
- [ ] Verify `gpu:edge` / `gpu:junction` / `gpu:mem` show real
temperatures (not `—`)
- [ ] Verify `hwmon:hwmon1_temp*` (amdgpu duplicates) are gone
- [ ] Wait up to ~2 minutes, verify `disk:sda` / `disk:nvme0n1` / etc.
appear with temperatures
- [ ] Verify PCH 0°C sensors are no longer in the main grid, but still
reachable via "Inaktiv (4)" toggle
- [ ] Verify renamed sensors show their original kernel label below the
chosen name

🤖 Generated with [Claude Code](https://claude.com/claude-code)