[external-api] Add health field to update status #10271
karencfv wants to merge 5 commits into oxidecomputer:main
It worries me slightly to tell the user the system is unhealthy at times when that's expected.
I totally get it. My first instinct was to call this "is_system_updateable" or something like that. We discussed it somewhere, but I think it was during a meeting; I looked for the discussion but couldn't find it. I don't remember the specifics, but I think the reasoning behind this naming was to make sure users don't ignore the issue when they encounter an "unhealthy" system, and that they do call support. Maybe @davepacheco can expand on this. An idea was floated that the console could hide the status while an update is ongoing. @david-crespo, what is your take on this?
|
That’s interesting. So it would be healthy/unhealthy, unless less than 100% of components are on the target version, in which case we’re “updating” or something. I guess I wonder what “unhealthy” is supposed to tell the user. I’d much rather have it in the form of an active problem.
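One possible reading of the “updating” idea above, sketched in Rust: while any component is still off the target release, the status reports `Updating` instead of a health verdict. Everything here (`UpdateHealth`, `update_health`, and the component counts it takes) is hypothetical and not part of this PR's API.

```rust
/// Hypothetical three-way status: while an update is in flight, report
/// "updating" rather than a health verdict that may be expected to fail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateHealth {
    Healthy,
    Unhealthy,
    Updating,
}

/// Assumed inputs: how many components are on the target release, the total
/// number of components, and the combined result of the health checks.
pub fn update_health(
    components_on_target: usize,
    total_components: usize,
    checks_pass: bool,
) -> UpdateHealth {
    if components_on_target < total_components {
        // Mid-update: some unhealthiness is expected, so don't surface it.
        UpdateHealth::Updating
    } else if checks_pass {
        UpdateHealth::Healthy
    } else {
        UpdateHealth::Unhealthy
    }
}
```

With this shape, `update_health(3, 5, false)` yields `Updating`, so a transient failure during an update is never shown to the user as “unhealthy”.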
|
The idea of this work is to replace the health check script the support team currently runs before and after each update, until we have a proper FM implementation. We want it specifically tied to the update process: https://rfd.shared.oxide.computer/rfd/0612. There is more detail in #9876. Perhaps we can chat further at the next update sync to make sure we're all on the same page?
|
That's helpful, I'll read that issue. Off the top of my head, I think it would feel better to me (and possibly be more useful to support) to have all the sub-checks as separate booleans rather than synthesizing them into one big AND. And it doesn't really feel that update-specific, even though it's used during update. So maybe it belongs in its own endpoint?
This PR is the last piece of a minimal system health check for update status. It adds a new field to the `system/update/status` API called `is_system_healthy`, which is either true or false based on the information in the latest inventory collection. Once #10027 is merged, we'll include stale sagas as well.

Disclaimer: I used the Claude Code skill to make the endpoint edit, and also for part of the code (I'm trying to learn how to use it here). I checked the code several times and tested manually, but just thought I'd mention it.
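A minimal sketch of how a single `is_system_healthy` boolean could be derived from individual sub-checks. The `HealthChecks` struct and its field names are hypothetical; the PR itself computes the value from the latest inventory collection, and its actual checks and types may differ. The stale-saga field stands in for the check planned once #10027 merges.

```rust
/// Hypothetical per-check results; names are illustrative, not the PR's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HealthChecks {
    /// Assumed: every service in the latest inventory collection is healthy.
    pub services_healthy: bool,
    /// Placeholder for the stale-saga check expected after #10027.
    pub no_stale_sagas: bool,
}

impl HealthChecks {
    /// Collapse all sub-checks into one boolean, as `is_system_healthy` does.
    pub fn is_system_healthy(&self) -> bool {
        self.services_healthy && self.no_stale_sagas
    }
}
```

Serializing the `HealthChecks` fields directly, rather than only their AND, would correspond to the review suggestion of exposing each sub-check as its own boolean.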
Manual tests:
There are unhealthy services
Everything is happy!
Closes: #9418