Open
Labels
Debugging: For when you want better data in debugging an issue (log messages, post mortem debugging, and more)
Description
This function collects information about the zpools on the system, which the sled agent then manages as storage devices for things like the databases or Crucible. It reads several properties of each pool, such as the total size and the number of allocated bytes. For faulted pools, however, these properties are not present. Specifically, we see:
```
bnaecker@feldspar : ~/omicron $ zpool list -Hpo name,size,allocated,free,health oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b - - - FAULTED
bnaecker@feldspar : ~/omicron $
```
The code in that function attempts to parse the string `-` as a number, which fails. This can prevent the sled agent from making any further progress, as we can see in the log:
```json
{"msg":"failed to start sled agent","v":0,"name":"SledAgent","level":40,"time":"2022-08-11T18:17:33.839069674Z","hostname":"feldspar","pid":2455,"component":"BootstrapAgentRssHandler","error":"ServerFailure(\"Sled agent request failed: Error starting sled agent: Could not start sled agent server: Error managing storage: Failed to get info for zpool 'oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b': Failed to parse output: Failed to parse field 'size': invalid digit found in string\")"}
```
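The failure can be reproduced in isolation with a minimal Rust sketch of the parsing step. The function `parse_size_field` and the healthy pool line `oxp_example` are hypothetical illustrations, not the sled agent's actual code; the sketch assumes the tab-separated columns that `zpool list -H` emits. The trailing `invalid digit found in string` in the log is the message Rust's standard `u64` parser reports for the input `-`.

```rust
/// Hypothetical helper: parses the `size` column from one line of
/// `zpool list -Hpo name,size,allocated,free,health` output.
/// `-H` separates columns with a tab; `-p` prints exact byte counts.
fn parse_size_field(line: &str) -> Result<u64, String> {
    let size = line
        .split('\t')
        .nth(1)
        .ok_or_else(|| "missing field 'size'".to_string())?;
    size.parse::<u64>()
        .map_err(|e| format!("Failed to parse field 'size': {e}"))
}

fn main() {
    // Hypothetical healthy pool: the numeric columns hold byte counts.
    let healthy = "oxp_example\t1000\t400\t600\tONLINE";
    println!("{:?}", parse_size_field(healthy)); // Ok(1000)

    // The faulted pool from the output above: every numeric column is `-`.
    let faulted = "oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b\t-\t-\t-\tFAULTED";
    // Err("Failed to parse field 'size': invalid digit found in string")
    println!("{:?}", parse_size_field(faulted));
}
```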
We should probably handle this more gracefully, for example by not trying to parse these fields when the pool is faulted, rather than failing sled agent startup.
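One possible shape for a more graceful parser, sketched in Rust under the assumption that callers can tolerate a pool whose sizes are unknown: treat zpool's `-` placeholder as "no value" rather than a parse error, and let the caller decide what to do with the pool. `ZpoolInfo` and `parse_optional` are hypothetical names for illustration, not the sled agent's actual types.

```rust
use std::str::FromStr;

/// Hypothetical pool summary; numeric fields are `None` when
/// `zpool` prints `-` for them (as it does for a FAULTED pool).
#[derive(Debug, PartialEq)]
struct ZpoolInfo {
    name: String,
    size: Option<u64>,
    allocated: Option<u64>,
    free: Option<u64>,
    health: String,
}

/// Parses one numeric column, mapping zpool's `-` placeholder to `None`
/// while still rejecting genuinely malformed numbers.
fn parse_optional(field: &str, name: &str) -> Result<Option<u64>, String> {
    if field == "-" {
        return Ok(None);
    }
    field
        .parse::<u64>()
        .map(Some)
        .map_err(|e| format!("Failed to parse field '{name}': {e}"))
}

impl FromStr for ZpoolInfo {
    type Err = String;

    /// Parses one line of `zpool list -Hpo name,size,allocated,free,health`.
    fn from_str(line: &str) -> Result<Self, Self::Err> {
        let fields: Vec<&str> = line.split('\t').collect();
        let &[name, size, allocated, free, health] = fields.as_slice() else {
            return Err(format!("expected 5 fields, found {}", fields.len()));
        };
        Ok(ZpoolInfo {
            name: name.to_string(),
            size: parse_optional(size, "size")?,
            allocated: parse_optional(allocated, "allocated")?,
            free: parse_optional(free, "free")?,
            health: health.to_string(),
        })
    }
}

fn main() {
    let line = "oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b\t-\t-\t-\tFAULTED";
    let info: ZpoolInfo = line.parse().expect("well-formed zpool line");
    // Instead of aborting startup, the caller can log and skip pools
    // whose size is unknown.
    if info.size.is_none() {
        println!("skipping pool {} (health {})", info.name, info.health);
    }
}
```

Modeling the sizes as `Option<u64>` keeps parsing from failing on faulted pools while still surfacing real formatting errors. An alternative is to check the `health` column first and skip the numeric columns entirely when it reads `FAULTED`.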