Add a diagnostic kstat for obtaining pool status #16026
Closed
Motivation and Context
A hung pool process can be left holding the spa config lock or the spa namespace lock. If an admin wants to observe the status of a pool using the traditional zpool status, it could hang waiting for one of the locks held by the stuck process. It would be nice to observe pool status in this scenario without the risk of the inquiry hanging.
Description
Exploring Solutions
Infer that the lock is stuck (held for an extended period) and conclude that locking is not required to read the pool stats. This is somewhat a variant of 1, where the source code, instead of the admin user, is determining that it is safe to ignore locking since the pool configuration cannot be changing.
Refactor the spa code to use finer-grained locking, and perhaps reader/writer locks in lieu of mutex locks, to alleviate the obvious points of lock contention when a pool gets stuck. Don't hold these global-scope locks across disk I/O, etc.
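As a rough user-space illustration of the "infer that the lock is stuck" idea above, the sketch below tries to take a lock with a timeout and falls back to an unlocked read if the timeout expires. It is only an analogy, not the approach this change takes: the function name, the `read_stats` callback, and the timeout value are all hypothetical, and kernel code would use the spa locking primitives rather than Python's `threading.Lock`.

```python
import threading


def read_with_stuck_lock_fallback(lock, read_stats, timeout_s=5.0):
    """Return (stats, locked).

    Hypothetical helper: `locked` is True when the read happened under the
    lock, and False when the lock was presumed stuck and the read was done
    without it (so the result may be inconsistent).
    """
    if lock.acquire(timeout=timeout_s):
        try:
            return read_stats(), True
        finally:
            lock.release()
    # The lock was held for longer than timeout_s: assume the holder is
    # stuck and that the protected state is not changing, then read anyway.
    return read_stats(), False
```

The caller can use the returned flag to mark the output as "read without locking", which mirrors the safety caveat that applies to any lock-free status read.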
This change implements option 1a: adding a kstat at zfs/<pool>/stats.json which ignores any locking. This kstat can be used for investigations when pools are in a hung state while holding global locks required for a traditional 'zpool status' to proceed.

NOTE: This kstat is not safe to use in conditions where pools are in the process of configuration changes (i.e., adding/removing devices). Therefore, this kstat is not intended to be a general replacement or alternative to using 'zpool status'.

Sponsored-by: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
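A minimal sketch of how an admin script might read the zfs/<pool>/stats.json kstat described above. It assumes a Linux system where kstats are exposed under /proc/spl/kstat (an assumption, not stated in this PR; adjust the root for other platforms), and it makes no assumption about the JSON schema. The helper name `read_pool_stats` is hypothetical.

```python
import json
from pathlib import Path

# Assumption: Linux kstat root. Other platforms expose kstats differently.
KSTAT_ROOT = Path("/proc/spl/kstat")


def read_pool_stats(pool, kstat_root=KSTAT_ROOT):
    """Read and parse the pool's stats.json kstat (hypothetical helper).

    The kstat ignores locking, so the result may be inconsistent if the
    pool configuration is changing while it is read.
    """
    path = Path(kstat_root) / "zfs" / pool / "stats.json"
    return json.loads(path.read_text())
```

For example, `print(json.dumps(read_pool_stats("tank"), indent=2))` would pretty-print the status of a pool named "tank" (a placeholder pool name).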
How Has This Been Tested?
zpool_status_kstat_pos test to validate the JSON output

Sample kstat output (degraded mirror):