Expose Sled Agent API for "control plane disk management", use it (#5172)
# Overview
## Virtual Environment Changes
- Acting on Disks, not Zpools
  - Previously, sled agent could operate on "user-supplied zpools", which
    were created by `./tools/virtual_hardware.sh`
  - Now, in a world where Nexus has more control over zpool allocation,
    the configuration can supply "virtual devices" instead of "zpools", to
    give RSS/Nexus control over "when zpools actually get placed on these
    devices".
  - Impact:
    - `sled-agent/src/config.rs`
    - `smf/sled-agent/non-gimlet/config.toml`
    - `tools/virtual_hardware.sh`
## Sled Agent Changes
- HTTP API
  - The Sled Agent exposes an API to "set" and "get" the "control plane
    physical disks" specified by Nexus. The set of control plane physical
    disks (usable U.2s) is stored in a ledger on the M.2s (as
    `omicron-physical-disks.json`). This set also determines "which disks
    are available to the rest of the sled agent".
- StorageManager
  - **Before**: When physical U.2 disks are detected by the Sled Agent,
    they are "auto-formatted if empty", and we notify Nexus about them. This
    "upserts" them into the DB, so they are effectively adopted into the
    control plane automatically.
  - **After**: As discussed in RFD 457, we want to get to a world
    where physical U.2 disks are **detected** by Sled Agent, but not
    **used** until RSS/Nexus explicitly tells the Sled Agent to "use this
    disk as part of the control plane". This set of "in-use control plane
    disks" is stored in a "ledger" file on the M.2s.
  - **Transition**: On deployed systems, we need to boot up to Nexus, even
    though we don't have a ledger of control plane disks. Within the
    implementation of `StorageManager::key_manager_ready`, we implement a
    workaround: if we detect a system with no ledger, but with zpools, we'll
    use that set of zpools unconditionally until told otherwise. This is a
    short-term workaround to migrate existing systems, and it can be removed
    once deployed racks reliably have ledgers for control plane disks.
- StorageManagerTestHarness
  - In an effort to reduce "test fakes" and replace them with real
    storage, `StorageManagerTestHarness` provides testing utilities for
    spinning up vdevs, formatting them with zpools, and managing them. This
    helps us avoid a fair bit of bifurcation for "test-only synthetic disks"
    vs "real disks", though it does mean many of our tests in the sled-agent
    are now 'illumos-only'.
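
The "set"/"get" behavior and the transition workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual Sled Agent code: `DiskId`, `ControlPlaneDisks`, `usable`, and `disk_set` are hypothetical names, and the ledger is modeled as an in-memory `Option` rather than a JSON file on the M.2s.

```rust
use std::collections::BTreeSet;

/// Hypothetical identifier for a physical U.2 disk (illustration only).
type DiskId = String;

/// Hypothetical in-memory stand-in for the ledger of control plane disks.
#[derive(Debug, Default)]
struct ControlPlaneDisks {
    /// `None` models a system on which no ledger has been written yet.
    ledger: Option<BTreeSet<DiskId>>,
}

impl ControlPlaneDisks {
    /// "Set": record the disks that RSS/Nexus asked this sled to use.
    fn set(&mut self, disks: impl IntoIterator<Item = DiskId>) {
        self.ledger = Some(disks.into_iter().collect());
    }

    /// "Get": return the recorded set, if a ledger exists.
    fn get(&self) -> Option<&BTreeSet<DiskId>> {
        self.ledger.as_ref()
    }

    /// Decide which detected disks the rest of the sled agent may use.
    fn usable(
        &self,
        detected: &BTreeSet<DiskId>,
        with_zpools: &BTreeSet<DiskId>,
    ) -> BTreeSet<DiskId> {
        match &self.ledger {
            // After: a disk is used only if it is detected AND in the ledger.
            Some(ledger) => detected.intersection(ledger).cloned().collect(),
            // Transition: with no ledger, fall back to the detected disks
            // that already carry zpools.
            None => detected.intersection(with_zpools).cloned().collect(),
        }
    }
}

/// Small helper for building disk sets in examples.
fn disk_set(ids: &[&str]) -> BTreeSet<DiskId> {
    ids.iter().map(|s| s.to_string()).collect()
}
```

In this sketch, a sled with no ledger and no zpools ends up with an empty usable set, which matches the "detected but not used" goal; once `set` has been called, disks outside the ledger are ignored even if they already have zpools.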
## RSS Changes
- RSS is now responsible for provisioning "control plane disks and
zpools" during initial bootstrapping
- RSS informs Nexus about the allocation decisions it makes via the RSS
handoff
## Nexus Changes
- Nexus exposes a smaller API (no notification of "disk add/remove,
zpools add/remove"). It receives a handoff from RSS, and will later be
in charge of provisioning decisions based on inventory.
- Support for dynamically adding/removing disks/zpools after RSS will
come in a subsequent PR.
---------
Co-authored-by: Andrew J. Stone <andrew.j.stone.1@gmail.com>