Commit 4eb8153
Expose Sled Agent API for "control plane disk management", use it (#5172)
# Overview

## Virtual Environment Changes - Acting on Disks, not Zpools

- Previously, sled agent could operate on "user-supplied zpools", which were created by `./tools/virtual_hardware.sh`.
- Now, in a world where Nexus has more control over zpool allocation, the configuration can supply "virtual devices" instead of "zpools", giving RSS/Nexus control over when zpools actually get placed on these devices.
- Impact:
  - `sled-agent/src/config.rs`
  - `smf/sled-agent/non-gimlet/config.toml`
  - `tools/virtual_hardware.sh`

## Sled Agent Changes

- HTTP API
  - The Sled Agent exposes an API to "set" and "get" the control plane physical disks specified by Nexus. The set of control plane physical disks (usable U.2s) is stored in a ledger on the M.2s (as `omicron-physical-disks.json`). This set also determines which disks are available to the rest of the sled agent.
- StorageManager
  - **Before**: When physical U.2 disks are detected by the Sled Agent, they are auto-formatted if empty, and Nexus is notified about them. This "upserts" them into the DB, so they are automatically adopted into the control plane.
  - **After**: As discussed in RFD 457, we want to get to a world where physical U.2 disks are **detected** by the Sled Agent, but not **used** until RSS/Nexus explicitly tells the Sled Agent to use them as part of the control plane. This set of "in-use control plane disks" is stored in a ledger file on the M.2s.
  - **Transition**: On deployed systems, we need to boot up to Nexus even though we don't have a ledger of control plane disks. Within the implementation of `StorageManager::key_manager_ready`, we implement a workaround: if we detect a system with no ledger but with zpools, we use that set of zpools unconditionally until told otherwise. This is a short-term workaround to migrate existing systems, and can be removed once deployed racks reliably have ledgers for control plane disks.
- StorageManagerTestHarness
  - In an effort to reduce "test fakes" and replace them with real storage, `StorageManagerTestHarness` provides testing utilities for spinning up vdevs, formatting them with zpools, and managing them. This helps us avoid a fair bit of bifurcation between "test-only synthetic disks" and "real disks", though it does mean many of the sled-agent tests are now illumos-only.

## RSS Changes

- RSS is now responsible for provisioning control plane disks and zpools during initial bootstrapping.
- RSS informs Nexus about the allocation decisions it makes via the RSS handoff.

## Nexus Changes

- Nexus exposes a smaller API (no notifications of disk add/remove or zpool add/remove). It receives a handoff from RSS, and will later be in charge of provisioning decisions based on inventory.
- Dynamically adding/removing disks/zpools after RSS will appear in a subsequent PR.

---------

Co-authored-by: Andrew J. Stone <andrew.j.stone.1@gmail.com>
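The "set"/"get" flow above can be sketched in miniature. Everything here is illustrative (the type and method names, the string-based disk identities, and the generation check are assumptions, not the actual omicron types): Nexus pushes a full desired disk set, and the sled agent serves it back after adopting it.

```rust
use std::collections::BTreeSet;

/// Hypothetical ledgered control-plane disk set. In the real system this is
/// persisted to `omicron-physical-disks.json` on the M.2s; here it only
/// lives in memory.
#[derive(Clone, Debug, Default, PartialEq)]
struct ControlPlaneDisks {
    generation: u64,
    // Disk identities, simplified to strings for the sketch.
    disks: BTreeSet<String>,
}

#[derive(Default)]
struct SledAgent {
    ledger: ControlPlaneDisks,
}

impl SledAgent {
    /// "Set" the control plane disks, rejecting configs older than what we
    /// already hold (the generation check is an assumed detail).
    fn put_disks(&mut self, config: ControlPlaneDisks) -> Result<(), String> {
        if config.generation < self.ledger.generation {
            return Err("stale generation".to_string());
        }
        self.ledger = config;
        Ok(())
    }

    /// "Get" the currently ledgered control plane disks.
    fn get_disks(&self) -> &ControlPlaneDisks {
        &self.ledger
    }
}
```

The point of the shape is that the full set is replaced atomically on each "set", rather than disks being upserted one at a time as they are detected.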
Parent: e5094dc

File tree

86 files changed: +5037 −2632 lines changed

.github/buildomat/jobs/deploy.sh

Lines changed: 9 additions & 4 deletions
```diff
@@ -205,7 +205,7 @@ PXA_END="$EXTRA_IP_END"
 export GATEWAY_IP GATEWAY_MAC PXA_START PXA_END

 pfexec zpool create -f scratch c1t1d0 c2t1d0
-ZPOOL_VDEV_DIR=/scratch ptime -m pfexec ./tools/create_virtual_hardware.sh
+VDEV_DIR=/scratch ptime -m pfexec ./tools/create_virtual_hardware.sh

 #
 # Generate a self-signed certificate to use as the initial TLS certificate for
@@ -214,7 +214,12 @@ ZPOOL_VDEV_DIR=/scratch ptime -m pfexec ./tools/create_virtual_hardware.sh
 # real system, the certificate would come from the customer during initial rack
 # setup on the technician port.
 #
-tar xf out/omicron-sled-agent.tar pkg/config-rss.toml
+tar xf out/omicron-sled-agent.tar pkg/config-rss.toml pkg/config.toml
+
+# Update the vdevs to point to where we've created them
+sed -E -i~ "s/(m2|u2)(.*\.vdev)/\/scratch\/\1\2/g" pkg/config.toml
+diff -u pkg/config.toml{~,} || true
+
 SILO_NAME="$(sed -n 's/silo_name = "\(.*\)"/\1/p' pkg/config-rss.toml)"
 EXTERNAL_DNS_DOMAIN="$(sed -n 's/external_dns_zone_name = "\(.*\)"/\1/p' pkg/config-rss.toml)"

@@ -241,8 +246,8 @@ addresses = \\[\"$UPLINK_IP/24\"\\]
 " pkg/config-rss.toml
 diff -u pkg/config-rss.toml{~,} || true

-tar rvf out/omicron-sled-agent.tar pkg/config-rss.toml
-rm -f pkg/config-rss.toml*
+tar rvf out/omicron-sled-agent.tar pkg/config-rss.toml pkg/config.toml
+rm -f pkg/config-rss.toml* pkg/config.toml*

 #
 # By default, OpenSSL creates self-signed certificates with "CA:true". The TLS
```

Cargo.lock

Lines changed: 5 additions & 0 deletions
(generated file; diff not rendered)

Cargo.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -444,6 +444,7 @@ update-engine = { path = "update-engine" }
 usdt = "0.5.0"
 uuid = { version = "1.7.0", features = ["serde", "v4"] }
 walkdir = "2.4"
+whoami = "1.5"
 wicket = { path = "wicket" }
 wicket-common = { path = "wicket-common" }
 wicketd-client = { path = "clients/wicketd-client" }
```

clients/sled-agent-client/src/lib.rs

Lines changed: 1 addition & 10 deletions
```diff
@@ -35,6 +35,7 @@ progenitor::generate_api!(
     // replace directives below?
     replace = {
         ByteCount = omicron_common::api::external::ByteCount,
+        DiskIdentity = omicron_common::disk::DiskIdentity,
         Generation = omicron_common::api::external::Generation,
         MacAddr = omicron_common::api::external::MacAddr,
         Name = omicron_common::api::external::Name,
@@ -230,16 +231,6 @@ impl omicron_common::api::external::ClientError for types::Error {
     }
 }

-impl From<types::DiskIdentity> for omicron_common::disk::DiskIdentity {
-    fn from(identity: types::DiskIdentity) -> Self {
-        Self {
-            vendor: identity.vendor,
-            serial: identity.serial,
-            model: identity.model,
-        }
-    }
-}
-
 impl From<omicron_common::api::internal::nexus::InstanceRuntimeState>
     for types::InstanceRuntimeState
 {
```

common/src/api/external/mod.rs

Lines changed: 1 addition & 0 deletions
```diff
@@ -881,6 +881,7 @@ pub enum ResourceType {
     ServiceNetworkInterface,
     Sled,
     SledInstance,
+    SledLedger,
     Switch,
     SagaDbg,
     Snapshot,
```

common/src/ledger.rs

Lines changed: 3 additions & 2 deletions
```diff
@@ -7,7 +7,7 @@
 use async_trait::async_trait;
 use camino::{Utf8Path, Utf8PathBuf};
 use serde::{de::DeserializeOwned, Serialize};
-use slog::{debug, info, warn, Logger};
+use slog::{debug, error, info, warn, Logger};

 #[derive(thiserror::Error, Debug)]
 pub enum Error {
@@ -127,14 +127,15 @@ impl<T: Ledgerable> Ledger<T> {
         let mut one_successful_write = false;
         for path in self.paths.iter() {
             if let Err(e) = self.atomic_write(&path).await {
-                warn!(self.log, "Failed to write to {}: {e}", path);
+                warn!(self.log, "Failed to write ledger"; "path" => ?path, "err" => ?e);
                 failed_paths.push((path.to_path_buf(), e));
             } else {
                 one_successful_write = true;
             }
         }

         if !one_successful_write {
+            error!(self.log, "No successful writes to ledger");
             return Err(Error::FailedToWrite { failed_paths });
         }
         Ok(())
```
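The ledger's durability strategy, visible in the hunk above, is "write to every path, succeed if at least one write lands". A minimal sketch with plain std I/O (the `write_replicated` name is hypothetical; the real `Ledger` writes atomically and asynchronously, and carries structured errors rather than bare paths):

```rust
use std::fs;
use std::path::PathBuf;

/// Write `contents` to every path. Succeeds (returning the paths that
/// failed, for logging) as long as at least one write landed; errors only
/// when every write failed.
fn write_replicated(
    paths: &[PathBuf],
    contents: &str,
) -> Result<Vec<PathBuf>, Vec<PathBuf>> {
    let mut failed = Vec::new();
    let mut one_successful_write = false;
    for path in paths {
        match fs::write(path, contents) {
            Ok(()) => one_successful_write = true,
            Err(_) => failed.push(path.clone()),
        }
    }
    if one_successful_write {
        Ok(failed)
    } else {
        Err(failed)
    }
}
```

This mirrors why the ledger lives on both M.2s: losing one device degrades redundancy but does not lose the control plane disk configuration.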

illumos-utils/Cargo.toml

Lines changed: 1 addition & 3 deletions
```diff
@@ -28,6 +28,7 @@ smf.workspace = true
 thiserror.workspace = true
 tokio.workspace = true
 uuid.workspace = true
+whoami.workspace = true
 zone.workspace = true

 # only enabled via the `testing` feature
@@ -46,6 +47,3 @@ toml.workspace = true
 [features]
 # Enable to generate MockZones
 testing = ["mockall"]
-# Useful for tests that want real functionality and ability to run without
-# pfexec
-tmp_keypath = []
```

illumos-utils/src/zfs.rs

Lines changed: 40 additions & 21 deletions
```diff
@@ -5,7 +5,7 @@
 //! Utilities for poking at ZFS.

 use crate::{execute, PFEXEC};
-use camino::Utf8PathBuf;
+use camino::{Utf8Path, Utf8PathBuf};
 use omicron_common::disk::DiskIdentity;
 use std::fmt;

@@ -28,8 +28,6 @@ pub const ZFS: &str = "/usr/sbin/zfs";
 /// the keys and recreate the files on demand when creating and mounting
 /// encrypted filesystems. We then zero them and unlink them.
 pub const KEYPATH_ROOT: &str = "/var/run/oxide/";
-// Use /tmp so we don't have to worry about running tests with pfexec
-pub const TEST_KEYPATH_ROOT: &str = "/tmp";

 /// Error returned by [`Zfs::list_datasets`].
 #[derive(thiserror::Error, Debug)]
@@ -168,27 +166,34 @@ impl fmt::Display for Keypath {
     }
 }

-#[cfg(not(feature = "tmp_keypath"))]
-impl From<&DiskIdentity> for Keypath {
-    fn from(id: &DiskIdentity) -> Self {
-        build_keypath(id, KEYPATH_ROOT)
-    }
-}
-
-#[cfg(feature = "tmp_keypath")]
-impl From<&DiskIdentity> for Keypath {
-    fn from(id: &DiskIdentity) -> Self {
-        build_keypath(id, TEST_KEYPATH_ROOT)
+impl Keypath {
+    /// Constructs a Keypath for the specified disk within the supplied root
+    /// directory.
+    ///
+    /// By supplying "root", tests can override the location where these paths
+    /// are stored to non-global locations.
+    pub fn new<P: AsRef<Utf8Path>>(id: &DiskIdentity, root: &P) -> Keypath {
+        let keypath_root = Utf8PathBuf::from(KEYPATH_ROOT);
+        let mut keypath = keypath_root.as_path();
+        let keypath_directory = loop {
+            match keypath.strip_prefix("/") {
+                Ok(stripped) => keypath = stripped,
+                Err(_) => break root.as_ref().join(keypath),
+            }
+        };
+        std::fs::create_dir_all(&keypath_directory)
+            .expect("Cannot ensure directory for keys");
+
+        let filename = format!(
+            "{}-{}-{}-zfs-aes-256-gcm.key",
+            id.vendor, id.serial, id.model
+        );
+        let path: Utf8PathBuf =
+            [keypath_directory.as_str(), &filename].iter().collect();
+        Keypath(path)
     }
 }

-fn build_keypath(id: &DiskIdentity, root: &str) -> Keypath {
-    let filename =
-        format!("{}-{}-{}-zfs-aes-256-gcm.key", id.vendor, id.serial, id.model);
-    let path: Utf8PathBuf = [root, &filename].iter().collect();
-    Keypath(path)
-}
-
 #[derive(Debug)]
 pub struct EncryptionDetails {
     pub keypath: Keypath,
@@ -332,6 +337,20 @@ impl Zfs {
             err: err.into(),
         })?;

+        // We ensure that the currently running process has the ability to
+        // act on the underlying mountpoint.
+        if !zoned {
+            let mut command = std::process::Command::new(PFEXEC);
+            let user = whoami::username();
+            let mount = format!("{mountpoint}");
+            let cmd = command.args(["chown", "-R", &user, &mount]);
+            execute(cmd).map_err(|err| EnsureFilesystemError {
+                name: name.to_string(),
+                mountpoint: mountpoint.clone(),
+                err: err.into(),
+            })?;
+        }
+
         if let Some(SizeDetails { quota, compression }) = size_details {
             // Apply any quota and compression mode.
             Self::apply_properties(name, &mountpoint, quota, compression)?;
```
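The re-rooting trick in the new `Keypath::new` — strip the leading "/" from an absolute path so it can be re-joined under a test directory — can be sketched with `std::path` alone (`rebase_under_root` is a hypothetical helper name, not omicron code):

```rust
use std::path::{Path, PathBuf};

/// Re-root an absolute path under `root` by stripping its leading "/".
/// For example, "/var/run/oxide" re-rooted under "/tmp/test" becomes
/// "/tmp/test/var/run/oxide".
fn rebase_under_root(absolute: &Path, root: &Path) -> PathBuf {
    let mut relative = absolute;
    // Peel "/" prefixes until the path is relative, then join it to root.
    // (Joining an absolute path directly would discard `root` entirely.)
    loop {
        match relative.strip_prefix("/") {
            Ok(stripped) => relative = stripped,
            Err(_) => break root.join(relative),
        }
    }
}
```

The subtlety this guards against: `Path::join` with an absolute argument replaces the base path, so the prefix must be stripped first for the test root to take effect.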

illumos-utils/src/zpool.rs

Lines changed: 12 additions & 5 deletions
```diff
@@ -12,10 +12,12 @@ use std::fmt;
 use std::str::FromStr;
 use uuid::Uuid;

-const ZPOOL_EXTERNAL_PREFIX: &str = "oxp_";
-const ZPOOL_INTERNAL_PREFIX: &str = "oxi_";
+pub const ZPOOL_EXTERNAL_PREFIX: &str = "oxp_";
+pub const ZPOOL_INTERNAL_PREFIX: &str = "oxi_";
 const ZPOOL: &str = "/usr/sbin/zpool";

+pub const ZPOOL_MOUNTPOINT_ROOT: &str = "/";
+
 #[derive(thiserror::Error, Debug, PartialEq, Eq)]
 #[error("Failed to parse output: {0}")]
 pub struct ParseError(String);
@@ -192,7 +194,7 @@ impl Zpool {
         let mut cmd = std::process::Command::new(PFEXEC);
         cmd.env_clear();
         cmd.env("LC_ALL", "C.UTF-8");
-        cmd.arg(ZPOOL).arg("create");
+        cmd.arg(ZPOOL).args(["create", "-o", "ashift=12"]);
         cmd.arg(&name.to_string());
         cmd.arg(vdev);
         execute(&mut cmd).map_err(Error::from)?;
@@ -374,9 +376,14 @@ impl ZpoolName {
     /// Returns a path to a dataset's mountpoint within the zpool.
     ///
     /// For example: oxp_(UUID) -> /pool/ext/(UUID)/(dataset)
-    pub fn dataset_mountpoint(&self, dataset: &str) -> Utf8PathBuf {
+    pub fn dataset_mountpoint(
+        &self,
+        root: &Utf8Path,
+        dataset: &str,
+    ) -> Utf8PathBuf {
         let mut path = Utf8PathBuf::new();
-        path.push("/pool");
+        path.push(root);
+        path.push("pool");
         match self.kind {
             ZpoolKind::External => path.push("ext"),
             ZpoolKind::Internal => path.push("int"),
```
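The `dataset_mountpoint` change threads a configurable root through mountpoint construction, so tests can build pool layouts under a scratch directory instead of `/`. A self-contained sketch of the same shape, using `std::path` and a simplified stand-in enum (not the omicron types):

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for omicron's zpool kind: external (U.2, "oxp_")
/// vs internal (M.2, "oxi_") pools.
enum ZpoolKind {
    External,
    Internal,
}

/// Build a dataset mountpoint under a configurable root, e.g.
/// (External, "abc", root "/") -> "/pool/ext/abc/crypt".
fn dataset_mountpoint(
    kind: &ZpoolKind,
    pool_id: &str,
    root: &Path,
    dataset: &str,
) -> PathBuf {
    let mut path = PathBuf::new();
    path.push(root);
    path.push("pool");
    match kind {
        ZpoolKind::External => path.push("ext"),
        ZpoolKind::Internal => path.push("int"),
    }
    path.push(pool_id);
    path.push(dataset);
    path
}
```

With `root = "/"` this reproduces the original fixed layout; a test harness can pass a temp directory instead and exercise real filesystem code without touching global paths.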

installinator/src/hardware.rs

Lines changed: 10 additions & 3 deletions
```diff
@@ -9,6 +9,7 @@ use anyhow::Result;
 use sled_hardware::DiskVariant;
 use sled_hardware::HardwareManager;
 use sled_hardware::SledMode;
+use sled_storage::config::MountConfig;
 use sled_storage::disk::Disk;
 use sled_storage::disk::RawDisk;
 use slog::info;
@@ -49,9 +50,15 @@ impl Hardware {
                     );
                 }
                 DiskVariant::M2 => {
-                    let disk = Disk::new(log, disk, None)
-                        .await
-                        .context("failed to instantiate Disk handle for M.2")?;
+                    let disk = Disk::new(
+                        log,
+                        &MountConfig::default(),
+                        disk,
+                        None,
+                        None,
+                    )
+                    .await
+                    .context("failed to instantiate Disk handle for M.2")?;
                     m2_disks.push(disk);
                 }
             }
```
