Update thing-flinger for modern Omicron #1180

Merged: andrewjstone merged 13 commits into main from fling-falcons on Aug 25, 2022

@andrewjstone andrewjstone commented Jun 9, 2022

Omicron has undergone many enhancements in the last few months, including, but not limited to, some that affect falcon:

  • RSS now exists and generates a rack distribution plan and secret
  • Automated network setup and routing via Maghemite
  • External TLS certificate support

Some support for these features in falcon was already included in prior commits. This commit adds support to the install-prereqs command via the following functions:

    install_rustup_on_deployment_servers(config);
    create_virtual_hardware_on_deployment_servers(config);
    create_external_tls_cert_on_builder(config)?;

It also allows a location on the client to be provided for the config-rss.toml so that different configs can be used when deploying to different topologies. This is especially useful with falcon.
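As a rough illustration of the config selection described above, here is a minimal sketch. The function name `rss_config_source` and both parameters are hypothetical and are not taken from thing-flinger; the sketch only shows the idea of preferring a client-supplied `config-rss.toml` and otherwise falling back to a default.

```rust
use std::path::{Path, PathBuf};

/// Pick which config-rss.toml to ship to the deployment servers.
///
/// `client_override` is a hypothetical optional path supplied on the client
/// (for example, one file per falcon topology). `repo_default` is whatever
/// default the tool would otherwise use; its real location is an assumption
/// left to the caller.
fn rss_config_source(client_override: Option<&Path>, repo_default: &Path) -> PathBuf {
    // Prefer the client-provided file when present, else use the default.
    client_override.unwrap_or(repo_default).to_path_buf()
}
```

With this shape, a two-node topology and a larger one can each point at their own file without changing the checked-in default.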

Besides installing all the prerequisites needed to support the new Omicron features, thing-flinger itself has been improved by parallelizing all installs and deploys, and by delaying the startup of sled-agent until after the overlay files are in place. Previously, restarts triggered #1592, which is no longer an issue. To allow this, two new deploy commands were added to omicron-package: DeployCommand::Unpack and DeployCommand::Activate. They separate the unpacking of package files from the installation of SMF manifests and the starting of services.
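The ordering and parallelism described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `DeployCommand` enum here only mirrors the two variant names mentioned above, while `Step`, `CopyOverlay`, `plan_for_server`, `deploy_all`, and the `run_step` callback are hypothetical names introduced for the example.

```rust
use std::thread;

/// Hypothetical mirror of the two deploy phases named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeployCommand {
    /// Unpack package files only; nothing is started.
    Unpack,
    /// Install SMF manifests and start services.
    Activate,
}

/// One step of the illustrative rollout (names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Step {
    Deploy(DeployCommand),
    CopyOverlay,
}

/// Order of operations for one server: services (including sled-agent)
/// start in Activate, only after the overlay files are in place.
fn plan_for_server() -> Vec<Step> {
    vec![
        Step::Deploy(DeployCommand::Unpack),
        Step::CopyOverlay,
        Step::Deploy(DeployCommand::Activate),
    ]
}

/// Run the same plan on every server in parallel, one thread per server.
/// `run_step` stands in for whatever executes a step on a remote host.
fn deploy_all<F>(servers: &[String], run_step: F) -> Vec<(String, Vec<Step>)>
where
    F: Fn(&str, &Step) + Sync,
{
    thread::scope(|scope| {
        let handles: Vec<_> = servers
            .iter()
            .map(|server| {
                let run_step = &run_step;
                scope.spawn(move || {
                    let plan = plan_for_server();
                    for step in &plan {
                        run_step(server.as_str(), step);
                    }
                    (server.clone(), plan)
                })
            })
            .collect();
        // Join in spawn order so results line up with `servers`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Steps run sequentially within a server and concurrently across servers, so no server activates its services before its own overlay files have been copied.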

While running thing-flinger deployments, I noticed that trust-quorum was no longer working and secrets were not being shared. With some help from @jgallagher, that has since been rectified.

With all these changes, thing-flinger is now capable of deploying into arbitrary falcon topologies, with the first one being exemplified in https://github.com/oxidecomputer/falcon-omicron which provides a two node setup. I have run this locally on my helios box and am working to get it running on buildomat.

@andrewjstone andrewjstone marked this pull request as ready for review June 10, 2022 22:44
`create_virtual_hardware.sh` requires access to config.toml to create
zpools. We now copy over the smf/sled-agent directory to deployment
servers so that the script can find what zpools to create.

Fixes #1612
@andrewjstone andrewjstone changed the title from "Fling falcons" to "Update thing-flinger for modern Omicron" Aug 19, 2022

@jgallagher jgallagher left a comment

Was able to test this locally today, successfully, after one minor hiccup (I left a comment about config-rss.toml naming). Changes all LGTM!

@andrewjstone

I made all the changes requested in @jgallagher's review. I also fixed a bug in deploy uninstall, which was trying to uninstall on the builder when it should not.

@andrewjstone andrewjstone enabled auto-merge (squash) August 25, 2022 04:31
@andrewjstone andrewjstone merged commit b83e86a into main Aug 25, 2022
@andrewjstone andrewjstone deleted the fling-falcons branch August 25, 2022 05:08
leftwo pushed a commit that referenced this pull request Mar 4, 2024
Crucible changes:
Per client, queue-based backpressure (#1186)
A builder for the Downstairs Downstairs struct. (#1152)
Update Rust to v1.76.0 (#1153)
Deactivate the read only parent after a scrub (#1180)
Start byte-based backpressure earlier (#1179)
Tweak CI scripts to fix warnings (#1178)
Make `gw_ds_complete` less public (#1175)
Verify extent under repair is valid after copying files (#1159)
Remove individual panic setup, use global panic settings (#1174)
[smf] Use new zone network config service (#1096)
Move a few methods into downstairs (#1160)
Remove extra clone in upstairs read (#1163)
Make `crucible-downstairs` not depend on upstairs (#1165)
Update Rust crate rusqlite to 0.31 (#1171)
Update Rust crate reedline to 0.29.0 (#1170)
Update Rust crate clap to 4.5 (#1169)
Update Rust crate indicatif to 0.17.8 (#1168)
Update progenitor to bc0bb4b (#1164)
Do not 500 on snapshot delete for deleted region (#1162)
Drop jobs from Offline downstairs. (#1157)
`Mutex<Work>` → `Work` (#1156)
Added a contributing.md (#1158)
Remove ExtentFlushClose::source_downstairs (#1154)
Remove unnecessary mutexes from Downstairs (#1132)

Propolis changes:
PHD: improve Windows reliability (#651)
Update progenitor and omicron deps
Clean up VMM resource on server shutdown
Remove Inventory mechanism
Update vergen dependency
Properly handle pre/post illumos#16183 fixups
PHD: add `pfexec` to xtask phd-runner invocation (#647)
PHD: add Windows Server 2016 adapter & improve WS2016/2019 reliability (#646)
PHD: use `clap` for more `cargo xtask phd` args (#645)
PHD: several `cargo xtask phd` CLI fixes (#642)
PHD: Use ZFS clones for file-backed disks (#640)
PHD: improve ctrl-c handling (#634)
leftwo added a commit that referenced this pull request Mar 4, 2024
Co-authored-by: Alan Hanson <alan@oxide.computer>