Start implementing bootstore: Replicated storage required for rack unlock #1656

Merged: 28 commits into main, Sep 7, 2022

Conversation

@andrewjstone (Contributor) commented on Aug 31, 2022:

Currently RSS is in charge of generating the rack secret, and the bootstrap agent is in charge of establishing sprockets sessions and transferring the shares required for rack unlock. The shares themselves, and the trust quorum membership, are stored in a file on each sled. The major problems with this implementation are that it is not resilient to failure and does not allow reconfiguration of the trust quorum. Additionally, the trust quorum code is somewhat scattered across the sled-agent and a little hard to follow.

This change introduces the bootstore, which will become responsible for all trust quorum protocol and storage on the rack. The bootstore will be instantiated by the bootstrap agent, which will be responsible for establishing the sprockets sessions that the trust quorum operates over. This allows for a readable, testable implementation of the protocol and storage without having to worry about extraneous details related to networking or the SP. We may choose to pull that into a higher-level module in the bootstore later. I intend to use property-based tests with fault injection to cover the protocol.

This is an early WIP of the bootstore, and lays out the basic structure of the code, while implementing some small bits. The goal is to get this out early before it gets too large to easily review.

The Node is the main entry point to the codebase, although most logic has not been implemented. The Node uses the Db to query and store trust quorum information. The goal is to build up a fairly general two-phase commit mechanism that can also be used for storing the networking information required to set up NTP, so that CockroachDB can boot and the rest of the control plane software can run. The two-phase commit is standard, but as always has the problem that any node failing halts the protocol. We will manage this by moving to a new epoch and reconfiguring when that occurs. Of course the devil is in the details, but it's not incredibly complicated.

This implementation will supersede what is written in section 5 of RFD 238; I will update that document with the new protocol. One thing not considered there was transferring the encrypted (wrapped) root secret to new nodes that don't have the original root secret derived from the epoch 0 rack secret. To handle this, we'll distribute it as part of the two-phase commit. There are a couple of mechanisms with various levels of security that I won't get into here. The important thing to know for this PR is that I created a table and model for storing the secret. Code to use it is unimplemented so far.
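As an illustration only, the epoch-plus-two-phase-commit idea above can be sketched as a tiny state machine. All types and names here are hypothetical, not the actual bootstore API:

```rust
// Minimal two-phase commit sketch (hypothetical types, not the bootstore API).
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Preparing,
    Committed,
    Aborted,
}

struct TwoPhase {
    epoch: u64,
    members: Vec<&'static str>,
    acks: BTreeMap<&'static str, bool>,
    phase: Phase,
}

impl TwoPhase {
    fn new(epoch: u64, members: Vec<&'static str>) -> Self {
        Self { epoch, members, acks: BTreeMap::new(), phase: Phase::Preparing }
    }

    // Record a prepare ack (or nack) from a member node.
    fn on_prepare_response(&mut self, node: &'static str, ok: bool) {
        if self.phase != Phase::Preparing {
            return;
        }
        self.acks.insert(node, ok);
        if self.acks.values().any(|ok| !ok) {
            // Any failure halts this epoch; per the plan above, a real
            // implementation would reconfigure into a new epoch.
            self.phase = Phase::Aborted;
        } else if self.acks.len() == self.members.len() {
            self.phase = Phase::Committed;
        }
    }
}

fn main() {
    let mut tp = TwoPhase::new(0, vec!["sled-a", "sled-b", "sled-c"]);
    tp.on_prepare_response("sled-a", true);
    tp.on_prepare_response("sled-b", true);
    tp.on_prepare_response("sled-c", true);
    assert_eq!(tp.phase, Phase::Committed);
    println!("epoch {} committed", tp.epoch);
}
```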

There is also the notion of a [Coordinator] that is not yet implemented. The coordinator interacts with the sled-agent to record important information in Nexus and to get trust-quorum-related instructions from Nexus. It only stores state in memory and can crash without consequence. It may end up being driven by a saga.

I also copied the trust_quorum directory from the sled-agent as the bootstore will become responsible for the trust quorum. I haven't yet removed this code from the sled-agent, as this work is preliminary and I don't want to hook it in yet.

@andrewjstone (Contributor, Author) commented:
@jgallagher One more thing I should mention with regard to this PR: there's no locking or concurrency support of any kind on the DB connection yet. That's because I haven't figured out what I want to do, and I want to get more code written first. In the majority of cases the DB will be read-only. I'm not sure whether we want to cache state at the app layer and use a single connection behind an Arc<Mutex<SqliteConnection>>, do the same without app-layer caching, use multiple connections (possibly one per Node, created on demand), use a connection pool, or something else. In short, don't worry about concurrency in this particular PR. It will be decided upon and implemented shortly in a follow-up.
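For illustration, the first option (a single shared connection behind Arc<Mutex<..>>) would look roughly like this. Db here is a stand-in for the real connection type, not diesel's SqliteConnection:

```rust
// Sketch of the single-shared-connection option: one connection behind
// Arc<Mutex<..>>. `Db` is a stand-in that just counts queries; a real
// version would wrap an actual SQLite connection.
use std::sync::{Arc, Mutex};
use std::thread;

struct Db {
    queries: u64,
}

impl Db {
    fn query(&mut self) -> u64 {
        self.queries += 1;
        self.queries
    }
}

fn main() {
    let conn = Arc::new(Mutex::new(Db { queries: 0 }));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let conn = Arc::clone(&conn);
        handles.push(thread::spawn(move || {
            // Each caller locks the single connection for the duration of
            // its query; even read-only callers serialize behind the mutex,
            // which is the main cost of this option.
            conn.lock().unwrap().query();
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(conn.lock().unwrap().queries, 4);
}
```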

@jgallagher (Contributor) left a review:

LGTM; I appreciate splitting this out into an initial PR before it becomes larger - thanks!

@andrewjstone andrewjstone enabled auto-merge (squash) September 1, 2022 19:32
@andrewjstone andrewjstone merged commit 55cc15c into main Sep 7, 2022
@andrewjstone andrewjstone deleted the bootstore branch September 7, 2022 00:19
@davepacheco (Collaborator) commented:

I wanted to add some notes on the pq-sys stuff since we spent some time digging into it. To summarize:

  • we have a new crate
  • it wants a dev dependency on omicron-common
  • it thus depends on pq-sys (via diesel)
  • it must thus use the omicron-rpaths pattern, which involves a build dependency on omicron-rpaths and a regular dependency on pq-sys version "*"

We observed this behavior:

  • If we have a dev dependency on pq-sys, and the build.rs runs unconditionally, then you get a compile-time error about DEP_PQ_LIBDIRS not being set. This happens because Cargo sets that variable only when a non-dev dependency emits it from its build script. pq-sys does emit it, but since we're using it as a dev dependency in this case, the variable is not available in our build script.
  • If you have a dev dependency on pq-sys, and the build.rs is wrapped with #[cfg(test)], then you get a runtime error from ld.so.1 about not finding libpq. You might think this would do the right thing: non-test binaries would get no RPATH entries (since the build script wouldn't have done anything), and test binaries would get the right RPATH entries (since the build script would use omicron-rpaths in that case). But we concluded that build scripts are run only once for the whole package, not once per crate (binary) built by the package, and the script presumably runs without #[cfg(test)]. So you wind up building a binary that depends on pq-sys (because it's a dev-dependency and you're building a unit test) but is missing the RPATH entry. Hence this failure mode.
  • If we have a regular dependency on pq-sys, it all works.

Because build scripts appear to run only once, and only receive metadata from regular dependencies (not dev dependencies), the conclusion is: if your package ever depends on pq-sys directly or indirectly, even via a dev-dependency, then you must apply the pattern by adding a regular dependency on pq-sys. A dev-dependency on pq-sys is never sufficient.
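Concretely, the resulting Cargo.toml fragment looks roughly like this. This is a sketch of the pattern described above; the actual path to omicron-rpaths (and any version pins) in the omicron workspace may differ:

```toml
# Regular dependency on pq-sys, even if only dev/test code needs libpq.
# This is what causes Cargo to export DEP_PQ_LIBDIRS to our build script.
[dependencies]
pq-sys = "*"

[build-dependencies]
# Hypothetical path; use wherever omicron-rpaths lives in the workspace.
omicron-rpaths = { path = "../rpaths" }
```

The package's build.rs then invokes omicron-rpaths unconditionally, so every binary built from the package (test or not) gets the needed RPATH entries.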

leftwo pushed a commit that referenced this pull request Mar 26, 2025
Crucible changes are:
Make `ImpactedBlocks` an absolute range (#1685)
update omicron (#1687)
Update Rust crate chrono to v0.4.40 (#1682)
Update Rust crate tempfile to v3.19.0 (#1676)
Log loops during crutest replay (#1674)
Refactor live-repair start and send (#1673)
Update Rust crate libc to v0.2.171 (#1684)
Update Rust crate clap to v4.5.32 (#1683)
Update Rust crate async-trait to 0.1.88 (#1680)
Update Rust crate bytes to v1.10.1 (#1681)
Update Rust crate anyhow to v1.0.97 (#1679)
Update Rust crate http to v1 (#1678)
Update Rust crate tokio-util to v0.7.14 (#1675)
Update Rust crate thiserror to v2 (#1657)
Update Rust crate omicron-zone-package to 0.12.0 (#1647)
Update Rust crate opentelemetry to 0.28.0 (#1543)
Update Rust crate itertools to 0.14.0 (#1646)
Update Rust crate clearscreen to v4 (#1656)
nightly test polish (#1665)
Update Rust crate strum_macros to 0.27 (#1654)
Update Rust crate strum to 0.27 (#1653)
Update Rust crate rusqlite to 0.34 (#1652)
Better return semantics from should_send
Trying to simplify enqueue
Remove unnecessary argument
Remove unused code from `ImpactedBlocks` (#1668)
Log when pantry/volume layers activate things (#1670)
Fix unused import (#1669)
Use `RangeSet` instead of tracking every complete job (#1663)
Update Rust crate dropshot to 0.16.0 (#1645)
Simplify pending work tracking (#1662)
Remove rusqlite dependency from crucible-common (#1659)
Update Rust crate reedline to 0.38.0 (#1651)
Update Rust crate proptest to 1.6.0 (#1648)
Update Rust crate bytes to v1.10.0 (#1644)
Update Rust crate tracing-subscriber to 0.3.19 (#1643)
Update Rust crate tracing to v0.1.41 (#1642)
Update Rust crate toml to v0.8.20 (#1641)
Update Rust crate thiserror to v1.0.69 (#1640)
Update Rust crate serde_json to v1.0.139 (#1639)
Update Rust crate serde to v1.0.218 (#1638)
Update Rust crate reqwest to v0.12.12 (#1637)
Update Rust crate libc to v0.2.170 (#1636)
Update Rust crate indicatif to 0.17.11 (#1635)
Update Rust crate httptest to 0.16.3 (#1634)
Update Rust crate clap to v4.5.31 (#1632)
Update Rust crate chrono to v0.4.39 (#1631)
Update Rust crate byte-unit to 5.1.6 (#1630)
Update Rust crate async-trait to 0.1.86 (#1629)
Update Rust crate csv to 1.3.1 (#1633)
Update Rust crate anyhow to v1.0.96 (#1628)
Fix a test flake (#1626)
Bitpack per-block slot data in memory (#1625)
Use `div_ceil` instead of `(x + 7) / 8` (#1624)
Add repair server dynamometer (#1618)

Propolis changes are:
Paper over minor() differences
Update crucible (#886)
Test oximeter metrics end to end (#855)
server: improve behavior of starting VMs that are waiting for Crucible activation (#873)
server: extend API request queue's memory of prior requests (#880)
Use production propolis-server flags in PHD CI builds (#878)

Added a new field to the Board struct that propolis requires.
leftwo added a commit that referenced this pull request Mar 27, 2025
Co-authored-by: Alan Hanson <alan@oxide.computer>