Bootstore message handler implementation #1687
```rust
conn.immediate_transaction(|tx| {
    // Does the rack_uuid match what's stored?
    self.validate_rack_uuid(tx, rack_uuid)?;
```
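For context, a minimal sketch of what a `validate_rack_uuid` helper might look like inside that transaction, assuming a hypothetical single-row table holding the rack UUID as TEXT (the real schema and error types in this PR will differ):

```rust
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;
use uuid::Uuid;

// Hypothetical single-row table storing the rack UUID; illustrative only.
diesel::table! {
    rack (uuid) {
        uuid -> Text,
    }
}

#[derive(Debug, thiserror::Error)]
enum Error {
    #[error("rack UUID mismatch: stored {stored}, got {actual}")]
    RackUuidMismatch { stored: String, actual: Uuid },
    #[error(transparent)]
    Db(#[from] diesel::result::Error),
}

fn validate_rack_uuid(tx: &mut SqliteConnection, rack_uuid: &Uuid) -> Result<(), Error> {
    // Load the UUID written at rack initialization and compare it to the
    // one carried by the incoming message.
    let stored: String = rack::table.select(rack::uuid).first(tx)?;
    if stored != rack_uuid.to_string() {
        return Err(Error::RackUuidMismatch { stored, actual: *rack_uuid });
    }
    Ok(())
}
```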
I know I wrote in the PR description that we should not accept a Commit if there is even a prepare for a later epoch. We don't do that here. I'm not actually sure if we should. For instance, is it possible that a node was prepared, offline for the commit, then somehow got a later prepare that included itself? That seems unlikely and maybe we can prevent it at the nexus layer. However, if we don't prevent it, we may still need the commit to allow the sled to unlock itself while a new prepare is in progress. I need to think about this some more.
I'm leaning toward actually allowing this. I'm not sure it's harmful. I will continue to think about it as this protocol evolves.
LGTM 👍
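To make the trade-off concrete, the behavior being leaned toward here might look roughly like this (a hypothetical sketch, not the PR's actual handler; all names are illustrative):

```rust
// Hypothetical decision logic for handling a KeyShareCommit, sketching the
// "allow commit of an older prepare while a newer prepare is in flight"
// behavior discussed above.
fn should_accept_commit(
    committed_epoch: Option<i32>, // highest epoch we have committed, if any
    prepared_epochs: &[i32],      // epochs for which we hold a prepare
    commit_epoch: i32,            // epoch this commit refers to
) -> bool {
    // We must actually hold a prepare for the epoch being committed.
    if !prepared_epochs.contains(&commit_epoch) {
        return false;
    }
    // Never regress: reject commits at or below the committed epoch.
    if let Some(committed) = committed_epoch {
        if commit_epoch <= committed {
            return false;
        }
    }
    // Deliberately do *not* reject the commit just because a prepare for a
    // later epoch exists; that prepare may still fail, and this commit may
    // be what lets the sled unlock in the meantime.
    true
}
```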
```diff
-    &mut self,
+    &self,
     conn: &mut SqliteConnection,
     rack_uuid: &Uuid,
     epoch: i32,
```
Can epochs be negative?
NOOOOOOOOOO. However, the type of INTEGER in sqlite is an i32, and even BIGINT is just i64. Rather than forcing lossy conversion I kept the types the same. I'm definitely open to changing this although it shouldn't be much of a problem here since we only start with epoch 0 and reject updates for earlier epochs.
I'm tempted to suggest making epoch a `u32` in all the public APIs of this crate; the fact that SQLite forces us into `i32` is bleeding out here. But if that's painful I don't think it's a huge deal given the checks.
I'll think about it :)
I'm not actually sure it's better to use a u32 given the constraints. I'm gonna leave it for now just to get this merged, but will keep it in mind.
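If the public API did move to an unsigned epoch, one way to avoid lossy conversion at the SQLite boundary is to store epochs as `i64` (SQLite integers are signed 64-bit) and convert at the edges; a sketch under that assumption, not what this PR does:

```rust
// Sketch: expose u32 epochs publicly while storing i64 in SQLite.
// Every u32 fits losslessly in i64, so writing is infallible; the
// fallible direction is reading back a value someone else wrote.
fn epoch_to_db(epoch: u32) -> i64 {
    i64::from(epoch)
}

fn epoch_from_db(raw: i64) -> Result<u32, String> {
    u32::try_from(raw).map_err(|_| format!("corrupt epoch in DB: {raw}"))
}
```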
This is the second PR of the `bootstore` implementation. It creates handler methods for all existing messages and their corresponding DB requirements. RFD 238 section 5 will need to be updated to reflect this flow.

The way the flow is currently supposed to work is that a coordinator (coming in a follow-up PR) sends the following messages to initialize the trust quorum:

- `Initialize`
- `KeyShareCommit` (for epoch 0)

`Initialize` can be sent multiple times until the `KeyShareCommit` for epoch 0 locks in the configuration and rack UUID. This allows the initial coordinator (RSS) to die, and even have its local storage or entire sled die, and still allow us to make progress via a new RSS. Importantly, once wired through the system, the coordinator will negotiate (via sled-agent and Nexus) to write the `KeyShareCommit` to CockroachDB before sending it to all nodes. This ensures that rack initialization will complete and that CockroachDB only has to be set up once. There are some tricky failure modes here, but none of that is in this PR, so we can move on. The important point is that once a `KeyShareCommit` for epoch 0 succeeds, the rack is initialized.
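For a rough mental model of the message set described here, a hypothetical sketch of the request types (field names and payloads are illustrative, not the crate's actual definitions):

```rust
use uuid::Uuid;

// Hypothetical shapes for the bootstore messages discussed in this PR;
// the real definitions live in the `bootstore` crate and will differ.
enum Request {
    // Sent (possibly repeatedly) by the coordinator until epoch 0 commits.
    Initialize { rack_uuid: Uuid, share: Vec<u8> },
    // Proposes a new trust quorum configuration for `epoch > 0`.
    KeySharePrepare { rack_uuid: Uuid, epoch: i32, share: Vec<u8> },
    // Locks in a previously prepared configuration at `epoch`.
    KeyShareCommit { rack_uuid: Uuid, epoch: i32 },
    // Retrieves this node's key share to unlock the trust quorum.
    GetShare { epoch: i32 },
}
```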
After initialization, reconfiguration of the trust quorum can take place. Reconfiguration messages must be issued with the same rack UUID. A reconfiguration consists of a `KeySharePrepare` for an `epoch > 0` followed by a `KeyShareCommit` for the same epoch. After initialization, a `KeySharePrepare` can never be mutated. However, to allow for certain failures, such as the failure of a node being prepared to ack (since we require unanimous acknowledgement in 2PC), we allow new `KeySharePrepare`s at higher epochs to proceed. We keep track of these epochs in CockroachDB to maintain linearizability. Once a node receives a prepare for a higher epoch, it will no longer accept prepare or commit messages for any lower epoch. Once a successful `KeyShareCommit` is handled, the sled is operating at the new epoch.
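The epoch rules in the previous paragraph boil down to a small monotonicity check; a minimal sketch, assuming per-node state tracking the committed and highest-prepared epochs (names are hypothetical):

```rust
// Hypothetical per-node epoch state implementing the ordering rule above:
// a prepare must be strictly newer than anything already seen. Whether a
// commit for an older prepared epoch should still be accepted is debated
// in the review thread above, so commit handling is left out here.
struct EpochState {
    committed: i32,    // epoch we are operating at (0 right after rack init)
    max_prepared: i32, // highest epoch for which we hold a prepare
}

impl EpochState {
    fn accept_prepare(&self, epoch: i32) -> bool {
        epoch > self.max_prepared && epoch > self.committed
    }
}
```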
`GetShare` requests are used to unlock the trust quorum. They will rely on sprockets at the network level for authentication, confidentiality, integrity, and attestation. Some DeviceId membership validation will also be added to `Node` in future commits, as it currently exists in sled-agent.

This is an incremental PR and there is still a lot to do here. Follow-up commits will carry the wrapped secrets in `KeySharePrepare`; this is the most straightforward way to transfer them, and we can get more sophisticated in the future. Again, this needs to be written up in RFD 238, but I'm likely to just implement it within the next week and see how it works first.
After that, the `bootstore` will basically be implemented at a non-network level. We'll then likely want to port over the sprockets code from the bootstrap agent to a higher level here, and use the `Node` and `Coordinator` from that level, which will end up interacting with Nexus and CockroachDB via the sled-agent. Multiple tasks will be utilizing the `Node` and its underlying `Db` at that point, so we'll need some concurrency control. I've given this a bit of thought, and my current thinking is to have the `Node` running in its own thread and serialize all operations via a channel. Outside of reconfiguration, all operations are reads, and we can simply cache the current share in memory for rack-unlock purposes (e.g. `GetShare`). During reconfiguration, we'll mutate the DB as necessary and refresh our cache. This strategy avoids the need for mutexes and has the nice property that the `Node` and `Db` code we run single-threaded in property-based tests runs the same way in production, with just a single DB connection. This strategy isn't set in stone, but in my mind it's the easiest to reason about and also the easiest way to hook async code to sync code.

I tagged @jclulow here because he's a SQLite maven, and wanted a quick look to see if I'm doing things in a sane manner. Any tips would be appreciated. I expect @jgallagher to be my primary logic reviewer for the rest of this :)
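As a concrete illustration of that channel-serialized design, here's a minimal sketch, assuming a `NodeOp` request enum and an in-memory share cache (all names hypothetical, not this PR's API):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch of the "Node in its own thread" model described
// above: every operation arrives over a channel and is handled one at a
// time, so neither Node nor Db needs a mutex.
enum NodeOp {
    // Rack-unlock read path; answered from the in-memory cache.
    GetShare { reply: mpsc::Sender<Option<Vec<u8>>> },
    // Prepare/commit ops would follow the same request/reply pattern,
    // mutating the DB and refreshing the cache before replying.
}

struct Node {
    cached_share: Option<Vec<u8>>, // refreshed whenever the DB is mutated
}

fn spawn_node(node: Node) -> mpsc::Sender<NodeOp> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Single-threaded event loop: operations are serialized by the
        // channel, matching how the property-based tests drive the code.
        for op in rx {
            match op {
                NodeOp::GetShare { reply } => {
                    let _ = reply.send(node.cached_share.clone());
                }
            }
        }
    });
    tx
}
```

An async caller would hold the returned `Sender`, send a `NodeOp` carrying its own reply channel, and await the response, while tests can drive the same loop synchronously.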