Skip to content

update to steno 0.2.0 #1532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Aug 4, 2022
Merged

update to steno 0.2.0 #1532

merged 40 commits into from
Aug 4, 2022

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Aug 2, 2022

Summary

This change updates Omicron to the in-progress Steno 0.2.0. Steno 0.2 eliminates saga templates and lets consumers customize the saga graph each time the saga is run. It also allows inserting subsagas into sagas. These together enable things like #812. Landing this change is blocked on oxidecomputer/steno#29. That change is just about ready to land.

Details

The upside of Steno 0.2 is the ability to run arbitrary saga graphs. The cost is that the set of actions is now fixed at build time and each action must be registered. This created a small challenge in the instance-create saga where we use a for loop to stamp out N similar versions of the same action to do things like create N network interfaces. Ironically, this was done because of the problem that Steno is now fixing. I've made this work here, but there's room for cleaning this up further (see the comment in the source).

As part of this change, it was convenient to create a more first-class mechanism for registering the various sagas that Nexus runs.

In terms of the mechanical changes to each of the four sagas:

  • The actions are defined with lazy_static! so that they can be easily referenced (with static type checking) from the code that builds both the DAG and the action registry.
  • There's now a unit struct that impls a NexusSaga trait that defines everything that's needed to stamp out a new saga, given parameters. This comprises the saga's name, params type, a function that registers its actions, and a function that produces a new saga given some params.
  • The saga params type is no longer part of the saga type in Steno 0.2, so all the calls to saga_params() are now fallible.
  • As a result, all of the Nexus sagas have the same saga type. So the action functions all accept the same ActionContext.

There were a few other small issues fixed here.

Blockers / next steps

This change depends on oxidecomputer/steno#29. My plan is:

@@ -63,24 +67,22 @@ impl super::Nexus {
}

/// Given a saga template and parameters, create a new saga and execute it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The docstring still refers to a "saga template", but IIUC templates are a thing of the past.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Fixed in 96dba63. I also searched for other references and fixed them.

interface_id,
prs,
)
.await
}
},
}
}

/// Delete one network interface, by index.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here, deleting by index, is no longer accurate, I think. It's done by primary key / ID.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Fixed in 96dba63.

}
}

/// Create one custom (non-default) network interfaces for the provided instance.
/// Create one custom (non-default) network interfaces for the provided
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original error, but spelling: interfaces -> interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 96dba63.

Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, the Steno updates are really nice! I've a few nits around comments, but overall LGTM.

@davepacheco
Copy link
Collaborator Author

With this change approved, oxidecomputer/steno#29 has landed. I've updated this change to point steno back to "main" and this is ready to go too. Thanks!

@davepacheco davepacheco marked this pull request as ready for review August 3, 2022 21:35
@davepacheco davepacheco enabled auto-merge (squash) August 3, 2022 21:36
@davepacheco
Copy link
Collaborator Author

This is blocked on the build problem that I haven't yet figured out how to resolve. Summary (discussed in oxide-rust chat):

  • The root of the omicron repo is a workspace containing various crates. Among those crates is omicron-common.
  • Some of these crates depend on each other. They generally use Cargo "path" dependencies.
  • omicron-common depends on "steno" using a git dependency on branch "main". At all times, the repo lockfile specifies a particular commit from that branch.
  • A recent update to steno (which I'm trying to pull in with this PR) is a breaking change. The version has bumped accordingly from 0.1.0 to 0.2.0. (The Cargo docs say these are considered incompatible versions, which is good for our purposes.)
  • A different top-level crate in the omicron workspace is omicron-sled-agent. It depends on a crate "crucible" using a git dependency with a specific git revision. This package depends on omicron-common via a git dependency on branch "main".
  • This means that omicron's Cargo.lock contains two copies of omicron-common and some related packages: one from the local repo and one from git. This is not ideal but does (mostly) work.
  • Right now, omicron branch "main" refers to steno 0.1.0 (pre-breaking-change). A top level cargo test works fine.

Despite #1537, it seems like this could work correctly if Cargo.lock said:

  • the in-repo omicron-common depends on steno 0.2.0 from git sha $latest
  • the git-source omicron-common depends on steno 0.1.0 from git sha $older

That's not what Cargo.lock says, though. In the PR branch, all references in Cargo.lock refer to steno 0.2.0 from git sha $latest. (I don't have the full history here but I suspect I was overzealous in running cargo update -p steno while I was working on this PR because I didn't know there was a tree of dependencies that I'd not want to update.) The net result is that we build both the in-repo omicron-common and the git-source omicron-common against steno 0.2.0, and the git-source one doesn't build because of course it's from "main" and hasn't been updated yet.

@davepacheco davepacheco disabled auto-merge August 3, 2022 23:15
@davepacheco
Copy link
Collaborator Author

Okay, I think I've resolved the build problem but I'd love a +1 on my solution. There's a bunch of noise so I recommend looking at this entire range or maybe the Cargo.toml/Cargo.lock changes from "main". The most interesting bit is the patch section I added to the workspace Cargo.toml:

https://github.com/oxidecomputer/omicron/pull/1532/files?manifests=true&show-viewed-files=true&file-filters%5B%5D=#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542R92-R112

The comment should explain the reasoning. As I understand it, once this change lands on "main", we can immediately remove it.

This problem would also be resolved by #1537. More generally, it seems really fraught to have multiple repos that contain crates with circular dependencies of any kind (assuming the intent is that you can cargo build the whole workspace -- I realize that's not true for all repos, like hubris).

@andrewjstone
Copy link
Contributor

Okay, I think I've resolved the build problem but I'd love a +1 on my solution. There's a bunch of noise so I recommend looking at this entire range or maybe the Cargo.toml/Cargo.lock changes from "main". The most interesting bit is the patch section I added to the workspace Cargo.toml:

https://github.com/oxidecomputer/omicron/pull/1532/files?manifests=true&show-viewed-files=true&file-filters%5B%5D=#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542R92-R112

The comment should explain the reasoning. As I understand it, once this change lands on "main", we can immediately remove it.

This problem would also be resolved by #1537. More generally, it seems really fraught to have multiple repos that contain crates with circular dependencies of any kind (assuming the intent is that you can cargo build the whole workspace -- I realize that's not true for all repos, like hubris).

Huge +1 to this. Let's get this code in and then revert the patch as you propose. And yeah, I'm a firm believer in the solution in #1537 and the strategy there in general.

Thanks again for doing this migration and all the subsequent shepherding @davepacheco. This is exciting!

@davepacheco davepacheco merged commit 7a904e3 into main Aug 4, 2022
@davepacheco davepacheco deleted the steno-update branch August 4, 2022 02:18
davepacheco added a commit that referenced this pull request Aug 4, 2022
leftwo pushed a commit that referenced this pull request Nov 19, 2024
No Propolis changes other than to update Crucible

Crucible changes are:
Add debug/timeout to test_memory.sh (#1563)
Consolidate ack checking (#1561)
Rename for crutest: RegionInfo -> DiskInfo (#1562)
Fix dtrace system level scripts (#1560)
Remove `ackable_work`; ack immediately instead (#1552)
No more New jobs, no more New jobs column (#1559)
Remove delay-based backpressure in favor of explicit queue limits (#1515)
Only send flushes when Downstairs is idle; send Barrier otherwise (#1505)
Update Rust crate reqwest to v0.12.9 (#1536)
Update Rust crate omicron-zone-package to 0.11.1 (#1535)
Remove separate validation array (#1522)
Remove more unnecessary `DsState` variants (#1550)
Consolidate `DownstairsClient::reinitialize` (#1549)
Update Rust crate uuid to v1.11.0 (#1546)
Update Rust crate reedline to 0.36.0 (#1544)
Update Rust crate bytes to v1.8.0 (#1541)
Update Rust crate thiserror to v1.0.66 (#1539)
Update Rust crate serde_json to v1.0.132 (#1538)
Update Rust crate serde to v1.0.214 (#1537)
Remove transient states in `DsState` (#1526)
Update Rust crate libc to v0.2.161 (#1534)
Update Rust crate futures to v0.3.31 (#1532)
Update Rust crate clap to v4.5.20 (#1531)
Update Rust crate async-trait to 0.1.83 (#1530)
Update Rust crate anyhow to v1.0.92 (#1529)
Remove obsolete crutest perf test (#1528)
Update dependency rust to v1.82.0 (#1512)
Still more updates to support Volume layer activities. (#1508)
Remove remaining IOPS/bandwidth limiting code (#1525)
Add unit test for VersionMismatch (#1524)
Removing panic paths by only destructuring once (#1523)
Update actions/checkout digest to 11bd719 (#1518)
Switch to using `Duration` for times (#1520)
leftwo added a commit that referenced this pull request Nov 20, 2024
No Propolis changes other than to update Crucible

Crucible changes are:
Add debug/timeout to test_memory.sh (#1563)
Consolidate ack checking (#1561)
Rename for crutest: RegionInfo -> DiskInfo (#1562) Fix dtrace system
level scripts (#1560)
Remove `ackable_work`; ack immediately instead (#1552) No more New jobs,
no more New jobs column (#1559)
Remove delay-based backpressure in favor of explicit queue limits
(#1515) Only send flushes when Downstairs is idle; send Barrier
otherwise (#1505) Update Rust crate reqwest to v0.12.9 (#1536)
Update Rust crate omicron-zone-package to 0.11.1 (#1535) Remove separate
validation array (#1522)
Remove more unnecessary `DsState` variants (#1550) Consolidate
`DownstairsClient::reinitialize` (#1549) Update Rust crate uuid to
v1.11.0 (#1546)
Update Rust crate reedline to 0.36.0 (#1544)
Update Rust crate bytes to v1.8.0 (#1541)
Update Rust crate thiserror to v1.0.66 (#1539)
Update Rust crate serde_json to v1.0.132 (#1538)
Update Rust crate serde to v1.0.214 (#1537)
Remove transient states in `DsState` (#1526)
Update Rust crate libc to v0.2.161 (#1534)
Update Rust crate futures to v0.3.31 (#1532)
Update Rust crate clap to v4.5.20 (#1531)
Update Rust crate async-trait to 0.1.83 (#1530)
Update Rust crate anyhow to v1.0.92 (#1529)
Remove obsolete crutest perf test (#1528)
Update dependency rust to v1.82.0 (#1512)
Still more updates to support Volume layer activities. (#1508) Remove
remaining IOPS/bandwidth limiting code (#1525) Add unit test for
VersionMismatch (#1524)
Removing panic paths by only destructuring once (#1523) Update
actions/checkout digest to 11bd719 (#1518)
Switch to using `Duration` for times (#1520)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants