-
Notifications
You must be signed in to change notification settings - Fork 45
CI: End-to-end instance launch #1608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
I haven't taken a close look at the test but I like the approach of using a non-default workspace crate for running these tests. It's been in the back of my mind how we'd run more complicated integration tests like this in a way that didn't trip up a standard cargo test
, but were still well integrated into the workflows and tooling that people expect (i.e., still runnable by cargo test
). This looks like a good way to do that.
1883391
to
8da2f10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet this is awesome! It'll be great to have actual end-to-end tests now in addition to the sled-agent-sim ones! Left a couple small comments but otherwise looks good to me
http://catacomb.eng.oxide.computer:12346/tcpproxy | ||
pfexec chmod +x /usr/oxide/tcpproxy | ||
pfexec rm -f /var/svc/manifest/site/tcpproxy.xml | ||
pfexec curl -sSfL -o /var/svc/manifest/site/tcpproxy.xml \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just include tcpproxy.xml
in the omicron repo so that we don't run the risk of a change made to the remote copy breaking CI with no in-repo changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment I'm avoiding this because I'd rather have the tcpproxy binary and its service manifest in sync in the same place. (But I'm making a note to eventually build tcpproxy in CI perhaps?)
// TODO: not really sure about a good heuristic for selecting an IP address | ||
// range here. in both my (iliana's) environment and the lab, the last octet | ||
// is 20; in my environment the DHCP range is 100-249, and in the buildomat | ||
// lab environment the network is currently private. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Carving some space out of the same subnet as Nexus seems alright for now. Besides Nexus (and the global zone) was there anything else we'd conflict with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly worried about home network setups for people running these tests locally. I don't think there's a whole lot I can do there though.
(I've idly been wondering if we should have an out-of-repo config we look for describing a user's network setup.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've idly been wondering if we should have an out-of-repo config we look for describing a user's network setup.)
Yes! I think that or even just something we stick in the .gitignore. Since the configs are part the repo today any modifications you make for your local testing (e.g. mac gateway) pollute git status.
Before I merge, I'm looking into why, after a merge, the tests are now failing locally (bootstrap is timing out) and in CI. |
8da2f10
to
35d8ec6
Compare
I've been working on a series of end-to-end tests which can drive the console against a live instance of Nexus. Once this is in, I'd like to setup a GitHub action to drive those tests against these instances. It's a bit of a long-tail in terms of CI and not something you'd want to wait on for development. I think it'd be helpful to run it against main at least. |
That could be interesting — ultimately the long tail in this current test is the amount of time it takes to build and package the control plane (22 min in this run), and then about 5 minutes to deploy the control plane on a lab machine. We do have to stand up the control plane on a lab machine, of which we only have two currently, so there's not a lot of concurrency there; but if we do want to run a separate lab machine that is an optional CI step to run the console end-to-end tests, we certainly could in theory. |
@@ -114,3 +116,13 @@ panic = "abort" | |||
[patch.crates-io.pq-sys] | |||
git = 'https://github.com/oxidecomputer/pq-sys' | |||
branch = "oxide/omicron" | |||
|
|||
# | |||
# libsodium-sys is a dependency of thrussh, used in `end-to-end-tests`. We |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted about this off github, but for posterity's sake:
- We wanted libsodium because it's a dependency of thrussh
- We use thrussh in the end-to-end test
- Thrussh also looks deprecated
We should at least file an issue to follow-up here, and minimize our use of unsupported deps.
d92d0f3
to
30d39df
Compare
Crucible changes are: Allow read only activation with less than three downstairs (#1608) Tweaks to automatic flush (#1613) Update Rust crate twox-hash to v2 (#1547) Remove `LastFlushAck` (#1603) Correctly print 'connecting' state (#1612) Make live-repair part of invariants checks (#1610) Simplify mend region selection (#1606) Generic read test for crutest (#1609) Always remove skipped jobs from dependencies (#1604) Add libsqlite3-dev install step to Github Actions CI (#1607) Move Nexus notification to standalone task (#1584) DTrace cleanup. (#1602) Reset completed work Downstairs on a `Barrier` operation (#1601) Upstairs state machine refactoring (3/3) (#1577) Propolis changes are: Wire up initial support for AMD perf counters build: upgrade tokio to 1.40.0 (#836) build: explicitly install libsqlite3-dev in CI (#834) add JSON output format to cpuid-gen (#832)
Crucible changes are: Allow read only activation with less than three downstairs (#1608) Tweaks to automatic flush (#1613) Update Rust crate twox-hash to v2 (#1547) Remove `LastFlushAck` (#1603) Correctly print 'connecting' state (#1612) Make live-repair part of invariants checks (#1610) Simplify mend region selection (#1606) Generic read test for crutest (#1609) Always remove skipped jobs from dependencies (#1604) Add libsqlite3-dev install step to Github Actions CI (#1607) Move Nexus notification to standalone task (#1584) DTrace cleanup. (#1602) Reset completed work Downstairs on a `Barrier` operation (#1601) Upstairs state machine refactoring (3/3) (#1577) Propolis changes are: Wire up initial support for AMD perf counters build: upgrade tokio to 1.40.0 (#836) build: explicitly install libsqlite3-dev in CI (#834) add JSON output format to cpuid-gen (#832) --------- Co-authored-by: Alan Hanson <alan@oxide.computer>
As demoed Friday, here's a test that launches a Debian instance in lab CI and successfully SSHes into it. There's hopefully enough infrastructure here to enable others to write lab tests as well; some things could definitely be moved into
mod helpers
but I'd like to avoid trying to predict what needs to be moved into helper functions.Please also review oxidecomputer/sodiumoxide@c2546d1, which are changes to libsodium-sys's build.rs to support illumos. Once this lands I will
try to send that upstream so we can stop maintaining a fork(welp, they archived the upstream repo). (libsodium-sys is a required component of thrussh, the library we are using as an SSH client in these tests so that we don't have to instrument the OpenSSH client for tests.)