Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cockroach starter could better report early termination #1145

Merged
merged 1 commit into from
Jun 10, 2022

Conversation

davepacheco
Copy link
Collaborator

While debugging #1144, I found it would have been helpful if the test suite had reported exactly how cockroach had exited. This change adds more detail.

@davepacheco
Copy link
Collaborator Author

For reference, I applied this patch to cause cockroach to exit early and see how it would improve the output:

dap@ivanova omicron-fixes $ git diff
diff --git a/test-utils/src/dev/db.rs b/test-utils/src/dev/db.rs
index 2702d56b..d396d143 100644
--- a/test-utils/src/dev/db.rs
+++ b/test-utils/src/dev/db.rs
@@ -173,6 +173,9 @@ impl CockroachStarterBuilder {
         // directory rather than a random one.  That way, we can warn the user
         // if they start up two of them, and we can also clean up after unclean
         // shutdowns.
+        if self.store_dir.is_none() {
+            self.arg("bogus");
+        }
         let temp_dir =
             tempdir().with_context(|| "creating temporary directory")?;
         let store_dir = self

and I got this error:

test integration_tests::disks::test_disk_create_disk_that_already_exists_fails ... FAILED

failures:

---- integration_tests::disks::test_disk_create_disk_that_already_exists_fails stdout ----
log file: "/dangerzone/omicron_tmp/test_all-b48593a0eb0ea9a0-test_disk_create_disk_that_already_exists_fails.4953.0.log"
note: configured to log to "/dangerzone/omicron_tmp/test_all-b48593a0eb0ea9a0-test_disk_create_disk_that_already_exists_fails.4953.0.log"
thread 'integration_tests::disks::test_disk_create_disk_that_already_exists_fails' panicked at 'called `Result::unwrap()` on an `Err` value: Exited { exit_code: 1 }', /home/dap/omicron-fixes/test-utils/src/dev/mod.rs:142:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

which is much better than what we saw in #1144 because it explicitly says that we exited with status 1.

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

child_process.try_wait().unwrap().is_some()
/// fall into the first case. It's not clear under what conditions this
/// function could ever fail. It's not clear from the source that it's even
/// possible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like tokio ultimately calls try_wait from std, which calls waitpid with WNOHANG. Per the manpage of waitpid (at least on illumos), it doesn't seem possible for any of the error cases to trigger (absent some weird bug where the process doesn't have the right pid or something like that), at least assuming we can't get EINTR if we passed WNOHANG (which I'm not sure about?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't remember the details (this comment didn't actually change here, it's pretty old) but I came to a similar conclusion and left this vague note.

@davepacheco
Copy link
Collaborator Author

Thanks for the review!

@davepacheco davepacheco merged commit 6005aff into main Jun 10, 2022
@davepacheco davepacheco deleted the report-signals branch June 10, 2022 18:21
leftwo pushed a commit that referenced this pull request Feb 9, 2024
Crucible changes:
Remove unused fields in IOop (#1149)
New downstairs clone subcommand. (#1129)
Simplify the do_work_task loop (#1150)
Move `Guest` stuff into a module (#1125)
Bump nix to 0.27.1 and use new safer Fd APIs (#1110)
Move `FramedWrite` work to a separate task (#1145)
Use fewer borrows in ExtentInner API (#1147)
Update Rust crate reedline to 0.28.0 (#1141)
Update Rust crate tokio to 1.36 (#1143)
Update Rust crate slog-bunyan to 2.5.0 (#1139)
Update Rust crate rayon to 1.8.1 (#1138)
Update Rust crate itertools to 0.12.1 (#1137)
Update Rust crate byte-unit to 5.1.4 (#1136)
Update Rust crate base64 to 0.21.7 (#1135)
Update Rust crate async-trait to 0.1.77 (#1134)
Discard deferred msgs (#1131)
Minor Downstairs cleanup (#1127)
Update test_fail_live_repair to support pstop (#1128)
Ignore client messages after stopping the IO task (#1126)
Move client IO task into a struct (#1124)
Bump Rust to 1.75 and fix new Clippy lints (#1123)

Propolis changes:
PHD: convert to async (#633)
PHD: assume specialized Windows images (#636)
propolis-standalone-config needn't be a crate
standalone: Use tar for snapshot/restore
phd: use latest "lab-2.0-opte" target, not a specific version (#637)
PHD: add tests for migration of running processes (#623)
PHD: fix `cargo xtask phd` tidy not doing anything (#630)
PHD: add documentation for `cargo xtask phd` (#629)
standalone: improve virtual device creation errors (#632)
phd: add Windows Server 2019 guest adapter (#627)
PHD: add `cargo xtask phd` to make using PHD nicer (#619)
leftwo added a commit that referenced this pull request Feb 9, 2024
Crucible changes:
Remove unused fields in IOop (#1149)
New downstairs clone subcommand. (#1129)
Simplify the do_work_task loop (#1150)
Move `Guest` stuff into a module (#1125)
Bump nix to 0.27.1 and use new safer Fd APIs (#1110) Move `FramedWrite`
work to a separate task (#1145) Use fewer borrows in ExtentInner API
(#1147)
Update Rust crate reedline to 0.28.0 (#1141)
Update Rust crate tokio to 1.36 (#1143)
Update Rust crate slog-bunyan to 2.5.0 (#1139)
Update Rust crate rayon to 1.8.1 (#1138)
Update Rust crate itertools to 0.12.1 (#1137)
Update Rust crate byte-unit to 5.1.4 (#1136)
Update Rust crate base64 to 0.21.7 (#1135)
Update Rust crate async-trait to 0.1.77 (#1134)
Discard deferred msgs (#1131)
Minor Downstairs cleanup (#1127)
Update test_fail_live_repair to support pstop (#1128) Ignore client
messages after stopping the IO task (#1126) Move client IO task into a
struct (#1124)
Bump Rust to 1.75 and fix new Clippy lints (#1123)

Propolis changes:
PHD: convert to async (#633)
PHD: assume specialized Windows images (#636)
propolis-standalone-config needn't be a crate
standalone: Use tar for snapshot/restore
phd: use latest "lab-2.0-opte" target, not a specific version (#637)
PHD: add tests for migration of running processes (#623) PHD: fix `cargo
xtask phd` tidy not doing anything (#630) PHD: add documentation for
`cargo xtask phd` (#629) standalone: improve virtual device creation
errors (#632) phd: add Windows Server 2019 guest adapter (#627)
PHD: add `cargo xtask phd` to make using PHD nicer (#619)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants