Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve display of Unix wait statuses, notably giving names for signals #82974

Closed
wants to merge 11 commits into from
68 changes: 68 additions & 0 deletions library/std/src/sys/unix/ext/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ use crate::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, RawFd};
use crate::process;
use crate::sealed::Sealed;
use crate::sys;
use crate::sys::os::{self, SignalLookupMethod};
use crate::sys_common::{AsInner, AsInnerMut, FromInner, IntoInner};

/// Unix-specific extensions to the [`process::Command`] builder.
Expand Down Expand Up @@ -319,3 +320,70 @@ impl IntoRawFd for process::ChildStderr {
pub fn parent_id() -> u32 {
crate::sys::os::getppid()
}

/// Converts a signal number to a string representation suitable for a human reader.
///
/// This does not necessarily produce the same output as the C `strsignal` function.
/// Currently there are two deviations from the return value of `strsignal`:
///
/// * `signal_describe` does not translate the string according to the locale; C `strsignal`
/// usually does. Unfortunately a translated version is not currently available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of Rust's standard library functions currently include any kind of translation; this function is not unusual in that regard. (The first sentence, stating that it isn't translated, is helpful; the "Unfortunately" could be attached to many other functions, and is a known limitation of std.)

///
/// * `signal_describe` will include the signal abbrevation (eg, `SIGINT`) if
/// the platform C library can provide that.
///
/// The precise form of the output should not be relied on.
///
/// # Examples
///
/// ```
/// #![feature(unix_signal_strings)]
/// use std::os::unix::process::signal_describe;
/// assert!(signal_describe(9).contains("Killed"));
/// ```
#[unstable(feature = "unix_signal_strings", issue = "none")]
pub fn signal_describe(sig: i32) -> String {
let mut buf = String::new();
os::signal_display(&mut buf, sig).unwrap();
buf
}

/// Looks up a signal number to get a basic string representation.
///
/// For known signals, this will typically give the same output as the C `strisignal`
/// function does in the `C` locale.
///
/// Will return `None` for unknown or invalid signal numbers.
///
/// # Examples
///
/// ```
/// #![feature(unix_signal_strings)]
/// use std::os::unix::process::signal_string_raw;
/// assert_eq!(signal_string_raw(15), Some("Terminated"));
joshtriplett marked this conversation as resolved.
Show resolved Hide resolved
/// ```
#[unstable(feature = "unix_signal_strings", issue = "none")]
pub fn signal_string_raw(sig: i32) -> Option<&'static str> {
os::signal_lookup::descrs.lookup(sig)
}

/// Looks up a signal number to get its abbreviation.
///
/// When this succeeds, it will return the part of the signal name after `SIG`.
///
/// On some systems, `signal_abbrev` is not supported and will always return `None`. Whether it is
/// supported is not known at compile time; it can depend on the libc version found at runtime.
///
/// # Examples
///
/// ```
/// #![feature(unix_signal_strings)]
/// use std::os::unix::process::signal_abbrev;
/// if let Some(got) = signal_abbrev(1) {
/// assert_eq!(got, "HUP");
/// }
/// ```
#[unstable(feature = "unix_signal_strings", issue = "none")]
pub fn signal_abbrev(sig: i32) -> Option<&'static str> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit: avoid abbreviating words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose this name because I was copying the glibc name for the same thing (sigabbrev). I don't think calling it "abbrevation" is right. BSD calls it the signame so name would be a possibility but it seems rather generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a rule against abbreviating things? Ref ptr alloc iter mut vec env args .... SocketAddr Dir .... etc. etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not a rule per se, but my understanding is that the preference tends to lean towards names that are not abbreviated as far as the recent additions to the standard library API surface go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the preference tends to lean towards names that are not abbreviated as far as the recent additions to the standard library API surface go.

Fair enough. I think, though, that in this case the abbreviation is definitely justified both by the prior art in glibc (which feels very natural to me as an old Unix hand) and by the clumsiness of the unabbreviated word "abbreviation" :-).

If this is a blocker for this MR then I can of course respray this corner of the bikeshed. In any case, thanks for your attention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By way of hopefully sidestepping the bikeshed about abbreviations, I think it'd be fair to call this the signal_name. I don't think we need to call it an "abbreviation" at all. (Yes, TERM is itself an abbreviation, but it's the actual name of the signal.) And it makes sense to say we're mapping a number to a "name" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I think it'd be fair to call this the signal_name. I don't think we need to call it an "abbreviation" at all. (Yes, TERM is itself an abbreviation, but it's the actual name of the signal.) And it makes sense to say we're mapping a number to a "name" here.

name is fine by me, so I will do that. (And yes the BSDs even have this in the sys_signame array so that is precedent.)

By way of hopefully sidestepping the bikeshed about abbreviations, ...

:-).

os::signal_lookup::abbrevs.lookup(sig)
}
117 changes: 117 additions & 0 deletions library/std/src/sys/unix/os.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ use crate::os::unix::prelude::*;
use crate::error::Error as StdError;
use crate::ffi::{CStr, CString, OsStr, OsString};
use crate::fmt;
use crate::fmt::Write as _;
use crate::io;
use crate::iter;
use crate::marker::PhantomData;
Expand Down Expand Up @@ -131,6 +132,122 @@ pub fn error_string(errno: i32) -> String {
}
}

// Signal strings
//
// This is not so easy. Docs references:
//
// glibc
// https://www.sourceware.org/glibc/wiki/Release/2.32
// https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html
//
// FreeBSD
// https://www.freebsd.org/cgi/man.cgi?query=strsignal&apropos=0&sektion=0&manpath=FreeBSD+12.2-RELEASE+and+Ports&arch=default&format=html

pub trait SignalLookupMethod {
fn lookup(&self, sig: i32) -> Option<&'static str>;
}

unsafe fn ptr_to_maybe_str(p: *const c_char) -> Option<&'static str> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, functions returning Option use opt or option in their name, not maybe. (The only instances of to_maybe in the Rust codebase related to a compiler-internal type named MaybeOwned.)

if p.is_null() {
None
} else {
let cstr = CStr::from_ptr::<'static>(p);
Some(cstr.to_str().unwrap())
}
}

// This is pretty universal - the nix crate makes the same assumption. We need it to be no larger
// than the platform's actual value - but, only on platforms where there is sys_siglist or
// sys_signame. This value is a pretty safe bet, and we have a test caee for it.
ijackson marked this conversation as resolved.
Show resolved Hide resolved
const NSIG: usize = 32;

#[cfg(not(target_env = "musl"))]
pub mod signal_lookup {
use super::*;
use crate::sys::weak::Weak;

type SigFooNp = unsafe extern "C" fn(c_int) -> *const c_char;

#[repr(transparent)]
struct SysSigFoos(*const *const c_char);
unsafe impl Sync for SysSigFoos {}

pub struct Lookups {
sigfoo_np: Weak<SigFooNp>,
sys_sigfoos: Weak<SysSigFoos>,
}

pub static descrs: Lookups = Lookups {
sigfoo_np: Weak::new("sigdescr_pn\0"), // glibc >=2.32
ijackson marked this conversation as resolved.
Show resolved Hide resolved
sys_sigfoos: Weak::new("sys_siglist\0"), // FeeBSD/NetBSD/OpenBSD
ijackson marked this conversation as resolved.
Show resolved Hide resolved
};

pub static abbrevs: Lookups = Lookups {
sigfoo_np: Weak::new("sigabbrev_pn\0"), // glibc >=2.32
ijackson marked this conversation as resolved.
Show resolved Hide resolved
sys_sigfoos: Weak::new("sys_signame\0"), // glibc < 2.23, BSDs, and other trad unix
};

impl SignalLookupMethod for Lookups {
fn lookup(&self, sig: i32) -> Option<&'static str> {
unsafe {
let got = if let Some(sigfoo_np) = self.sigfoo_np.get() {
sigfoo_np(sig)
} else if let Some(use_sys_foolist) =
// if let chaining would be nice here!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice in many places, but we don't normally leave these kinds of comments in the source. Once we have if let chaining, we're very likely to update many bits of code over time.

if (sig as usize) < NSIG { self.sys_sigfoos.get() } else { None }
{
use_sys_foolist.0.add(sig as usize).read()
} else {
ptr::null()
};
ptr_to_maybe_str(got)
}
}
}
}

#[cfg(arget_env = "musl")]
ijackson marked this conversation as resolved.
Show resolved Hide resolved
pub mod signal_lookup {
pub struct ViaStrsignal;
pub static descrs: ViaStrSignal = ViaStrSignal;
impl Lookup for ViaStrSignal {
// musl's strsignal is threadsafe: it just looks up in a static array (that we don't have
// access to in any other way). I have spoken to Rich Felker (musl upstream author) on IRC
// and confirmed that musl's strsignal is very unlikely to ever become non-thread-safe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to rely on, but it would be helpful to have this documented somewhere other than an ephemeral second-hand conversation, if possible. If the author of musl feels comfortable stating this, perhaps it could go in the musl documentation somewhere?

(Alternatively, perhaps musl could provide sigdescr_np as a symbol alias for strsignal? That'd be a longer wait, though. Documentation we could link to would suffice.)

It looks like glibc doesn't do this because it generates strings for unknown and real-time signals on the fly, in addition to being locale-dependent.

It looks like musl has fixed strings for all the real-time signals. Is musl's handling of real-time and unknown signals included in the above commitment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having this commitment documented would be better. But as far as I can tell musl does not have the kind of comprehensive documentation where this kind of thing would go. Otherwise I would have tried that already. Also AFAICT there isn't an issue tracker. I guess I could post to the mailing list. Would that satsify this concern?

I doubt very much that musl upstream would want to provide sigdescr_np. They focus heavily on standards conformance and sigdescr_np is an API glibc simply made up (understandably so, but, still).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like musl has fixed strings for all the real-time signals. Is musl's handling of real-time and unknown signals included in the above commitment?

My conversation was about the return value from strsignal and did not distinguish the behaviour depending on the argument values. Such a distinction wouldn't make sense as an API commitment since the caller couldn't rely on the threadsafety.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A public mailing list post with an ack from the maintainer would be helpful, yes.

And I didn't want to make any assumptions about the scope of that informal commitment; I could have imagined something like "if you pass a known signal number we'll guarantee to return a static, but we might change the behavior for unknown signal numbers in the future". That's still a commitment we could rely on, as long as we were careful to only pass known signal numbers. Hence trying to confirm the scope of the commitment.

#[cfg(arget_env = "musl")]
fn lookup(&self, sig: i32) -> Option<&'static str> {
unsafe {
extern "C" fn strsignal(sig: c_int) -> *const *const c_char;
ptr_to_maybe_str(strsignal(sig))
}
}
}

pub struct Unsupported;
pub static abbrevs: Unsupported = Unsupported;
impl Lookup for Unspported {
fn lookup(&self, sig: i32) -> Option<&'static str> {
extern "C" fn strsignal(sig: c_int) -> *const *const c_char;
ptr_to_maybe_str(strsignal(sig))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I'm following this correctly: If this is unsupported, shouldn't it just return None rather than duplicating? Also, wouldn't this result in duplicated output (the second time with a SIG prepended)?

}
}
}

/// Gets a detailed string description for the given signal number.
pub fn signal_display(f: &mut dyn fmt::Write, sig: i32) -> fmt::Result {
// shame there is no strsignal_r anywhere!

if let Some(text) = signal_lookup::descrs.lookup(sig) {
f.write_str(text)?;
if let Some(name) = signal_lookup::abbrevs.lookup(sig) {
write!(f, " (SIG{})", name)?;
}
} else {
write!(f, "signal {}", sig)?;
}
Ok(())
}

pub fn getcwd() -> io::Result<PathBuf> {
let mut buf = Vec::with_capacity(512);
loop {
Expand Down
23 changes: 23 additions & 0 deletions library/std/src/sys/unix/os/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,3 +21,26 @@ fn test_parse_glibc_version() {
assert_eq!(parsed, parse_glibc_version(version_str));
}
}

#[test]
fn try_all_signals() {
fn chk(l: &dyn SignalLookupMethod, sig: usize) {
let got = l.lookup(sig as i32);
println!("{:2} {:?}", sig, got);
if let Some(got) = got {
for &c in got.as_bytes() {
assert!(c == b' ' || c.is_ascii_graphic(), "sig={} got={:?}", c, &got);
}
}
}

for sig in 0..NSIG {
chk(&signal_lookup::descrs, sig);
chk(&signal_lookup::abbrevs, sig);
}

// 1..15 are anciently conventional signal numbers; check they can be looked up:
for sig in 1..15 {
assert!(signal_lookup::descrs.lookup(sig).is_some());
}
}
17 changes: 10 additions & 7 deletions library/std/src/sys/unix/process/process_unix.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ use crate::io::{self, Error, ErrorKind};
use crate::ptr;
use crate::sys;
use crate::sys::cvt;
use crate::sys::os::signal_display;
use crate::sys::process::process_common::*;

#[cfg(target_os = "vxworks")]
Expand Down Expand Up @@ -526,20 +527,22 @@ impl From<c_int> for ExitStatus {
impl fmt::Display for ExitStatus {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if let Some(code) = self.code() {
write!(f, "exit code: {}", code)
write!(f, "exit status: {}", code)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR should change this particular bit of the output. We currently refer to this as code and provide a .code() function to retrieve it, so using the same consistent terminology seems preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is controversial I will split it out into a different MR. (The "consistent terminology" from the stdlib names is simply wrong for Unix, as I have recently documented properly.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not trying to make a case for or against that terminology, just that I don't think that change should be in this PR.

Regarding the change itself, my only concern is not introducing a source of confusion by being inconsistent between documentation and existing function names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you have this concern. I agree that the conversation would be better in a different MR, so yes, I will do that. Thanks. (I'll "resolve" this review comment with that other MR when it exists.)

} else if let Some(signal) = self.signal() {
write!(f, "signal: ")?;
signal_display(f, signal)?;
if self.core_dumped() {
write!(f, "signal: {} (core dumped)", signal)
} else {
write!(f, "signal: {}", signal)
write!(f, " (core dumped)")?;
}
} else if let Some(signal) = self.stopped_signal() {
write!(f, "stopped (not terminated) by signal: {}", signal)
write!(f, "stopped (not terminated) by signal: ")?;
signal_display(f, signal)?;
} else if self.continued() {
write!(f, "continued (WIFCONTINUED)")
write!(f, "continued (WIFCONTINUED)")?;
} else {
write!(f, "unrecognised wait status: {} {:#x}", self.0, self.0)
write!(f, "unrecognised wait status: {} {:#x}", self.0, self.0)?;
}
Ok(())
}
}

Expand Down
34 changes: 25 additions & 9 deletions library/std/src/sys/unix/process/process_unix/tests.rs
Original file line number Diff line number Diff line change
@@ -1,30 +1,46 @@
#[test]
#[rustfmt::skip] // avoids tidy destroying the legibility of the hex/string tables
fn exitstatus_display_tests() {
// In practice this is the same on every Unix.
// If some weird platform turns out to be different, and this test fails, use #[cfg].
// If some weird platform turns out to be different, and this test fails, use if cfg!
use crate::os::unix::process::ExitStatusExt;
use crate::process::ExitStatus;

let t = |v, s| assert_eq!(s, format!("{}", <ExitStatus as ExitStatusExt>::from_raw(v)));
let t = |v, exp: &[&str]| {
let got = format!("{}", <ExitStatus as ExitStatusExt>::from_raw(v));
assert!(exp.contains(&got.as_str()), "got={:?} exp={:?}", &got, exp);
};

t(0x0000f, "signal: 15");
t(0x0008b, "signal: 11 (core dumped)");
t(0x00000, "exit code: 0");
t(0x0ff00, "exit code: 255");
// SuS says that wait status 0 corresponds to WIFEXITED and WEXITSTATUS==0.
// The implementation of `ExitStatusError` relies on this fact.
// So this one must always pass - don't disable this one with cfg!
t(0x00000, &["exit status: 0"]);

// We cope with a variety of conventional signal strings, both with and without the signal
// abbrevation too. It would be better to compare this with the result of strsignal but that
// is not threadsafe which is big reason we want a set of test cases...
//
// The signal formatting is done by signal_display in library/std/src/sys/unix/os.rs.
t(0x0000f, &["signal: Terminated",
"signal: Terminated (SIGTERM)"]);
t(0x0008b, &["signal: Segmentation fault (core dumped)",
"signal: Segmentation fault (SIGSEGV) (core dumped)"]);
t(0x0ff00, &["exit status: 255"]);

// On MacOS, 0x0137f is WIFCONTINUED, not WIFSTOPPED. Probably *BSD is similar.
// https://github.com/rust-lang/rust/pull/82749#issuecomment-790525956
// The purpose of this test is to test our string formatting, not our understanding of the wait
// status magic numbers. So restrict these to Linux.
if cfg!(target_os = "linux") {
t(0x0137f, "stopped (not terminated) by signal: 19");
t(0x0ffff, "continued (WIFCONTINUED)");
t(0x0137f, &["stopped (not terminated) by signal: Stopped (signal)",
"stopped (not terminated) by signal: Stopped (signal) (SIGSTOP)"]);
t(0x0ffff, &["continued (WIFCONTINUED)"]);
}

// Testing "unrecognised wait status" is hard because the wait.h macros typically
// assume that the value came from wait and isn't mad. With the glibc I have here
// this works:
if cfg!(all(target_os = "linux", target_env = "gnu")) {
t(0x000ff, "unrecognised wait status: 255 0xff");
t(0x000ff, &["unrecognised wait status: 255 0xff"]);
}
}