Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve display of Unix wait statuses, notably giving names for signals #82974

Closed
wants to merge 11 commits into from

Conversation

ijackson
Copy link
Contributor

@ijackson ijackson commented Mar 10, 2021

The motivations here are:

  • Provide signal names (eg Segmentation fault (SIGSEGV)) rather than numbers (eg signal 11)
  • Correct the string for WIFEXITED to describe the value as the exit status

These two changes are behavioural changes to the existing APIs so insta-stable. But I hope they won't be controversial.

Providing signal names requires adding a facility (exposed as unstable in this MR) to do the number to string conversion. The implementation of that (and indeed the exposed API) is not entirely straightforward because of the lack of strsignal_r and non-portability of sys_siglist and sys_signame.

The test cases for ExitStatus as Display need to be updated to contain the new strings. Signal strings are very widely shared across Unices so I am hoping that this will work everywhere. However, it is quite possible that this test case will need adjustment on some platform. Additionally, I have not been able to test this myself on anything with a very recent glibc (2.32 or later), for which this code needs special handling. So it might be worth running this through @bors try to spot any issues.

Proper Unix terminology is "exit status" (vs "wait status").
"exit code" is imprecise and therefore unclear.

This is a user-visible change to the behaviour of all Rust programs
which run Unix subprocesses.  Hopefully no-one is matching against the
exit code string, except perhaps in tests.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
We are going to make this more complicated, so it makes sense not to
expect to fall through the return value from a single call in each
branch.

No functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
We are going to implement signal description which is not entirely
predictable.

No functional change.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
I'm going to want signal_describe in a moment for better reporting of
signals in ExitStatus.  API design principles dictate that this inner
functionality should be exposed in its own right, rather than being
available only via ExitStatus's Display impl.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Print the signal description, and name, if we can.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Split this one out.  It's special.

No functional change if the tests pass.

Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 10, 2021
@ijackson
Copy link
Contributor Author

This is a follow-on from #82411. See also #82973 which I have just submitted.

/// }
/// ```
#[unstable(feature = "unix_signal_strings", issue = "none")]
pub fn signal_abbrev(sig: i32) -> Option<&'static str> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit: avoid abbreviating words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose this name because I was copying the glibc name for the same thing (sigabbrev). I don't think calling it "abbrevation" is right. BSD calls it the signame so name would be a possibility but it seems rather generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a rule against abbreviating things? Ref ptr alloc iter mut vec env args .... SocketAddr Dir .... etc. etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not a rule per se, but my understanding is that the preference tends to lean towards names that are not abbreviated as far as the recent additions to the standard library API surface go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the preference tends to lean towards names that are not abbreviated as far as the recent additions to the standard library API surface go.

Fair enough. I think, though, that in this case the abbreviation is definitely justified both by the prior art in glibc (which feels very natural to me as an old Unix hand) and by the clumsiness of the unabbreviated word "abbreviation" :-).

If this is a blocker for this MR then I can of course respray this corner of the bikeshed. In any case, thanks for your attention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By way of hopefully sidestepping the bikeshed about abbreviations, I think it'd be fair to call this the signal_name. I don't think we need to call it an "abbreviation" at all. (Yes, TERM is itself an abbreviation, but it's the actual name of the signal.) And it makes sense to say we're mapping a number to a "name" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I think it'd be fair to call this the signal_name. I don't think we need to call it an "abbreviation" at all. (Yes, TERM is itself an abbreviation, but it's the actual name of the signal.) And it makes sense to say we're mapping a number to a "name" here.

name is fine by me, so I will do that. (And yes the BSDs even have this in the sys_signame array so that is precedent.)

By way of hopefully sidestepping the bikeshed about abbreviations, ...

:-).

@nagisa
Copy link
Member

nagisa commented Mar 10, 2021

This implementation makes me wonder if it wouldn't be easier and simpler to match the signal numbers as defined in libc and map them to our own descriptions, as such:

match signum {
    libc::SIGSEGV => "Segmentation fault (SIGSEGV)",
    libc::SIGKILL => "Killed (SIGKILL)",
    ...
}

Is there any reason this wouldn't be an option?

@ijackson
Copy link
Contributor Author

ijackson commented Mar 10, 2021 via email

library/std/src/sys/unix/os.rs Outdated Show resolved Hide resolved
library/std/src/sys/unix/os.rs Outdated Show resolved Hide resolved
ijackson and others added 2 commits March 10, 2021 16:43
Co-authored-by: Tavian Barnes <tavianator@tavianator.com>
Co-authored-by: Tavian Barnes <tavianator@tavianator.com>
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
Copy link
Member

@joshtriplett joshtriplett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline. Overall, the concept of this seems reasonable, and having clearer information displayed about ExitStatus should make diagnosing things easier.

let got = if let Some(sigfoo_np) = self.sigfoo_np.get() {
sigfoo_np(sig)
} else if let Some(use_sys_foolist) =
// if let chaining would be nice here!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice in many places, but we don't normally leave these kinds of comments in the source. Once we have if let chaining, we're very likely to update many bits of code over time.

impl Lookup for ViaStrSignal {
// musl's strsignal is threadsafe: it just looks up in a static array (that we don't have
// access to in any other way). I have spoken to Rich Felker (musl upstream author) on IRC
// and confirmed that musl's strsignal is very unlikely to ever become non-thread-safe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to rely on, but it would be helpful to have this documented somewhere other than an ephemeral second-hand conversation, if possible. If the author of musl feels comfortable stating this, perhaps it could go in the musl documentation somewhere?

(Alternatively, perhaps musl could provide sigdescr_np as a symbol alias for strsignal? That'd be a longer wait, though. Documentation we could link to would suffice.)

It looks like glibc doesn't do this because it generates strings for unknown and real-time signals on the fly, in addition to being locale-dependent.

It looks like musl has fixed strings for all the real-time signals. Is musl's handling of real-time and unknown signals included in the above commitment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having this commitment documented would be better. But as far as I can tell musl does not have the kind of comprehensive documentation where this kind of thing would go. Otherwise I would have tried that already. Also AFAICT there isn't an issue tracker. I guess I could post to the mailing list. Would that satsify this concern?

I doubt very much that musl upstream would want to provide sigdescr_np. They focus heavily on standards conformance and sigdescr_np is an API glibc simply made up (understandably so, but, still).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like musl has fixed strings for all the real-time signals. Is musl's handling of real-time and unknown signals included in the above commitment?

My conversation was about the return value from strsignal and did not distinguish the behaviour depending on the argument values. Such a distinction wouldn't make sense as an API commitment since the caller couldn't rely on the threadsafety.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A public mailing list post with an ack from the maintainer would be helpful, yes.

And I didn't want to make any assumptions about the scope of that informal commitment; I could have imagined something like "if you pass a known signal number we'll guarantee to return a static, but we might change the behavior for unknown signal numbers in the future". That's still a commitment we could rely on, as long as we were careful to only pass known signal numbers. Hence trying to confirm the scope of the commitment.

fn lookup(&self, sig: i32) -> Option<&'static str>;
}

unsafe fn ptr_to_maybe_str(p: *const c_char) -> Option<&'static str> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, functions returning Option use opt or option in their name, not maybe. (The only instances of to_maybe in the Rust codebase related to a compiler-internal type named MaybeOwned.)

library/std/src/sys/unix/os.rs Outdated Show resolved Hide resolved
library/std/src/sys/unix/os.rs Outdated Show resolved Hide resolved
/// Currently there are two deviations from the return value of `strsignal`:
///
/// * `signal_describe` does not translate the string according to the locale; C `strsignal`
/// usually does. Unfortunately a translated version is not currently available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of Rust's standard library functions currently include any kind of translation; this function is not unusual in that regard. (The first sentence, stating that it isn't translated, is helpful; the "Unfortunately" could be attached to many other functions, and is a known limitation of std.)

library/std/src/sys/unix/ext/process.rs Show resolved Hide resolved
impl Lookup for Unspported {
fn lookup(&self, sig: i32) -> Option<&'static str> {
extern "C" fn strsignal(sig: c_int) -> *const *const c_char;
ptr_to_maybe_str(strsignal(sig))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I'm following this correctly: If this is unsupported, shouldn't it just return None rather than duplicating? Also, wouldn't this result in duplicated output (the second time with a SIG prepended)?

@@ -526,20 +527,22 @@ impl From<c_int> for ExitStatus {
impl fmt::Display for ExitStatus {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if let Some(code) = self.code() {
write!(f, "exit code: {}", code)
write!(f, "exit status: {}", code)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR should change this particular bit of the output. We currently refer to this as code and provide a .code() function to retrieve it, so using the same consistent terminology seems preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is controversial I will split it out into a different MR. (The "consistent terminology" from the stdlib names is simply wrong for Unix, as I have recently documented properly.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not trying to make a case for or against that terminology, just that I don't think that change should be in this PR.

Regarding the change itself, my only concern is not introducing a source of confusion by being inconsistent between documentation and existing function names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you have this concern. I agree that the conversation would be better in a different MR, so yes, I will do that. Thanks. (I'll "resolve" this review comment with that other MR when it exists.)

@joshtriplett
Copy link
Member

joshtriplett commented Mar 16, 2021

@ijackson wrote:

While the lower signal numbers are conventional, the higher ones are not.

The numbers aren't all conventional, but libc already handles the number-to-name mapping for every target, so libc::SIGFOO will have the right value, which would simplify this task substantially. And it would be much simpler, and involve less unsafe code (as well as less probing for specific libc symbols).

The only thing we'd have to ensure is that we don't have an entry for any SIGFOO that doesn't exist, which would cause a compile failure. (And, a little less critically, that we have an entry for each SIGFOO that does exist, lest we have a missing description; however, falling back to a numeric description isn't the end of the world.)

It's also not the end of the world if this starts out giving better output on glibc and musl (which is already the case) as long as it's functional everywhere and can be extended easily on additional platforms.

Given all of that, I think it's worth at least considering the simpler solution. While it'd be a little bit of maintenance work when someone comes up with a new UNIX target (which doesn't happen especially often), I think that might be less painful than using _np symbols and dealing with potential future compatibility issues that may arise on existing targets.

We would have to maintain a table for every Unix we port to.

We could maintain one table, with some cfg entries, which will almost never need updating.

ijackson and others added 2 commits March 16, 2021 14:27
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@ijackson
Copy link
Contributor Author

Hi. Thanks for the review. The other comments are mostly bug reports in my series and I'll fix them.

Given all of that, I think it's worth at least considering the simpler solution.

This is in relation to the suggestion of putting a signal table into Rust stdlib. Let me consider this in some more detail then:

I think the suggestion is to avoid the use of Weak etc. completely and have a table mapping libc::SIGFOO identifiers names to the coresponding "FOO" abbrev and description string.

One result would be that Rust would print different signal descriptions to the platform's standard C library. Since signal descriptions are already rather gnomic, and the precise descriptions are known to the administrators of the system in question, I think this is extremely undesirable.

Also, on some platforms, depending on the completeness of Rust's table, Rust would be able to format only a subset of signal numbers. It would also be possible to introduce bugs in Rust's stdlib affecting individual signal numbers. Or, to put it another way, it would be really difficult to get good test coverage for the resulting implementation.

While it'd be a little bit of maintenance work when someone comes up with a new UNIX target

I also have an extremely strong philosophical objection to hardcoding a list of signals and their descriptions in the Rust stdlib.

The usual approach to making programs portable to a variety of Unix variants, where no sufficient API is universally available, is to identify API variants that are commonly available and use some kind of platform configuration to select the right implementation.

Rust stdlib doesn't seem to do conditional compilation for this; rather it takes the the Weak symbol approach. So I did that.

While it'd be a little bit of maintenance work when someone comes up with a new UNIX target (which doesn't happen especially often), I think that might be less painful than using _np symbols and dealing with potential future compatibility issues that may arise on existing targets.

I'm not sure what you are referring to. Do you think there is something wrong with _np functions ?

sigdescr_np and sigabbrev_np are fully supported and documented functions in glibc (albeit only in the latest versions). I think the _np must simply mean "not posix" which is of course true, but we are necessarily using non-POSIX APIs here because there is no POSIX API that works properly. These functions are the official documented and supported replacement, in glibc, for the traditional sys_siglist (and BSD's sys_signame).

So there shouldn't be any future compatibility issues with them.

@joshtriplett
Copy link
Member

I'm not sure what you are referring to. Do you think there is something wrong with _np functions ?

glibc sometimes uses its symbol versioning to deprecate, change, or tweak such functions, maintaining compatibility with old programs while preventing new ones from using it. (glibc doesn't always guarantee 100% source compatibility, just 100% binary compatibility.) We've had issues with that kind of symbol versioning in the past, because we try to maintain compatibility with a broad spectrum of glibc versions and not just the version we build with. Hence the added complexity of Weak, and the need to check different symbols for different versions of glibc. As an example, if (hypothetically) a version of these functions were ever standardized, I could imagine glibc deciding to deprecate the _np version and only making it available to code that built against an older glibc. That incurs maintenance effort here.

I'm not suggesting we can't do that, or that using such functions is always a problem, just that they can be slightly higher friction to use and maintain, and that we've had related issues in the past. That seemed like a good reason to entertain the notion of maintaining a table ourselves, and compare the two approaches.

Whichever path we go, we'll end up updating things for new targets in order to get better descriptions; there's no portable function we can call that gives us the signal descriptions, so if a new platform arises that provides such descriptions, we'd need to update this code anyway.

One result would be that Rust would print different signal descriptions to the platform's standard C library. Since signal descriptions are already rather gnomic, and the precise descriptions are known to the administrators of the system in question, I think this is extremely undesirable.

I think if we were to do this that we'd want to use the standard descriptions; we certainly wouldn't want to gratuitously reword them. I wouldn't expect those descriptions to vary wildly across platforms, at least not in a way likely to cause confusion for an admin. (We might, for instance, end up using the glibc descriptions on musl targets, but I don't think anyone would be surprised to see the glibc descriptions where they'd otherwise get just a signal name.) And as you observed, it'd be rather surprising to see something other than "Terminated" as the description for SIGTERM, for instance. If there's a key difference, we can always use cfg to match the different descriptions precisely.

The usual approach to making programs portable to a variety of Unix variants, where no sufficient API is universally available, is to identify API variants that are commonly available and use some kind of platform configuration to select the right implementation.

We're not just building a program; we're building a standard library that will, itself, serve as a portability layer across targets. That changes the balance somewhat. (And we don't have "a variety of Unix variants", we have a fairly small number that seems unlikely to see rapid increases; as long as it's clear what to change when adding a new target that doesn't seem likely to be a problem.) It's reasonable to evaluate what approach to portability will be easiest for us to maintain in the future (what will see the least changes, and the least complex changes when it does need changing), as well as what will give people the most user-friendly behavior. Sometimes we call functions from the underlying libraries; sometimes we implement things ourselves if using an underlying library function is more trouble than it's worth and the implementation is straightforward, or if the underlying library functions don't give the behavior we want.

Also, in this case, it doesn't seem like there's any useful function we can call on some targets. On musl, for instance, this PR doesn't provide signal descriptions, because there's no library function we can call; we'll only ever get signal names. Likewise for uclibc. An advantage of providing the descriptions ourselves is that they'll always be available. It seems useful to make sure those descriptions are available even if the underlying C library doesn't have them. And if we want to make those descriptions available everywhere, we might as well use those descriptions even if the C library does have them, since that means calling less unsafe code and having less code to maintain. If we're calling the underlying C library functions, it should be because we don't want to maintain such a table at all, and we're willing to just not have signal descriptions on platforms without those functions available.

Whichever path we go, we'll end up updating code for new targets in order to get better descriptions; there's no fully portable function we can call that gives us the signal descriptions, so if a new platform arises that provides such descriptions, we'd need to update this code anyway, either to call the same functions on more targets or to add more targets. I also think it's exceedingly likely that new UNIX implementations will use very similar signals and descriptions, which means I think on balance we'll have relatively little work to do to add a new UNIX target. (Also note that we have to update the libc crate for new targets anyway, including any differences they have in SIGFOO constants. We could add a note to the libc crate to remind people to update these descriptions when adding a target, or even just maintain a compile-time constant in the libc crate to keep it next to the current manual lists of signals for targets. That would handle our test coverage, reducing it to the problem we already have of maintaining the libc crate for every target.)

Summary

If we call the libc functions:

[+] We don't have to maintain a table ourselves, with cfg variations if a SIGFOO exists on some targets but not others.
[+] We automatically get all signal numbers supported by the target, and we'll never miss one.
[+] We don't have to do anything to precisely match the descriptions from libc (which people shouldn't depend on programmatically, but there's value in matching them nonetheless).
[+] Binaries contain only one copy of these strings, even if they format signals from both C and Rust and the linker doesn't de-duplicate the strings.
[-] We don't get signal descriptions if the underlying libc doesn't provide a (non-portable) function to provide them.
[-] We don't even get signal names if the underlying libc doesn't provide a thread-safe way to obtain them.
[-] We have to use unsafe code to call a non-standard libc function via dlsym, and potentially depend on some assumptions about the libc implementation for thread-safety.
[-] We need to update this code to call an appropriate function on each new UNIX target.

If we maintain a table of names and descriptions ourselves:

[+] We get consistent results (both signal names and signal descriptions) even if the underlying libc doesn't provide such functions or provide useful descriptions.
[+] We don't need any unsafe code, or any libc-specific code. The overall change is likely to be substantially smaller.
[+] We know the implementation, so we can rely on that (e.g. returning a &'static str and not having to parse and validate a C string). This is not especially important, though.
[+] Marginally better performance (we can do a match rather than a linear search). This is not especially important; it's unlikely that formatting signal names will be on any critical path.
[-] We have to maintain a table ourselves, and update it if a target is missing a signal or has a new signal.
[-] If a target uses a non-standard description for a standard signal, we won't match that by default. (We could choose to add a cfg to match it precisely.)

I don't think that's a clear-cut win for either approach; I think this is a set of tradeoffs that libs will have to evaluate.

@ijackson
Copy link
Contributor Author

Summarising the situation is useful but I don't entirely agree with your characterisation. So here is my summary:

Calling the libc APIs

  • We automatically get all signal numbers supported by the target, and we'll never miss one.

  • Porting to a new kernel supported by our existing libcs (glibc and/or musl) is zero work.

  • Porting to a new Unix libc is the usual task of identifying appropriate APIs and implementing code to call them.

  • We make some platform-specific assumptions about the behaviour and future availability of the APIs we are using.

  • We can write tests that will check that things are working as expected. If those tests pass the results will be correct for all signal numbers.

Making our own table

  • We violate the expected social contract of how to write programs on Unix.

  • We must write and maintain a set of non-portable list of signals. We need one such table for each kernel that Rust is ported to.

  • We may sometimes produce inconsistent output, compared to that from programs written in other languages. This is a problem because "special snowflake" signal descriptions coming out of Rust programs will not be recognisable by system adminstrators, ad hoc log parsing programs, etc.

  • We always produces abbrevations aka names (TERM) as well as descriptions (Terminated).

  • We have no way to write a test case that the table is correct. Errors in individual table entries on individual platforms would go unnoticed unless someone noticed a Rust program producing lies when reporting signals.

Non-issues

  • unsafe. This is the stdlib. The point is to be correct, even if that means using unsafe.
  • "update this code ... on each new UNIX target". We have to do something target-specific in any case. The question is whether to do something general and correct, or hardcode some special knowledge.
  • The performance and memory concerns are extremely marginal.

@ijackson
Copy link
Contributor Author

It occurred to me to ask this: evidently, since people are suggesting it for SIG* constants, it is possible to do compile-time conditionals for things in libc.

Personally I think the Weak thing is rather over-complex (and seems rather to go through the back door rather than the front) and I would prefer to do a straightforward conditional compilation, but this would involve testing the glibc version and/or function availability at build time. Is that something that can be done, eg in libc? If someone could point me in the right direction I would definitely prefer to go down that route.

@joshtriplett
Copy link
Member

joshtriplett commented Mar 17, 2021

The primary constraint motivating the use of Weak (which uses dlsym) is that the glibc version available at Rust's build time should not affect the set of glibc versions supported by Rust programs at runtime. This is a problem that arises often in C (if you want to run on a broader set of systems, you have to build on the oldest libc you want to support), and we want to reduce the degree to which the same problem affects Rust. Before we used Weak, programs such as dh_shlibdeps that tried to detect what version of glibc to add a dependency on would notice the use of symbols (even weak symbols) from current glibc and assume we needed current glibc (potentially even an exact-version constraint if it was a private symbol).

(Rust does not completely solve this problem, though it would be nice if it did. But Rust tries to make it better where possible.)

See #23628 for one bit of that history.

@joshtriplett
Copy link
Member

Two things:

First, I want to be clear that I'm not looking to advocate one or the other approach here; I just want to suggest considering both, and the tradeoffs between them. Whichever way the balance of those tradeoffs go, I'm happy to go with. I don't think there's value in arguing this further; there's plenty of information here to base a decision on.

Second, I want to follow up on a specific point from my previous post, and suggest a third option, which I previously hadn't considered; I don't want to make this a false dichotomy between just two options, and I think I incorrectly did so.

I mentioned that an advantage of providing the descriptions ourselves is that they'll always be available. It seems useful to make sure those descriptions are available even if the underlying C library doesn't have them. But we don't actually have to do better-than-the-C-library everywhere, and in particular, there's no requirement to do so on every platform, since this primarily exists to enhance error reporting. We can make individual tradeoffs between maintenance vs improved functionality.

So, as a third option, which I'm finding myself favoring: we could call into the platform C library, using the code from this PR, and on specific popular targets that don't provide this information, we could additionally provide a list ourselves. (Concretely, we could invite folks who want better error reporting on musl or uclibc to submit a patch adding such a list.) This has the advantage of always giving information at least as good as the platform C library can provide; it additionally means programs using a less common C library will fit in better with platform conventions (e.g. descriptions), getting the same clear error output whether they use glibc or musl or uclibc. And it means that providing a list on a target just makes the baseline better, and we need not maintain such a list for every target.

Personally, I do see the advantages of calling into the platform C library for this. I'm also fine if there's better error reporting on glibc than on musl/uclibc (insofar as not providing descriptions because the C library doesn't). But I'd also be happy to see a follow-up PR from musl/uclibc folks extending this to add a simple match statement for Linux signals, in place of the currently missing support for descriptions on musl/uclibc, while leaving the libc calls in place for everything else.

Given that, the approach I'd currently favor would be to accept this PR (with the planned changes), and invite follow-up additions for musl/uclibc and any other target that doesn't have a way of providing both names and descriptions.

Calling the libc APIs

* We automatically get all signal numbers supported by the target, and we'll never miss one.

Agreed. Though, as noted earlier, there might be value in the libc crate exporting a const array of signal numbers, which could be used for a variety of purposes including additional test coverage. We already have to do porting work for the libc crate to add all the signal numbers supported by a target (which depends on the kernel for the set of signals and the architecture for the signal numbers). (If, one day, we have a more automatic mechanism to generate libc bindings, we could still generate and provide such a constant easily enough.)

* Porting to a new kernel supported by our existing libcs (glibc and/or musl) is zero work.

Porting Rust to a new kernel is a great deal of work, but I agree that this PR wouldn't add any. :)

* We make some platform-specific assumptions about the behaviour and future availability of the APIs we are using.

FWIW, I don't expect this to be likely, just possible. It's come up a few times, but more in areas of the C library that have higher flux; I'd expect less churn for this kind of function.

* We can write tests that will check that things are working as expected.  If those tests pass the results will be correct for all signal numbers.

I believe we can and should do this in either case, though as noted above I think we should go with a third option.

In particular, I think there's value in testing the full range of signal numbers (if we can get that programmatically from libc), whichever approach we go with.

Making our own table

* We violate the expected social contract of how to write programs on Unix.

This is not a concrete, actionable item, nor does it describe the nature of any specific problem this approach would cause. I'm assuming that the remainder of your summary covers the concrete concerns (all of which are valid).

To be clear, I'm not currently trying to argue that one of these approaches is better than the other. I'm suggesting that the tradeoffs between these alternatives are worth serious consideration.

* We must write and maintain a set of non-portable list of signals.  We need one such table for each _kernel_ that Rust is ported to.

Each UNIX-compatible kernel, yes. That doesn't seem likely to be an especially long or rapidly-growing list, and we already have many other things in std and the libc crate that require porting to each new kernel we support. (The libc crate already needs to maintain this list for each kernel on each architecture, it just doesn't have to add descriptions.)

This does incur future maintenance effort, which I'm acknowledging, and which I don't want to either understate or overstate.

* We may sometimes produce inconsistent output, compared to that from programs written in other languages.  This is a problem because "special snowflake" signal descriptions coming out of Rust programs will not be recognisable by system adminstrators, ad hoc log parsing programs, etc.

This will only happen if the platform provides such descriptions and we don't match those descriptions, which we can fix easily enough. Also, this is apparently not a problem for programs using musl as their libc, which doesn't provide those descriptions. (I'm talking about cases where a program uses musl on a platform where most programs use glibc.)

* We always produces abbrevations aka names (`TERM`) as well as descriptions (`Terminated`).

👍

* We have no way to write a test case that the table is correct.

We can write a comprehensive test case easily enough, if the libc crate provides a list of signals.

Non-issues

* `unsafe`.  This is the stdlib.  The point is to be correct, even if that means using unsafe.

I didn't suggest that we can't use unsafe. I'm suggesting that we do take "amount of unsafe code we have to maintain" into account as a tradeoff. There are multiple potential correct solutions here, with tradeoffs in maintenance, behavior, and other factors. One of many factors we might consider is "how much unsafe code does this need".

* "update this code ... on each new UNIX target".  We have to do something target-specific in any case.

Agreed, that was my intended point.

* The performance and memory concerns are extremely marginal.

Entirely acknowledged, and I tagged each such point as such. I mentioned them for completeness only.

@joshtriplett joshtriplett added the I-needs-decision Issue: In need of a decision. label Mar 18, 2021
@m-ou-se m-ou-se assigned joshtriplett and unassigned m-ou-se Mar 21, 2021
@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 24, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Mar 24, 2021

For library additions/changes, we usually don't require an RFC, since the discussion is easily had on a PR directly. However, the discussion on this PR so far is only about the implementation details. I'd first like to focus on the question whether we want this functionality in the standard library, before discussing how it's implemented.

This PR adds three (unstable) functions to the standard library:

pub fn signal_describe(sig: i32) -> String;
pub fn signal_string_raw(sig: i32) -> Option<&'static str>;
pub fn signal_abbrev(sig: i32) -> Option<&'static str>;

And makes changes to one implementation, which would directly be stable:

impl fmt::Display for ExitStatus;

Even though they're useful in some cases, I don't think the proposed signal_ functions should be part of the Rust standard library. Platform specific functionality like this is usually provided by crates like libc and nix and others, not by std. std does expose plenty of platform-specific APIs, but those are mostly the glue between standard library types and platform specific features. For example, MetadataExt::mode() to obtain the unix mode bits of a file. If we didn't provide those functions, users wouldn't be able to use our file system types at all if they also needed some platform-specific things with them. However, these are usually just the minimal glue necessary. For example, mode() just returns an i32 and we don't provide any of the constants like S_IRWXU.

We also don't expose strerror_*, other than through the Display and Debug implementations of std::io::Error.

The improved Display implementation for ExitStatus however does seem like an appropriate addition to std to me. Just like we do for std::io::Error, if we can display a more useful name/description, we should try to do so. But it will not be a stable guarantee, and anything that wants to have control over how they are displayed (e.g. a shell) should probably implement their own code (or use another crate) for that.

With that in mind, now back to the implementation in this PR: I simply don't think it's worth it to add this much complexity just for nicer formatting of ExitStatus. If we are providing those signal_* functions as "the proper Rust way to get a name/description for a signal" then yes, we should make sure it's complete on every platform and identical to what libc provides. But if we're just providing nice ExitStatus formatting, it's probably not worth doing so if that makes the code this complex, and a (potentially incomplete) match statement using libc's constants could be a better fit. Showing e.g. SIGSEGV or SIGHUP with their names is great, but more obscure signals (e.g. "real-time signal 9") not displaying their name is no big deal, and are probably not something we need to cover in the standard library.

@Amanieu
Copy link
Member

Amanieu commented Mar 25, 2021

I agree with @m-ou-se that the signal_* function are outside the scope of the standard library.

Regarding the implementation:

While it would be nice to get signal names from libc, the standard API for this (strsignal) is unusable due to thread safety issues. Dealing with a mess of platform-specific non-standard APIs is complicated enough that it is simpler to just roll our own signal name table (like this one from the nix crate). It's also incomplete since some targets (e.g. uclibc) only provide a thread-unsafe strsignal function with no alternatives.

I had a look at other languages (Go, Swift) and they all roll their own tables rather than dealing with the mess of libc APIs.

@ijackson
Copy link
Contributor Author

Thanks for your reviews.

roll our own signal name table

I personally don't feel like implementing that. So I will close this MR.

@ijackson ijackson closed this Mar 25, 2021
@joshtriplett joshtriplett removed the I-needs-decision Issue: In need of a decision. label Mar 26, 2021
@ijackson ijackson deleted the exitstatus-display branch August 24, 2021 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants