-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os: crashes due to new error from FindProcess (CL 542699) #65866
Comments
This behavior was intentionally changed in https://go.dev/cl/542699, merged just yesterday, under the belief that almost no code would depend on the old behavior. That said, receiving a report of breakage mere hours after merging the CL (thank you for proactively testing tip!) makes me suspect that dependence is more widespread than expected and that we should reconsider changing this behavior. Could you share more about your original code that broke because of this change? Based on the issue title, I'm guessing the code did not check the error from |
If we do need to change behavior, I see a few options:
(1) would catch users that did not do error checking on Neither case will work if users are legitimately depending on the racy behavior. e.g., they call @stgraber would any/all of these alternatives work for your application? |
That's correct, our code was calling FindProcess and then unconditionally call .Signal on the returned value. As this can now be nil, we ended up with a panic... In our case I've simply sent a PR which now properly handles the err value of FindProcess so we'll be fine moving forward. This unfortunately will not help anyone building any of our current stable releases with Go 1.23 as anyone doing that will be getting confusing failures. |
For us, 1) would work, so long as the error we get from Signal for that dummy error process ends up being I think that'd be the best option as anyone doing proper error checking on What I'd really like to avoid here is other folks with code similar to ours out there, where the user decides to rebuild the application with the latest Go release. They'd get no compile time error at all and would have a resulting binary that appears to work fine until something triggers the code path that hits FindProcess+Signal and panics the whole thing. |
I don't see a proposal associated with https://go.dev/cl/542699, and it seems like that sort of incompatible change really ought to have one. I suggest that we revert it and send the change through the proposal process to work out the compatibility implications in more detail. |
#62654 was originally a proposal, but it was taken out of the proposal process because it wasn't considered "a significant functionality change". Perhaps we should have put it back when we decided it was big enough to add a GODEBUG option. |
I think that (1) is the way to go; I'll work on that. |
|
Change https://go.dev/cl/566179 mentions this issue: |
@stgraber thanks for reporting that! I guess the scenario is quite common and it totally slipped my mind. |
@prattmic here is one more alternative to 1. and 2. proposed above. Best expressed as a diff: func findProcess(pid int) (p *Process, err error) {
h, err := pidfdFind(pid)
if err == ErrProcessDone {
- return nil, err
+ return newProcess(pid, unsetHandle), err
}
// Ignore all other errors from pidfdFind, This way, for users who do not check for errors from OTOH this is worse than your proposal 1 (implemented in https://go.dev/cl/566179) because it won't hint the users to start checking for errors. |
Given that the best path forward for this issue isn't obvious and we have test failures as well (#65857), I think we should revert these CLs until we decide the correct approach (and fix the tests). I will send reverts. |
Change https://go.dev/cl/566476 mentions this issue: |
I've run the pidfd CLs through the Google internal Go tests. I haven't fully analyzed all failures yet, but thus far I've seen failures in three categories:
|
In a fun case of (2), one internal package actually constructs a new error containing the text "this shouldn't be possible on unix systems" when |
Aha, one more failure has manual construction of
I suspect this may be the cause of all of the "bad file descriptor" failures, as |
I finished my analysis and didn't find more failures than those listed above. From what I've seen, I believe option 2 in #65866 (comment) would be OK for all of the failing tests I saw. i.e., |
@prattmic thank you for your analysis. I think there are two separate issues here. I. Closing fd 0As for using fd 0 as pidfd value (causing test failures in #65857), this was caused by a stupid bug of mine (see here. This is now fixed in CL 570036. Unfortunately, LUCI gotip-android apparently can't be tested. Now, the only scenario of closing fd 0 I can think of is this: p := os.Process{Pid: pid}
p.Release() Alas, I can't think of a good way to protect from it (without adding something like In all other cases, II. Backward-compatible os.FindProcess interfaceIt boils down to the question of what to do in FindProcess when pidfdFind returned ErrProcessDone. I think we have three choices here: (1) Do something similar to what CL 542699 plus CL 566179 did, meaning:
This way, users can switch to the new behavior of FindProcess relatively painlessly. (2) Do not ever return errors from FindProcess:
In essense, these two are identical to your (1) and (2) from the earlier comment. Both approaches have their pros and cons. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as outdated.
This comment was marked as outdated.
Change https://go.dev/cl/570681 mentions this issue: |
This is a continuation of CL 570036. Amend FindProcess to use pidfdFind, and make it return a special Process with Pid of pidDone (-2) if the process is not found. Amend Wait and Signal to return ErrProcessDone if pid == pidDone. The alternative to the above would be to make FindProcess return ErrProcessDone, but this is unexpected and incompatible API change, as discussed in #65866 and #51246. For #62654. Rework of CL 542699 (which got reverted in CL 566476). Change-Id: Ifb4cd3ad1433152fd72ee685d0b85d20377f8723 Reviewed-on: https://go-review.googlesource.com/c/go/+/570681 TryBot-Bypass: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Go version
go version devel go1.23-5d4e8f5 Thu Feb 22 01:57:32 2024 +0000 linux/amd64
Output of
go env
in your module/workspace:What did you do?
Following https://pkg.go.dev/os#FindProcess, I have logic which assumes that on Linux systems, os.FindProcess will always succeed and that the resulting record can then be used with Signal to validate its actual existence.
However it appears the recent switch to using PIDFD within Go may have regressed this?
@kolyshkin
Example code:
What did you see happen?
What did you expect to see?
The text was updated successfully, but these errors were encountered: