-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os: Wait can wait on the wrong process #67642
Comments
Change https://go.dev/cl/588675 mentions this issue: |
Upon further thought, I don't think this is sufficient. We could protect the second wait from waiting on the wrong PID, but the |
There are several issues with pidfd handling today: * The zero value of a Process makes the handle field appear valid, so methods attempt to use it as a pidfd rather than falling back to the PID as they should (#67634). * If a process doesn't exist, FindProcess returns a Process with Pid == -2, which is not a compatible change (#67640). * pidfd close is racy as-is. A Release call or successful Wait will clear the handle field and close the pidfd. However, a concurrent call may have already loaded the handle field and could then proceed to use the closed FD (which could have been reopened as a different pidfd, targeting a different process) (#67641). This CL performs multiple structural changes to the internals of Process. First and foremost, each method is refactored to clearly select either pidfd or raw pid mode. Previously, raw pid mode was structured as a fallback when pidfd mode is unavailable. This works fine, but it does not make it clear that a given Process object either always uses pidfd or always uses raw pid. Since each mode needs to handle different race conditions, it helps to make it clear that we can't switch between modes within a single Process object. Second, pidfd close safety is handled by reference counting uses of the FD. The last user of the FD will close the FD. For example, this means that with concurrent Release and Signal, the Signal call may be the one to close the FD. This is the bulk of this CL, though I find the end result makes the overall implementation easier to reason about. Third, the PID path handles a similar race condtion between Wait and Kill: Wait frees the PID value in the kernel, which could be reallocated causing Kill to target the wrong process. This is handled with a done flag and a mutex. The done flag now shares the same state field used for the handle. Similarly, the Windows implementation reuses all of the handle reference counting that Linux uses. This means the implementations more consistent, and make Windows safe against the same handle reuse problems. (Though I am unsure if Windows ever reuses handles). Wait has a slight behavior change on Windows: previously Wait after Release or an earlier Wait would hang indefinitely (WaitForSingleObject on syscall.InvalidHandle waits indefinitely). Now it returns the same errors as Linux (EINVAL and ErrProcessDone, respectively). Similarly, Release on Windows no longer returns close errors, as it may not actually be the place where the close occurs. Fixes #67634. Fixes #67640. Fixes #67641. Updates #67642. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Change-Id: I2ad998f7b67d32031e6f870e8533dbd55d3c3d10 Reviewed-on: https://go-review.googlesource.com/c/go/+/588675 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
In #13987, we fixed the race between Wait and Signal where Wait frees the PID in the OS, which can then reuse it before Signal sends a signal to what is now a different process.
The same race applies to Wait + Wait (ignoring pidfd): https://cs.opensource.google/go/go/+/master:src/os/exec_unix.go;l=41-54;drc=dbe2e757bb55f80de1a622da6bd5060e979208d1
In two concurrent calls to Wait, both will see the process as waitable in
blockUntilWaitable
. Both then proceed unconditionally to the wait syscall. One of these calls will win and get the wait status. The other, if slow and the OS has reused the PID, will wait on a different process with the same PID.I believe this race would be fixed by replacing the unconditional
setDone
with a compare and swap that bails out ifisdone
is already set.cc @ianlancetaylor @golang/runtime
The text was updated successfully, but these errors were encountered: