Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate initial connection race condition in win32PipeListener (addresses #83) #84

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

djsweet
Copy link

@djsweet djsweet commented Jul 14, 2018

Previously, win32PipeListener would create an initial pipe instance,
open a client connection on it, then immediately close that connection.
The reasoning behind this was unclear, but it appears to have been a
mechanism for claiming the pipe name in the pipe namespace. If the
listener-side single client could not be created, the listener would
abort. This created a race condition where a legitimate client could
attach itself to the named pipe before the listener could create its own
client, causing the listener to incorrectly abort due to a legitimate
client connection.

This commit alters the process for listening for and accepting new
connections: when the win32PipeListener is constructed, the first
pipe is still created as before, but it is now also used as the first
client connection as well. In order to maintain the namespace stake,
the listenerRoutine() creates another named pipe instance before
returning the connected instance to a pending Accept(), so that there is
at least one pending named pipe instance at all times.

There are some caveats to this approach: transient errors from
CreateNamedPipe will not be returned until the next Accept() call, so
there may be instances where calling clients may correct the cause
of the error, but still receive the same error across two Accept()
calls. Additionally, if transient errors from CreateNamedPipe begin
occurring and all other named pipe instances are closed, the listener
loses its namespace stake and may accidentally clobber other processes
if the transient errors resolve themselves.

It may be worth investingating closing the listener "from the inside"
if transient errors begin to occur in CreateNamedPipe, so that the
calling client is forced to create a new Listener and detect whether
its stake in the pipe namespace has been lost.

EDIT: As of commit 1c4c364, the namespace stake is guaranteed by having instances of win32Pipe "donate" their open instances back to the listener if the listener could not create a new pipe instance. This ensures that at least one open pipe instance is bound to the expected name through the lifetime of the listener, which also ensures that the stake in the namespace is not lost. It is currently using sync.Mutex rather heavily, but it might be possible to replace that with atomic.CompareAndSwapPointer instead.

Commit 1c4c364 also contains a fix for #85. This might be worth breaking out into a new PR.

Previously, win32PipeListener would create an initial pipe instance,
open a client connection on it, then immediately close that connection.
The reasoning behind this was unclear, but it appears to have been a
mechanism for claiming the pipe name in the pipe namespace. If the
listener-side single client could not be created, the listener would
abort. This created a race condition where a legitimate client could
attach itself to the named pipe before the listener could create its own
client, causing the listener to incorrectly abort due to a legitimate
client connection.

This commit alters the process for listening for and accepting new
connections: when the win32PipeListener is constructed, the first
pipe is still created as before, but it is now also used as the first
client connection as well. In order to maintain the namespace stake,
the listenerRoutine() creates another named pipe instance before
returning the connected instance to a pending Accept(), so that there is
at least one pending named pipe instance at all times.

There are some caveats to this approach: transient errors from
CreateNamedPipe will not be returned until the _next_ Accept() call, so
there may be instances where calling clients may correct the cause
of the error, but still receive the same error across two Accept()
calls. Additionally, if transient errors from CreateNamedPipe begin
occurring and all other named pipe instances are closed, the listener
loses its namespace stake and _may_ accidentally clobber other processes
if the transient errors resolve themselves.

It may be worth investingating closing the listener "from the inside"
if transient errors begin to occur in CreateNamedPipe, so that the
calling client is forced to create a new Listener and detect whether
its stake in the pipe namespace has been lost.
@djsweet
Copy link
Author

djsweet commented Jul 14, 2018

This is a work in progress meant to address issue #83. I'm still not terribly happy with the treatment of errors in additional calls to CreateNamedPipe, but I'm not sure what can be done other than forcibly closing the listener.

This commit enforces the invariant that a win32PipeListener guarantees
its stake in the pipe namespace by having instances of win32Pipe
"donate" their disconnected pipe instances to the listener if the
listener failed to create a new instance.

Currently, this is done with heavy use of sync.Mutex. The "best case"
of the listener successfully creating a successor instance elides one
locking call with the use of atomic.LoadPointer; it may be possible to
use atomic.CompareAndSwapPointer to achieve the same effect in general.
This is worth pursuing further.

This commit also fixes a race condition in win32PipeListener.Close,
where the close channel send may have been swallowed by
connectServerPipe if a client successfully connected before its pipe
instance was shut down.
@gdamore
Copy link
Contributor

gdamore commented Jul 18, 2018

I made some long comments about this PR in #80 -- submitter and others please read what I wrote there. I don't think these changes address #67 completely, although they do address one element of the problem, and I think still aren't there 100%.

For the record, I'd also be willing to look at refactoring #80 to incorporate the approach here (borrowing this code, or working with @djsweet to do so) as part of the listener side fixes (though we need to make some changes I think), and then lifting the client side of my work to provide the complete solution. This is quite a bit more work than just band aiding the one problem my code has (potential race), but would probably be architecturally cleaner, without wasting two extra handles per listener instance. I'm not interested in doing that work unless it seems pretty clear that this would be the direction we all agree we should take, and that it would lead to ultimately landing in master.

@djsweet
Copy link
Author

djsweet commented Jul 18, 2018

I have a separate branch in which I've been trying out atomic.CompareAndSwapPointer instead of mutexes, and am currently using it for my own purposes. However, I'm not terribly eager to integrate that in, because the Named Pipe API is quite inconsistent when it comes to the myriad ways of closing a connection.

That branch had to treat ERROR_PIPE_NOT_CONNECTED as io.EOF for compatibility with existing consumers and consistency with the net.Conn interface. However, this was not an enhancement but a requirement: client handles to named pipes closed with DisconnectNamedPipe return ERROR_PIPE_NOT_CONNECTED instead of the expected ERROR_HANDLE_EOF. This has negative implications for clients of servers consuming this library. This current PR also might use DisconnectNamedPipe, but not as frequently as the other branch, which always uses DisconnectNamedPipe.

As a result, replacing the mutex usage will take some thought and create incompatibilities that we probably don't want, but I'm not sure if they're avoidable while fixing the initial listener race.

@gdamore
Copy link
Contributor

gdamore commented Jul 18, 2018

The Named Pipe API is quite inconsistent. Full stop.

Thanks for the heads up about ERROR_PIPE_NOT_CONNECTED.

ERROR_WINDOWS_HAS_TOO_MANY_ERROR_CODES

@tankbusta
Copy link

I'm seeing some odd behavior that I think might be related to this? My listener doesn't seem to accept the first connection to the named pipe. If I restart my client, everything works and every connection is accepted as expected.

@jterry75
Copy link
Contributor

@jstarks - PTAL

@tankbusta
Copy link

tankbusta commented Oct 31, 2019

Yep @djsweet's PR is fixing the issue for me. I was curious and tried running all the pipe tests (on a fully patched Windows 7) from master and the vast majority of them are failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants