Deadlock with multiple simultaneous requests from the same client #72

@kevpar

Description

There is a possible deadlock in the TTRPC server/client interactions when there are multiple simultaneous requests from the same client connection. This causes both the server and client handler goroutines to deadlock.

I've repro'd this on both Linux (with unix sockets as the transport) and Windows (with both unix sockets and named pipes as the transport). It repros more easily when the transport has less buffering, and when there are more goroutines sending requests concurrently from the client.

I intend to look into how this can be fixed, but filing an issue for awareness and in case someone else wants to tackle it in the meantime. :)

Stacks

Server

goroutine 138 [IO wait]:
internal/poll.runtime_pollWait(0x7f63e0790f08, 0x77, 0xffffffffffffffff)
	/go1.14.3/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc00018c198, 0x77, 0x31000, 0x31012, 0xffffffffffffffff)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitWrite(...)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc00018c180, 0xc0003e2ff6, 0x31012, 0x31012, 0x0, 0x0, 0x0)
	/go1.14.3/src/internal/poll/fd_unix.go:276 +0x290
net.(*netFD).Write(0xc00018c180, 0xc0003e2ff6, 0x31012, 0x31012, 0x1000, 0x0, 0x0)
	/go1.14.3/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc000010018, 0xc0003e2ff6, 0x31012, 0x31012, 0x0, 0x0, 0x0)
	/go1.14.3/src/net/net.go:196 +0x8e
bufio.(*Writer).Write(0xc000090000, 0xc0003e2ff6, 0x31012, 0x31012, 0xa, 0xf00032008, 0x2)
	/go1.14.3/src/bufio/bufio.go:623 +0x13b
github.com/containerd/ttrpc.(*channel).send(0xc000090100, 0x20000000f, 0xc0003e2000, 0x32008, 0x32008, 0x0, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/channel.go:127 +0xbc
github.com/containerd/ttrpc.(*serverConn).run(0xc000094230, 0x6bf120, 0xc0000a2010)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:459 +0x64b
created by github.com/containerd/ttrpc.(*Server).Serve
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:127 +0x288

goroutine 139 [select]:
github.com/containerd/ttrpc.(*serverConn).run.func1(0xc0001000c0, 0xc000094230, 0xc000100180, 0xc000090100, 0xc000100120, 0xc000076540)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:404 +0x69b
created by github.com/containerd/ttrpc.(*serverConn).run
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/server.go:332 +0x2c0

Client

goroutine 19 [IO wait]:
internal/poll.runtime_pollWait(0x7fd97990df18, 0x77, 0xffffffffffffffff)
	/go1.14.3/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000f6098, 0x77, 0x31000, 0x31021, 0xffffffffffffffff)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitWrite(...)
	/go1.14.3/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc0000f6080, 0xc0009ecff6, 0x31021, 0x31021, 0x0, 0x0, 0x0)
	/go1.14.3/src/internal/poll/fd_unix.go:276 +0x290
net.(*netFD).Write(0xc0000f6080, 0xc0009ecff6, 0x31021, 0x31021, 0x1000, 0x0, 0x0)
	/go1.14.3/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc0000a8028, 0xc0009ecff6, 0x31021, 0x31021, 0x0, 0x0, 0x0)
	/go1.14.3/src/net/net.go:196 +0x8e
bufio.(*Writer).Write(0xc0000f8040, 0xc0009ecff6, 0x31021, 0x31021, 0xa, 0x1500032017, 0x1)
	/go1.14.3/src/bufio/bufio.go:623 +0x13b
github.com/containerd/ttrpc.(*channel).send(0xc0000f8080, 0x100000015, 0xc0009ec000, 0x32017, 0x32017, 0x0, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/channel.go:127 +0xbc
github.com/containerd/ttrpc.(*Client).send(0xc0000f6280, 0x100000015, 0x63b500, 0xc000022480, 0x1, 0x0)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:324 +0x86
github.com/containerd/ttrpc.(*Client).run(0xc0000f6280)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:273 +0x5ab
created by github.com/containerd/ttrpc.NewClient
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:94 +0x1bd

goroutine 6 [select]:
github.com/containerd/ttrpc.(*receiver).run(0xc00000e080, 0x6bf0e0, 0xc0000f8000, 0xc0000f8080)
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:222 +0x241
created by github.com/containerd/ttrpc.(*Client).run
	/go/pkg/mod/github.com/containerd/ttrpc@v1.0.2/client.go:262 +0x1fa

Analysis

Basically, the server has a "receiver" goroutine that reads new requests off the transport and sends each one over a channel to the "worker" goroutine. The "worker" goroutine loops in a select, handling either a new request message or a response that needs to be written back to the client. When the deadlock occurs, the server's "worker" goroutine is stuck blocking on a response write to the transport, while its "receiver" goroutine is stuck trying to send the next request message to the "worker" goroutine.

The client side is the inverse: the "receiver" goroutine is stuck trying to hand a response it read off the transport to the "worker" goroutine via channel, while the "worker" goroutine is stuck writing a new request to the server over the transport.

This looks like it should only occur when the connection is busy enough that the transport buffer is filled, as otherwise the server and client writes to the transport would simply be fulfilled by the buffer, and would not block waiting for a reader on the other end.

The interesting places in the code where the 4 goroutines are stuck are linked below:
Server receiver sending message to worker: https://github.com/containerd/ttrpc/blob/v1.0.2/server.go#L404
Server worker writing response to transport: https://github.com/containerd/ttrpc/blob/v1.0.2/server.go#L459
Client receiver sending message to worker: https://github.com/containerd/ttrpc/blob/v1.0.2/client.go#L222
Client worker writing request to transport: https://github.com/containerd/ttrpc/blob/v1.0.2/client.go#L273

Sample

I have a repro program here. This program can be run as either a server (go run . server) or a client (go run . client). The server implements a very simple TTRPC server that listens for connections. The client spawns multiple goroutines that constantly send requests to the server and print their ID each time they get a response. Each request/response is padded with a bunch of junk data to counteract the effects of buffering on the underlying transport. When run, you will generally see a little bit of output from the client, and then it will stop once the deadlock occurs. You can also hit enter on either the server or client to make it dump its current goroutine stacks to a file.
