Skip to content

Commit

Permalink
Fix deadlock in closeWithError
Browse files Browse the repository at this point in the history
We have seen goroutines stuck at:

    select
    github.com/gocql/gocql.(*Conn).closeWithError
      /go/pkg/mod/github.com/kiwicom/gocql@v1.8.0/conn.go:569
    github.com/gocql/gocql.(*Conn).exec
      /go/pkg/mod/github.com/kiwicom/gocql@v1.8.0/conn.go:1113
    github.com/gocql/gocql.(*Conn).executeQuery
      /go/pkg/mod/github.com/kiwicom/gocql@v1.8.0/conn.go:1414

    semacquire
    sync.runtime_SemacquireMutex
      /usr/local/go/src/runtime/sema.go:71
    sync.(*Mutex).lockSlow
      /usr/local/go/src/sync/mutex.go:138
    sync.(*Mutex).Lock
      /usr/local/go/src/sync/mutex.go:81
    github.com/gocql/gocql.(*Conn).exec
      /go/pkg/mod/github.com/kiwicom/gocql@v1.8.0/conn.go:1058
    github.com/gocql/gocql.(*Conn).executeQuery
      /go/pkg/mod/github.com/kiwicom/gocql@v1.8.0/conn.go:1414
    github.com/gocql/gocql.(*Query).execute

When closeWithError is notifying callReqs about the error, it
selects between writing to call.resp or until call.timeout is closed.
There were branches leading to return with missing close(call.timeout)
calls in exec. When exec returned without either closing call.timeout or
reading the call.resp, this left the call without any further progress.
closeWithError selected between two conditions that never happened.
Rate limit · GitHub

Whoa there!

You have triggered an abuse detection mechanism.

Please wait a few minutes before you try again;
in some cases this may take up to an hour.

martin-sucha committed Mar 18, 2022
1 parent 2edce58 commit 4e5ba55
Showing 1 changed file with 12 additions and 4 deletions.
16 changes: 12 additions & 4 deletions conn.go
Original file line number Diff line number Diff line change
@@ -1069,6 +1069,10 @@ func (c *Conn) exec(ctx context.Context, req frameBuilder, tracer Tracer) (*fram
return nil, err
}

// After this point, we need to either read from call.resp or close(call.timeout)
// since closeWithError can try to write a connection close error to call.resp.
// If we don't close(call.timeout) or read from call.resp, closeWithError can deadlock.

if tracer != nil {
framer.trace()
}
@@ -1081,6 +1085,9 @@ func (c *Conn) exec(ctx context.Context, req frameBuilder, tracer Tracer) (*fram

err := req.buildFrame(framer, stream)
if err != nil {
// closeWithError will block waiting for this stream to either receive a response
// or for us to timeout.
close(call.timeout)
// We failed to serialize the frame into a buffer.
// This should not affect the connection as we didn't write anything. We just free the current call.
c.mu.Lock()
@@ -1096,6 +1103,10 @@ func (c *Conn) exec(ctx context.Context, req frameBuilder, tracer Tracer) (*fram

n, err := c.w.writeContext(ctx, framer.buf)
if err != nil {
// closeWithError will block waiting for this stream to either receive a response
// or for us to timeout, close the timeout chan here. Im not entirely sure
// but we should not get a response after an error on the write side.
close(call.timeout)
if (errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)) && n == 0 {
// We have not started to write this frame.
// Release the stream as no response can come from the server on the stream.
@@ -1108,10 +1119,6 @@ func (c *Conn) exec(ctx context.Context, req frameBuilder, tracer Tracer) (*fram
// check above could fail.
c.releaseStream(call)
} else {
// closeWithError will block waiting for this stream to either receive a response
// or for us to timeout, close the timeout chan here. Im not entirely sure
// but we should not get a response after an error on the write side.
close(call.timeout)
// I think this is the correct thing to do, im not entirely sure. It is not
// ideal as readers might still get some data, but they probably wont.
// Here we need to be careful as the stream is not available and if all
@@ -1179,6 +1186,7 @@ func (c *Conn) exec(ctx context.Context, req frameBuilder, tracer Tracer) (*fram
close(call.timeout)
return nil, ctx.Err()
case <-c.ctx.Done():
close(call.timeout)
return nil, ErrConnectionClosed
}
}

0 comments on commit 4e5ba55

Please sign in to comment.