Add read_timeout and write_timeout #498
Closed
I believe this addresses issue #450.
While doing some failure testing against an AWS HA RDS cluster, we noticed that our connections would hang in net.(*conn).Read forever. The Postgres server was no longer responding, so statement_timeout had no effect; the stack traces for the affected goroutines showed them blocked in that read.
Since connect_timeout only applies to the initial connection, before it is placed in the pool, and statement_timeout is enforced by the Postgres server, which doesn't help when the server itself is unresponsive, we decided we wanted more timeout options. So I added read_timeout and write_timeout (in milliseconds) to the connection string parameters and use them to set deadlines on reads and writes to the database connection. If a read or write returns before the deadline, the deadline is cleared so the connection can be returned to the pool. If the deadline is reached, an error is returned and the connection eventually gets marked as bad.
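For reference, here's a rough sketch of the mechanism, with illustrative names rather than the literal diff (`timeoutConn` and its fields are stand-ins for what the patch does inside pq's `conn`):

```go
package timeoutconn

import (
	"net"
	"time"
)

// timeoutConn illustrates the approach: every Read/Write arms a deadline
// on the underlying net.Conn, and a call that completes in time clears
// the deadline again so the pooled connection isn't left with a stale one.
type timeoutConn struct {
	net.Conn
	readTimeout  time.Duration // parsed from read_timeout (milliseconds)
	writeTimeout time.Duration // parsed from write_timeout (milliseconds)
}

func (c *timeoutConn) Read(b []byte) (int, error) {
	if c.readTimeout > 0 {
		if err := c.SetReadDeadline(time.Now().Add(c.readTimeout)); err != nil {
			return 0, err
		}
	}
	n, err := c.Conn.Read(b)
	if err == nil && c.readTimeout > 0 {
		// Completed before the deadline: clear it (zero time = no deadline).
		err = c.SetReadDeadline(time.Time{})
	}
	return n, err
}

func (c *timeoutConn) Write(b []byte) (int, error) {
	if c.writeTimeout > 0 {
		if err := c.SetWriteDeadline(time.Now().Add(c.writeTimeout)); err != nil {
			return 0, err
		}
	}
	n, err := c.Conn.Write(b)
	if err == nil && c.writeTimeout > 0 {
		err = c.SetWriteDeadline(time.Time{})
	}
	return n, err
}
```

With the patch, the new parameters go in the connection string like any other, e.g. `postgres://user:pass@host/db?read_timeout=2000&write_timeout=2000` (values in milliseconds).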
This fixed the issue we were seeing, but we noticed that the old hung connections were still showing up as established in netstat. Looking at pq.(*conn).Close(), when the sql layer called it to close a bad connection, the code would just return driver.ErrBadConn without attempting to close the underlying connection. I changed that to close the connection and return the result of the close call, similar to what is done at the bottom of that function. This did the trick and the hung connections were cleaned up.
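Again as an illustrative sketch rather than the exact diff (`pooledConn` stands in for pq's `conn`), the Close() change amounts to:

```go
package timeoutconn

import "net"

// pooledConn stands in for pq's conn; bad is set once a deadline has
// fired or some other fatal error has occurred on the connection.
type pooledConn struct {
	bad bool
	c   net.Conn
}

func (cn *pooledConn) Close() error {
	if cn.bad {
		// Before: return driver.ErrBadConn without touching the socket,
		// which left the hung TCP connection established forever.
		// After: actually close it and return whatever Close reports,
		// so the OS can tear the connection down.
		return cn.c.Close()
	}
	// ...normal shutdown path (send Terminate, then close)...
	return cn.c.Close()
}
```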
Any feedback on these changes would be greatly appreciated. We have tested them in our fork as part of our application's failure testing and have had good results.