Skip to content

Add read_timeout and write_timeout #498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add read_timeout and write_timeout #498

wants to merge 1 commit into from

Conversation

pneisen
Copy link

@pneisen pneisen commented Sep 8, 2016

I believe this addresses issue #450.

While doing some failure testing connected to an AWS HA RDS cluster, we noticed that our connections would hang on a net.(*conn).Read forever. The Postgres server was no longer responding so, statement_timeout had no effect. The stack trace for the affected goroutines looked like this:

#    0x43d4e8    net.runtime_pollWait+0x58                                    /usr/local/go/src/runtime/netpoll.go:160
#    0x5654f7    net.(*pollDesc).wait+0x37                                    /usr/local/go/src/net/fd_poll_runtime.go:73
#    0x565563    net.(*pollDesc).waitRead+0x33                                    /usr/local/go/src/net/fd_poll_runtime.go:78
#    0x566c00    net.(*netFD).Read+0x1a0                                        /usr/local/go/src/net/fd_unix.go:243
#    0x571d8f    net.(*conn).Read+0x6f                                        /usr/local/go/src/net/net.go:173
#    0x608490    crypto/tls.(*block).readFromUntil+0x90                                /usr/local/go/src/crypto/tls/conn.go:472
#    0x6089d3    crypto/tls.(*Conn).readRecord+0xc3                                /usr/local/go/src/crypto/tls/conn.go:574
#    0x60c805    crypto/tls.(*Conn).Read+0x115                                    /usr/local/go/src/crypto/tls/conn.go:1109
#    0x637b2b    bufio.(*Reader).fill+0x10b                                    /usr/local/go/src/bufio/bufio.go:97
#    0x63818b    bufio.(*Reader).Read+0x1bb                                    /usr/local/go/src/bufio/bufio.go:209
#    0x5e5393    io.ReadAtLeast+0xa3                                        /usr/local/go/src/io/io.go:307
#    0x5e54f7    io.ReadFull+0x57                                        /usr/local/go/src/io/io.go:325
#    0x591b06    myproject/vendor/github.com/lib/pq.(*conn).recvMessage+0x116    /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:892
#    0x591e28    myproject/vendor/github.com/lib/pq.(*conn).recv1Buf+0x38        /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:942
#    0x591f36    myproject/vendor/github.com/lib/pq.(*conn).recv1+0x86        /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:963
#    0x59981e    myproject/vendor/github.com/lib/pq.(*conn).readParseResponse+0x2e    /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:1594
#    0x59056b    myproject/vendor/github.com/lib/pq.(*conn).prepareTo+0x63b        /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:732
#    0x5912b5    myproject/vendor/github.com/lib/pq.(*conn).Query+0x3d5        /home/pneisen/myproject/vendor/github.com/lib/pq/conn.go:790
#    0x5406e6    database/sql.(*DB).queryConn+0x5d6                                /usr/local/go/src/database/sql/sql.go:1092
#    0x54007f    database/sql.(*DB).query+0xef                                    /usr/local/go/src/database/sql/sql.go:1079
#    0x53fdcf    database/sql.(*DB).Query+0x8f                                    /usr/local/go/src/database/sql/sql.go:1062
#    0x5409bf    database/sql.(*DB).QueryRow+0x6f                                /usr/local/go/src/database/sql/sql.go:1143

Since the connect_timeout only applies to the initial connection before it is placed in the pool, and statement_timeout is a Postgres thing that doesn't apply when Postgres is down, we decided we wanted more timeout options. So, I added read_timeout and write_timeout (in milliseconds) to the connection string parameters and use them to set deadlines on reads and writes to the database connection. If a read or write returns before the deadline, then the deadline is cleared so the connection can be returned to the pool. If the deadline is reached an error is returned and eventually the connection gets marked as bad.

This fixed the issue we were seeing, but we did notice that the old, hung, established connections were still around in netstat. Looking at pq.(*conn).Close(), when the sql layer called to close the bad connection, the code would just return driver.ErrBadConn and not try to close the connection. I changed that to close the connection and return the result of the close call, similar to what is done at the bottom of that function. This seemed to do the trick and the hung connections were cleaned up.

Any feedback on these changes would be greatly appreciated. We have tested them in our fork as part of our applications failure testing and have had good results.

Added read_timeout and write_timeout (in milliseconds) to the connection string parameters and use them on reads and writes to the database connection.
@@ -132,6 +137,13 @@ func (c *conn) handleDriverSettings(o values) (err error) {
return nil
}

intSetting := func(key string) (int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the different interface compared to boolSetting()?

@johto
Copy link
Contributor

johto commented Sep 8, 2016

I think this feature is inherently misguided, but since I'm not going to be working on anything better and it can work for some types of applications, I'm not going to object.

There's two patches pretending to be one here, though, and I had some comments on both of them.

@cbandy
Copy link
Contributor

cbandy commented Sep 8, 2016

It looks to me like libpq punts on this by letting the user set things using PQsocket().

Our equivalent is the Dialer. Should the user be doing this in the Read and Write of their own net.Conn?

@pneisen
Copy link
Author

pneisen commented Sep 9, 2016

That is an interesting thought about creating a custom Dialer. I think over all that would be a cleaner way to do it and it I could take care of both of the issues we were seeing. I'll give it an attempt and get back with you guys.

Thanks for the feedback!

@pneisen
Copy link
Author

pneisen commented Sep 12, 2016

I went down the road of creating a wrapper around lib/pq that adds a custom dialer when Open() is called: https://github.com/Kount/pq-timeouts

The wrapper is cleaner and thanks for the suggestion. I'm going to close this pull request. I still think Close() not attempting to close a connection that is marked bad is an issue, but I will issue another pull request for that.

Thanks.

@pneisen pneisen closed this Sep 12, 2016
jmoiron added a commit to jmoiron/sqlx that referenced this pull request Jan 24, 2018
ebenoist pushed a commit to reverbdotcom/sqlx that referenced this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants