Description
Describe the bug
connect_timeout
given in conn string is not obeyed if sslmode
is not specified or equals sslmode=allow|prefer
. It takes twice the amount specified by connect_timeout
in conn string. While this behavior is correct if multi-host
is provided in conn string, it doesn't look correct in case of single host. This behavior is also not matching with libpq (example given below).
To Reproduce
Steps to reproduce the behavior:
package main
import (
"context"
"fmt"
"os"
"github.com/jackc/pgx/v4"
)
func main() {
// invalid port is provided to reproduce the connect timeout behavior here
urlExample := "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"
conn, err := pgx.Connect(context.Background(), urlExample)
if err != nil {
fmt.Fprintf(os.Stderr, "Unable to connect to database: %v\n", err)
os.Exit(1)
}
defer conn.Close(context.Background())
var name string
err = conn.QueryRow(context.Background(), "select 'xyz'").Scan(&name)
if err != nil {
fmt.Fprintf(os.Stderr, "QueryRow failed: %v\n", err)
os.Exit(1)
}
fmt.Println(name)
}
Output
Without sslmode
it takes 4s
instead of 2s
.
$ date; go run cmd/main.go; date
Wed Jul 5 13:17:00 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul 5 13:17:04 IST 2023
If sslmode=require
is provided in connection string it correctly takes 2s
.
$ date; go run cmd/main.go; date
Wed Jul 5 13:17:31 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul 5 13:17:33 IST 2023
Expected behavior
Expectation is to match the behavior of libpq/psql
.
$ date; psql -d "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"; date
Wed Jul 5 08:09:03 UTC 2023
psql: error: connection to server at "3.110.94.120", port 7432 failed: timeout expired
Wed Jul 5 08:09:05 UTC 2023
Actual behavior
When connection string has single host and no sslmode
specified (or sslmode=allow
or prefer
) then connect_timeout
given in connection string should be obeyed. Currently, it is taking two times that value. It seems to be happening because to implement these 2 sslmode, code correctly tries to connect to the server first with TLSConfig
and if it fails then without TLSConfig
. However, while doing this connect_timeout
should still be obeyed. Otherwise it increases the actual timeout by multiple of 2. (most of the time when timeout error occurs it is expected to fail with or without TLSConfig)
In case of multi-host
(-d "host=host1,host2 ..."
), this behavior is correct as it is also given here.
For example, if you specify two hosts and connect_timeout is 5, each host will time out if no connection is made within 5 seconds, so the total time spent waiting for a connection might be up to 10 seconds.
However, it doesn't look ok if single host is provided and sslmode
is not specified (default is prefer
). libpq
behavior in this case is also different than multi-host
behavior (example given above).
Version
- Go:
go version go1.20.4 darwin/amd64
- PostgreSQL:
PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
- pgx:
require github.com/jackc/pgx/v4 v4.18.11
Additional context
I think this behavior got introduced because the code path for supporting multi-host
as well as sslmode=allow|prefer
is same. Both of them seems to be using fallbackConfig
.
With following temporary fix it seems to be working fine. However, not sure whether it is the correct way to do it.
diff --git a/pgconn.go b/pgconn.go
--- pgconn.go
+++ pgconn.go
@@ -155,14 +155,17 @@
}
foundBestServer := false
var fallbackConfig *FallbackConfig
- for _, fc := range fallbackConfigs {
+ for i, fc := range fallbackConfigs {
// ConnectTimeout restricts the whole connection process.
if config.ConnectTimeout != 0 {
- var cancel context.CancelFunc
- ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
- defer cancel()
+ // create new context first time or when previous host was different
+ if i == 0 || (fallbackConfigs[i].Host != fallbackConfigs[i-1].Host) {
+ var cancel context.CancelFunc
+ ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
+ defer cancel()
+ }
} else {
ctx = octx
}
pgConn, err = connect(ctx, config, fc, false)