Skip to content

connect_timeout is not obeyed for sslmode=allow|prefer #1672

Closed
@smaher-edb

Description

@smaher-edb

Describe the bug
connect_timeout given in conn string is not obeyed if sslmode is not specified or equals sslmode=allow|prefer. It takes twice the amount specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior is also not matching with libpq (example given below).

To Reproduce
Steps to reproduce the behavior:

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/jackc/pgx/v4"
)

func main() {
	// invalid port is provided to reproduce the connect timeout behavior here
	urlExample := "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"
	conn, err := pgx.Connect(context.Background(), urlExample)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to connect to database: %v\n", err)
		os.Exit(1)
	}
	defer conn.Close(context.Background())

	var name string
	err = conn.QueryRow(context.Background(), "select 'xyz'").Scan(&name)
	if err != nil {
		fmt.Fprintf(os.Stderr, "QueryRow failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Println(name)
}

Output

Without sslmode it takes 4s instead of 2s.

$ date; go run cmd/main.go; date
Wed Jul  5 13:17:00 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul  5 13:17:04 IST 2023

If sslmode=require is provided in connection string it correctly takes 2s.

$ date; go run cmd/main.go; date
Wed Jul  5 13:17:31 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul  5 13:17:33 IST 2023

Expected behavior
Expectation is to match the behavior of libpq/psql.

$ date; psql -d "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"; date
Wed Jul  5 08:09:03 UTC 2023
psql: error: connection to server at "3.110.94.120", port 7432 failed: timeout expired
Wed Jul  5 08:09:05 UTC 2023

Actual behavior
When connection string has single host and no sslmode specified (or sslmode=allow or prefer) then connect_timeout given in connection string should be obeyed. Currently, it is taking two times that value. It seems to be happening because to implement these 2 sslmode, code correctly tries to connect to the server first with TLSConfig and if it fails then without TLSConfig. However, while doing this connect_timeout should still be obeyed. Otherwise it increases the actual timeout by multiple of 2. (most of the time when timeout error occurs it is expected to fail with or without TLSConfig)

In case of multi-host (-d "host=host1,host2 ..."), this behavior is correct as it is also given here.

For example, if you specify two hosts and connect_timeout is 5, each host will time out if no connection is made within 5 seconds, so the total time spent waiting for a connection might be up to 10 seconds.

However, it doesn't look ok if single host is provided and sslmode is not specified (default is prefer). libpq behavior in this case is also different than multi-host behavior (example given above).

Version

  • Go: go version go1.20.4 darwin/amd64
  • PostgreSQL: PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
  • pgx: require github.com/jackc/pgx/v4 v4.18.11

Additional context
I think this behavior got introduced because the code path for supporting multi-host as well as sslmode=allow|prefer is same. Both of them seems to be using fallbackConfig.

With following temporary fix it seems to be working fine. However, not sure whether it is the correct way to do it.

diff --git a/pgconn.go b/pgconn.go
--- pgconn.go
+++ pgconn.go
@@ -155,14 +155,17 @@
 	}
 
 	foundBestServer := false
 	var fallbackConfig *FallbackConfig
-	for _, fc := range fallbackConfigs {
+	for i, fc := range fallbackConfigs {
 		// ConnectTimeout restricts the whole connection process.
 		if config.ConnectTimeout != 0 {
-			var cancel context.CancelFunc
-			ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
-			defer cancel()
+			// create new context first time or when previous host was different
+			if i == 0 || (fallbackConfigs[i].Host != fallbackConfigs[i-1].Host) {
+				var cancel context.CancelFunc
+				ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
+				defer cancel()
+			}
 		} else {
 			ctx = octx
 		}
 		pgConn, err = connect(ctx, config, fc, false)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions