
Connection health checks completely ineffective for TLS connections #3025

Closed
nanyan0312 opened this issue Jun 12, 2024 · 2 comments · Fixed by #3047


@nanyan0312

This reports a bug in the health check logic for TLS connections, specifically in the connCheck function in internal/pool/conn_check.go. When the server closes connections, the bug unintentionally exhausts the retry count and ultimately causes command failures.

go-redis with non-TLS does not have this problem.

Expected Behavior

The connection health check is meant to work as follows. When a connection is picked from the pool, it is health-checked. Bad connections are discarded immediately, and the pool keeps picking until it finds a good one. If all pooled connections are bad, a new connection is made. Discarding a bad connection does not consume a retry. A retry is consumed only when an error occurs while sending a command on a picked connection.
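The intended behavior described above can be sketched as follows. This is a simplified illustration, not go-redis's actual pool code: the conn type and the check and get functions are hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for a pooled connection.
type conn struct {
	id     int
	broken bool // true if the server has already closed this connection
}

// check is a hypothetical stand-in for the pool's health check.
func check(c *conn) error {
	if c.broken {
		return errors.New("connection closed by peer")
	}
	return nil
}

// get pops idle connections (most recently used first) until one passes
// the health check. Bad connections are discarded without touching the
// caller's retry budget. If none is healthy, a new connection is dialed.
func get(idle []*conn, dial func() *conn) (c *conn, rest []*conn, discarded int) {
	for len(idle) > 0 {
		c, idle = idle[len(idle)-1], idle[:len(idle)-1]
		if check(c) == nil {
			return c, idle, discarded
		}
		discarded++
	}
	return dial(), idle, discarded
}

func main() {
	// Every pooled connection was killed by the server.
	idle := []*conn{{id: 1, broken: true}, {id: 2, broken: true}, {id: 3, broken: true}}
	retriesLeft := 4

	c, _, discarded := get(idle, func() *conn { return &conn{id: 99} })

	// Three bad connections were discarded, a fresh one was dialed,
	// and no retries were consumed.
	fmt.Println(c.id, discarded, retriesLeft) // 99 3 4
}
```

The point of the sketch is that the retry budget is only ever charged for a failed command, never for a connection rejected by the health check.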

Current Behavior

The specific bug: with TLS, the argument passed to connCheck is a tls.Conn, which does not implement the syscall.Conn interface. As a result, the type assertion in connCheck always returns ok == false, so the health check is bypassed entirely for TLS connections. Bad connections in the pool are then used to send commands, which fail, and every bad connection consumes a retry.

Possible Solution

Steps to Reproduce

  1. Set up the client to use TLS, with a pool size of 20 and a retry count of 4. The issue appears whenever the retry count is lower than the pool size.
        rdb := redis.NewClusterClient(&redis.ClusterOptions{
                Addrs:        []string{""},
                Password:     "",
                PoolSize:     20,
                PoolFIFO:     false,
                MinIdleConns: 10,

                MaxRetries:      4,
                MinRetryBackoff: 8 * time.Millisecond,
                MaxRetryBackoff: 512 * time.Millisecond,

                TLSConfig: &tls.Config{
                        InsecureSkipVerify: true,
                        ServerName:         "your domain",
                },

        })
  2. Run CLIENT KILL TYPE normal on Redis to kill all existing connections at once.
  3. Observe command failures on the client side.

Context (Environment)

Many cloud services that host Redis offer managed replacement of instances, during which connections to the old instance are killed in a batch. Because of this bug, such replacements cause command failures for TLS clusters but not for non-TLS clusters.

@monkey92t monkey92t linked a pull request Jul 12, 2024 that will close this issue
monkey92t pushed a commit that referenced this issue Jul 12, 2024
* add a check for TLS connections.
@monkey92t
Collaborator

new version v9.5.4

vladvildanov pushed a commit that referenced this issue Jul 17, 2024
* add a check for TLS connections.
@rkarthikr

new version v9.5.4

Did this go in 9.6.0 ?
