Description
Expected Behavior
When a context deadline is provided and all backends are down, calls to ClusterClient.Get (and others) return an error by the deadline or shortly after.
Current Behavior
Calls to ClusterClient.Get block for N * D, where N is the number of backends configured, and D is the DialTimeout.
Steps to Reproduce
package main

import (
	"context"
	"log"
	"time"

	"github.com/go-redis/redis/v8"
)

func main() {
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{
			"203.0.113.91:999",
			"203.0.113.92:999",
			"203.0.113.93:999",
			"203.0.113.94:999",
			"203.0.113.95:999",
			"203.0.113.96:999",
		},
		DialTimeout:  time.Second,
		ReadTimeout:  time.Second,
		WriteTimeout: time.Second,
	})

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	resp, err := rdb.Get(ctx, "example").Result()
	if err != nil {
		log.Fatal(err)
	}
	log.Print(resp)
}
$ time go run main.go
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.95:999: i/o timeout
dial error: dial tcp 203.0.113.93:999: i/o timeout
dial error: dial tcp 203.0.113.94:999: i/o timeout
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.91:999: i/o timeout
2022/07/06 11:31:31 context deadline exceeded
exit status 1
real 0m6.532s
user 0m0.925s
sys 0m0.281s
Expected: real time is less than 500ms, based on the 100ms context timeout. Actual: real time is 6.5s.
Note: the "dial error" lines are present because I added a Println call to go-redis' dial code to make it clearer what's going on.
Context (Environment)
go version go1.18 linux/amd64
github.com/go-redis/redis/v8 v8.11.5
This is a simplified version of a problem we encountered when bringing up the OCSP responder for https://github.com/letsencrypt/boulder/ with a set of backends that could not be reached.
Possible Implementation
It's reasonable to keep trying to dial in the background, but the user-facing call should return based on the deadline.