context deadline not respected if backends are down #2145

Open
@jsha

Expected Behavior

When a context deadline is provided and all backends are down, calls to ClusterClient.Get (and other commands) return with an error at the deadline or shortly after it.

Current Behavior

Calls to ClusterClient.Get block for N * D, where N is the number of backends configured, and D is the DialTimeout.

Steps to Reproduce

package main

import (
	"context"
	"log"
	"time"

	"github.com/go-redis/redis/v8"
)

func main() {
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
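		// 203.0.113.0/24 is TEST-NET-3 (RFC 5737), reserved for documentation,
		// so every dial below ends in a timeout.
		Addrs: []string{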
			"203.0.113.91:999",
			"203.0.113.92:999",
			"203.0.113.93:999",
			"203.0.113.94:999",
			"203.0.113.95:999",
			"203.0.113.96:999",
		},
		DialTimeout:  time.Second,
		ReadTimeout:  time.Second,
		WriteTimeout: time.Second,
	})
	// The 100ms deadline is much shorter than the 1s DialTimeout above.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	resp, err := rdb.Get(ctx, "example").Result()
	if err != nil {
		log.Fatal(err)
	}
	log.Print(resp)
}

$ time go run main.go
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.95:999: i/o timeout
dial error: dial tcp 203.0.113.93:999: i/o timeout
dial error: dial tcp 203.0.113.94:999: i/o timeout
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.91:999: i/o timeout
2022/07/06 11:31:31 context deadline exceeded
exit status 1

real    0m6.532s
user    0m0.925s
sys     0m0.281s

Expected: real time is less than 500ms, based on the 100ms context timeout. Actual: real time is 6.5s.

Note: the "dial error" lines appear because I added a Println call to go-redis's dial code to make it clearer what's going on.

Context (Environment)

go version go1.18 linux/amd64
github.com/go-redis/redis/v8 v8.11.5

This is a simplified version of a problem we encountered when bringing up the OCSP responder for https://github.com/letsencrypt/boulder/ with a set of backends that could not be reached.

Possible Implementation

It's reasonable to keep trying to dial in the background, but the user-facing call should return based on the deadline.
