Clarification and/or configurability of what it means for a server to be dead/down #7290

@shinrich

While working with @djcarlin and @bryancall to understand an issue from deploying the dead-server no-retry feature added in PR #7142, we were surprised by exactly when servers are marked dead.

When transactions are retryable, things worked pretty much as expected. If the handshake fails, or the request is sent but the origin fails to return data and the request is retryable, the address is tried the number of times specified in proxy.config.http.connect_attempts_rr_retries (at least, once PR #7288 is applied) before the IP address is marked down and the next IP address is tried.

However, if the transaction fails after the header has been sent and it is not retryable (e.g. a POST request), the IP address is marked down immediately; the retry count in proxy.config.http.connect_attempts_rr_retries is ignored. If the origin only times out now and again on larger requests, taking it down immediately seems bad, particularly with the new feature that avoids retries against a down server during the dead period. On the other hand, if the server is consistently failing to respond to POST requests, it should be marked down.
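To make the asymmetry concrete, here is a rough model of the behavior described above. It is only a sketch of what we observed, not the actual ATS code: `OriginAddress` and `record_failure` are hypothetical names, and `rr_retries` stands in for the value of proxy.config.http.connect_attempts_rr_retries.

```python
from dataclasses import dataclass


@dataclass
class OriginAddress:
    """Hypothetical per-IP state; not an ATS data structure."""
    ip: str
    fail_count: int = 0
    down: bool = False


def record_failure(addr: OriginAddress, retryable: bool, rr_retries: int) -> bool:
    """Record one failed transaction and return whether the address is now down.

    rr_retries stands in for proxy.config.http.connect_attempts_rr_retries.
    """
    if not retryable:
        # Observed behavior: a non-retryable failure (e.g. a POST that
        # failed after the header was sent) ignores the retry count.
        addr.down = True
        return addr.down

    # Retryable failures are counted against the configured retry limit.
    addr.fail_count += 1
    if addr.fail_count >= rr_retries:
        addr.down = True
    return addr.down


# A retryable failure leaves the address up until the limit is reached.
get_addr = OriginAddress("192.0.2.1")
states = [record_failure(get_addr, retryable=True, rr_retries=3) for _ in range(3)]

# A single non-retryable failure takes the address down at once.
post_addr = OriginAddress("192.0.2.2")
post_down = record_failure(post_addr, retryable=False, rr_retries=3)
```

In this model one slow POST is enough to take an address out of rotation for the whole dead period, while a GET against the same address would have to fail `rr_retries` times.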

These down-decision criteria probably need to be configurable, since some origins need different criteria than others. Some should be marked down only if the initial handshake fails. Others should be marked down only if no data was returned. Or maybe you want to mark things down only for specific origin connection failures.
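One possible shape for such a setting, purely as an illustration: treat the policy as the set of failure kinds that are allowed to count toward marking an address down. Everything here is hypothetical (the `Failure` kinds, the policy names, and `counts_toward_down`); none of it exists in ATS today, and it assumes a handshake failure also counts under the "no data returned" policy.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical failure kinds; not ATS identifiers."""
    HANDSHAKE_FAILED = "handshake_failed"
    NO_DATA_RETURNED = "no_data_returned"
    PARTIAL_RESPONSE = "partial_response"


# Hypothetical per-origin policies: each is the set of failure kinds
# that may contribute to marking an address down.
HANDSHAKE_ONLY = frozenset({Failure.HANDSHAKE_FAILED})
# Assumption: a failed handshake also means no data was returned.
NO_DATA = frozenset({Failure.HANDSHAKE_FAILED, Failure.NO_DATA_RETURNED})
ANY_FAILURE = frozenset(Failure)


def counts_toward_down(policy: frozenset, failure: Failure) -> bool:
    """Return True if this failure kind should count under the given policy."""
    return failure in policy


# An origin that occasionally times out on large POSTs could use
# HANDSHAKE_ONLY so that a missing response does not take it down.
timeout_counts = counts_toward_down(HANDSHAKE_ONLY, Failure.NO_DATA_RETURNED)
```

A policy like this could be combined with the existing retry count, so that only the failure kinds selected for an origin are counted before its address is marked down.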
