IsRetryableStatusCode() includes HTTP 500 contrary to documented default retry behavior #2523

@cedric-moser-cistec

Description

Component(s)

router

Component version

0c98cd0

wgc version

0.105.3

controlplane version

latest

router version

0.243.0

What happened?

Documentation misalignment: the IsRetryableStatusCode() reference includes HTTP 500, but the introduction states that only 502, 503, and 504 are retried by default.

Description

The retry documentation contradicts itself on the same page regarding which HTTP status codes are retried by default.

Introduction (top of page):

"By default, the router retries GraphQL operations of type query on specific network errors and HTTP status codes (502, 503, 504)."

500 is not listed.

IsRetryableStatusCode() reference (same page, further down):

Returns true if the status code is one of: 500, 502, 503, 504

500 is listed.

Since the default expression is IsRetryableStatusCode() || IsConnectionError() || IsTimeout(), the actual default behavior includes retrying on 500, which contradicts the introduction.
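The contradiction can be reproduced without running the router. The Go sketch below models the two documented pieces as plain functions; it is not the router's source. isRetryableStatusCode and defaultShouldRetry are hypothetical names, the status-code set is copied from the IsRetryableStatusCode() reference, and the two booleans stand in for IsConnectionError() and IsTimeout().

```go
package main

import "fmt"

// isRetryableStatusCode mirrors the documented IsRetryableStatusCode() reference:
// "Returns true if the status code is one of: 500, 502, 503, 504".
// Hypothetical helper for illustration; not the router's implementation.
func isRetryableStatusCode(code int) bool {
	switch code {
	case 500, 502, 503, 504:
		return true
	}
	return false
}

// defaultShouldRetry models the documented default expression
// IsRetryableStatusCode() || IsConnectionError() || IsTimeout().
// connErr and timeout are stand-ins for the connection-error and timeout checks.
func defaultShouldRetry(code int, connErr, timeout bool) bool {
	return isRetryableStatusCode(code) || connErr || timeout
}

func main() {
	// Evaluate the default expression for plain HTTP responses (no network error, no timeout).
	for _, code := range []int{500, 502, 503, 504, 400} {
		fmt.Printf("status %d -> retried: %v\n", code, defaultShouldRetry(code, false, false))
	}
}
```

Running it reports 500 as retried alongside 502, 503, and 504, which is exactly the case the introduction leaves out.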

Steps to Reproduce

  1. Open the retry documentation.
  2. Introduction states default retry on: 502, 503, 504 (no 500).
  3. IsRetryableStatusCode() definition includes: 500, 502, 503, 504.
  4. Default expression uses IsRetryableStatusCode() → 500 is retried despite the introduction saying otherwise.

Expected Result

The introduction and IsRetryableStatusCode() should agree. Either:

  1. Remove 500 from IsRetryableStatusCode(), or
  2. Update the introduction to include 500.

Personal opinion: prefer removing 500 from the default

I would lean towards removing 500 from the default retryable set. Subgraphs experiencing transient infrastructure failures (DB connection timeout, downstream service unavailable) should return 503 Service Unavailable, which the router already retries by default. A 500 typically signals an application-level error (unhandled exception, bug), where retrying masks issues that should be surfaced immediately.

Users who need to retry on 500 can opt in explicitly:

retry:
  expression: "statusCode == 500 || IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"
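Under the same kind of assumptions, a Go sketch can check that option 1 combined with the opt-in expression above reproduces today's status-code behavior for users who want it. All three functions are hypothetical stand-ins, not router code: currentIsRetryableStatusCode uses the documented set (500, 502, 503, 504), proposedIsRetryableStatusCode drops 500 as option 1 suggests, and optInShouldRetry models statusCode == 500 || IsRetryableStatusCode() || IsConnectionError() || IsTimeout() evaluated against the proposed set.

```go
package main

import "fmt"

// currentIsRetryableStatusCode mirrors today's documented set: 500, 502, 503, 504.
func currentIsRetryableStatusCode(code int) bool {
	return code == 500 || code == 502 || code == 503 || code == 504
}

// proposedIsRetryableStatusCode models option 1: 500 removed from the default set.
func proposedIsRetryableStatusCode(code int) bool {
	return code == 502 || code == 503 || code == 504
}

// optInShouldRetry models the suggested opt-in expression
// statusCode == 500 || IsRetryableStatusCode() || IsConnectionError() || IsTimeout(),
// with IsRetryableStatusCode() replaced by the proposed (500-free) version.
func optInShouldRetry(statusCode int, connErr, timeout bool) bool {
	return statusCode == 500 || proposedIsRetryableStatusCode(statusCode) || connErr || timeout
}

func main() {
	// Compare status-code handling only (no connection error, no timeout).
	for code := 100; code <= 599; code++ {
		if optInShouldRetry(code, false, false) != currentIsRetryableStatusCode(code) {
			fmt.Println("mismatch at", code)
			return
		}
	}
	fmt.Println("opt-in expression matches today's documented status-code set")
	fmt.Println("proposed default retries 500:", proposedIsRetryableStatusCode(500))
}
```

For every code from 100 to 599 the opt-in result matches today's documented set, while the proposed default on its own no longer retries 500.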

Actual Result

The current implementation follows IsRetryableStatusCode() and does retry on 500 by default. The introduction on the same documentation page incorrectly states that only 502, 503, and 504 are retried.

  • Implementation retries on: 500, 502, 503, 504 (assumed; not verified against the code)
  • Introduction claims default retry on: 502, 503, 504
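A small documentation consistency check could catch this kind of drift. The Go sketch below hard-codes the two lists from the bullets above and reports the codes that appear in one but not the other. The difference helper and the slice names are hypothetical, and extracting the lists from the real docs source is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// difference returns the codes present in a but missing from b, sorted ascending.
func difference(a, b []int) []int {
	inB := make(map[int]bool, len(b))
	for _, c := range b {
		inB[c] = true
	}
	var out []int
	for _, c := range a {
		if !inB[c] {
			out = append(out, c)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	// Lists copied from the two places on the retry documentation page.
	functionCodes := []int{500, 502, 503, 504} // IsRetryableStatusCode() reference
	introCodes := []int{502, 503, 504}         // introduction

	fmt.Println("only in IsRetryableStatusCode():", difference(functionCodes, introCodes))
	fmt.Println("only in introduction:", difference(introCodes, functionCodes))
}
```

It prints [500] for codes listed only by IsRetryableStatusCode() and [] for codes listed only in the introduction.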

Users reading the introduction will believe 500 is not retried, while the actual router behavior retries it.

Environment information

OS: Ubuntu

Router configuration

Router execution config

Log output

Additional context

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests