Skip to content

Conversation

rhamzeh
Copy link
Member

@rhamzeh rhamzeh commented Oct 2, 2025

Description

Due to a bug, 5xx were not properly retried.

What problem is being solved?

How is it being solved?

What changes are made to solve it?

References

closes #204

Review Checklist

  • I have clicked on "allow edits by maintainers".
  • I have added documentation for new/changed functionality in this PR or in a PR to openfga.dev [Provide a link to any relevant PRs in the references section above]
  • The correct base branch is being used, if not main
  • I have added tests to validate that the change in functionality is working as expected

Summary by CodeRabbit

  • Bug Fixes

    • Fixed retry behavior for server 5xx errors to ensure requests are retried and succeed when the server recovers.
  • Tests

    • Added tests validating retry logic for 5xx responses.
    • Added tests ensuring 429 responses with Retry-After are respected.
  • Documentation

    • Updated changelog to note the fix for improper retries on 5xx errors.

@rhamzeh rhamzeh requested a review from a team as a code owner October 2, 2025 17:26
Copy link

coderabbitai bot commented Oct 2, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Updates error retry logic to reference ShouldRetry(), adds tests for 500 and 429 Retry-After retry behavior using a local server, and documents the fix in the changelog.

Changes

Cohort / File(s) Summary
Docs / Changelog
CHANGELOG.md
Adds Unreleased entry noting fix: 5xx errors were not being properly retried.
Error handling logic
errors.go
Removes private shouldRetry field from FgaApiInternalError; GetTimeToWait now calls ShouldRetry() to determine retry behavior.
Tests for retry behavior
api_open_fga_test.go
Adds subtests validating retries on 500 and on 429 with Retry-After; uses httptest server, atomic counters, and timing assertions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client as SDK Client
  participant Server as OpenFGA Server

  rect rgba(230,245,255,0.5)
    Note over Client,Server: Request with retry params
    Client->>Server: API call
    Server-->>Client: 500 / 429 (+ Retry-After?)
    alt ShouldRetry() == true
      Note over Client: Compute wait (Retry-After or backoff)
      Client-->>Client: Wait
      Client->>Server: Retry
      Server-->>Client: 200 OK (eventual)
    else ShouldRetry() == false or retries exhausted
      Client-->>Client: Return error
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning The PR adds a new subtest covering 429 responses with a Retry-After header, which is not part of issue #204’s goal to address retry behavior only for 5xx server errors. Consider removing the 429 Retry-After subtest or moving it to a separate PR, or else update the PR description to explicitly state that retry coverage is being expanded beyond 5xx errors.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary change of ensuring that 5xx internal server errors are retried, directly reflecting the modifications in errors.go and the accompanying tests without including extraneous details.
Linked Issues Check ✅ Passed The update replaces the unused private retry flag with the exported ShouldRetry logic for 5xx errors and adds tests that verify configured retries occur for 5xx responses, fully satisfying the requirements of issue #204.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.29%. Comparing base (d2ed21c) to head (3d1d17f).

❌ Your project status has failed because the head coverage (28.29%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #241      +/-   ##
==========================================
+ Coverage   28.16%   28.29%   +0.12%     
==========================================
  Files         111      111              
  Lines       12000    12000              
==========================================
+ Hits         3380     3395      +15     
+ Misses       8337     8325      -12     
+ Partials      283      280       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rhamzeh rhamzeh requested review from Copilot and jimmyjames October 2, 2025 17:26
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where 5xx internal server errors were not being properly retried by removing a redundant shouldRetry field and replacing its usage with the existing ShouldRetry() method.

  • Removed unused shouldRetry field from FgaApiInternalError struct
  • Fixed retry logic to use the ShouldRetry() method instead of the removed field
  • Added comprehensive tests to verify retry behavior for both 500 errors and 429 rate limit errors

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
errors.go Removed redundant shouldRetry field and fixed retry logic to use ShouldRetry() method
api_open_fga_test.go Added comprehensive tests for retry behavior on 500 and 429 errors
CHANGELOG.md Added changelog entry documenting the fix

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
errors.go (1)

634-638: Nit: use the shared Retry-After header constant for consistency

Internal error parsing uses a string literal; rate-limit path uses retryutils.RetryAfterHeaderName. Consider aligning here for readability and single source of truth.

Example:

retryAfter := retryutils.ParseRetryAfterHeaderValue(httpResponse.Header, retryutils.RetryAfterHeaderName)
api_open_fga_test.go (2)

1526-1577: Solid 500-retry test; consider adding a 501 “not retried” case

This validates retrial on transient 500s and final success with expected attempt count. Consider adding a sibling subtest asserting that 501 (Not Implemented) is not retried (attempts == 1), documenting the exclusion in ShouldRetry().

Example sketch:

t.Run("Do not retry on 501 error", func(t *testing.T) {
  var attempts int32
  server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    atomic.AddInt32(&attempts, 1)
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusNotImplemented) // 501
    _, _ = w.Write([]byte(`{"code":"not_implemented","message":"no retry"}`))
  }))
  defer server.Close()

  cfg, _ := NewConfiguration(Configuration{ ApiUrl: server.URL, RetryParams: &RetryParams{MaxRetry: 4, MinWaitInMs: 10}, HTTPClient: &http.Client{} })
  apiClient := NewAPIClient(cfg)

  _, _, err := apiClient.OpenFgaApi.Check(context.Background(), "store").Body(CheckRequest{TupleKey: CheckRequestTupleKey{User:"u",Relation:"r",Object:"o"}}).Execute()
  if err == nil { t.Fatalf("expected error") }
  if got := int(atomic.LoadInt32(&attempts)); got != 1 { t.Fatalf("expected 1 attempt, got %d", got) }
})

1579-1637: Great 429/Retry-After coverage; add small tolerance to avoid flakiness

Asserting elapsed >= 2s is correct. To reduce CI flakiness, allow a tiny margin (e.g., >= 2s - 50ms) or make the per-attempt header 2s to widen the window. Also optional: compute expected duration without floats for readability.

  • Tolerance: minExpected := 2*time.Second - 50*time.Millisecond
  • Readability: minExpected := 2 * time.Second (drop retryAfterSeconds float)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2ed21c and ef607f0.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • api_open_fga_test.go (2 hunks)
  • errors.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api_open_fga_test.go (5)
configuration.go (3)
  • NewConfiguration (55-102)
  • Configuration (33-48)
  • RetryParams (30-30)
internal/utils/retryutils/retryparams.go (1)
  • RetryParams (18-21)
api_client.go (1)
  • NewAPIClient (68-99)
model_check_request.go (1)
  • CheckRequest (22-31)
model_check_request_tuple_key.go (1)
  • CheckRequestTupleKey (22-26)
🔇 Additional comments (3)
errors.go (1)

599-603: Good fix: gate wait on ShouldRetry()

This correctly enables retries for retriable 5xx (excluding 501). Matches PR goal.

CHANGELOG.md (1)

6-6: Changelog entry reads well

Accurately captures the fix for 5xx retry behavior.

api_open_fga_test.go (1)

20-22: Imports OK

httptest and sync/atomic are appropriate for the new retry tests.

rhamzeh and others added 2 commits October 2, 2025 13:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@rhamzeh rhamzeh added this pull request to the merge queue Oct 2, 2025
Merged via the queue into main with commit 4367c98 Oct 2, 2025
14 checks passed
@rhamzeh rhamzeh deleted the fix/retry-on-5xx branch October 2, 2025 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5xx errors don't seem to be retried by the SDK

3 participants