Skip to content

fix(tests): make RPM limit test sequential to fix race condition#21937

Merged
jquinter merged 1 commit intoBerriAI:mainfrom
jquinter:fix/flaky-rpm-limit-test
Feb 23, 2026
Merged

fix(tests): make RPM limit test sequential to fix race condition#21937
jquinter merged 1 commit intoBerriAI:mainfrom
jquinter:fix/flaky-rpm-limit-test

Conversation

@jquinter
Copy link
Collaborator

@jquinter jquinter commented Feb 23, 2026

Summary

  • Fixed flaky test_pass_through_endpoint_rpm_limit test that intermittently fails due to a race condition
  • Requests were fired concurrently via run_in_executor + asyncio.gather, allowing more requests to slip through the rate limiter before the counter was updated (e.g. 3 successes instead of expected 2 with rpm_limit=2)
  • Changed to sequential requests so rate limiting behavior is deterministic

Test plan

  • Verified auth=False test cases pass locally
  • auth=True cases require LITELLM_LICENSE (CI only)

🤖 Generated with Claude Code

@vercel
Copy link

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 23, 2026 7:36pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 23, 2026

Greptile Summary

This PR fixes a flaky test (test_pass_through_endpoint_rpm_limit) by replacing concurrent request execution with sequential requests. The original test used run_in_executor + asyncio.gather to fire requests concurrently, which allowed multiple requests to slip through the rate limiter before the counter was updated — causing non-deterministic results (e.g., 3 successes instead of the expected 2 with rpm_limit=2).

  • Replaced concurrent asyncio.gather pattern with a simple sequential loop of client.post calls in test_pass_through_endpoint_rpm_limit
  • The separate test_pass_through_endpoint_sequential_rpm_limit function (which tests multi-user concurrent behavior) is left unchanged
  • Good explanatory comments added at the change site
  • The PR description includes clear verification that non-auth test cases pass locally

Confidence Score: 5/5

  • This PR is safe to merge — it only modifies test code to fix a race condition, with no production code changes.
  • The change is minimal and well-scoped: it replaces a concurrent request pattern with sequential calls in a single test function. The fix directly addresses the documented flakiness. No production code is affected, no new dependencies are added, and the existing test parameterization and assertions are preserved.
  • No files require special attention.

Important Files Changed

Filename Overview
tests/local_testing/test_pass_through_endpoints.py Replaced concurrent run_in_executor + asyncio.gather with sequential synchronous client.post calls to eliminate race condition in rate limiter test. No functional issues introduced.

Sequence Diagram

sequenceDiagram
    participant Test as Test Runner
    participant Client as TestClient
    participant Proxy as Proxy Server
    participant RL as Rate Limiter

    Note over Test,RL: Before (Concurrent - Race Condition)
    Test->>Client: run_in_executor(post /v1/rerank) x N
    Client-->>Proxy: Concurrent requests
    Proxy->>RL: Check RPM (race: counter not yet updated)
    RL-->>Proxy: Allow (stale count)
    Note over RL: More requests slip through than expected

    Note over Test,RL: After (Sequential - Deterministic)
    loop For each request
        Test->>Client: client.post(/v1/rerank)
        Client->>Proxy: Single request
        Proxy->>RL: Check RPM
        RL-->>Proxy: Allow or Reject (counter updated)
        Proxy-->>Client: 200 or 429
        Client-->>Test: Response
    end
    Note over RL: Rate limiting behaves deterministically
Loading

Last reviewed commit: 6794e62

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jquinter jquinter force-pushed the fix/flaky-rpm-limit-test branch from e0aa384 to bb63de2 Compare February 23, 2026 19:35
@jquinter jquinter merged commit 6574d8d into BerriAI:main Feb 23, 2026
30 checks passed
damhau pushed a commit to damhau/litellm that referenced this pull request Feb 26, 2026
fix(tests): make RPM limit test sequential to fix race condition
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant