Skip to content

perf(e2e): pool local workers across classes and cluster tests by config#1487

Open
MohanKumar21 wants to merge 1 commit into
lightseekorg:mainfrom
MohanKumar21:ksmkumar/fix-e2e-gpu-workflow
Open

perf(e2e): pool local workers across classes and cluster tests by config#1487
MohanKumar21 wants to merge 1 commit into
lightseekorg:mainfrom
MohanKumar21:ksmkumar/fix-e2e-gpu-workflow

Conversation

@MohanKumar21
Copy link
Copy Markdown
Contributor

@MohanKumar21 MohanKumar21 commented May 13, 2026

Description

Problem

GPU E2E jobs (especially e2e_test/responses on 2×H100) were hitting the 25-minute pytest step timeout even though tests passed. Most wall time went to repeated cold starts of SGLang/vLLM/TRT-LLM workers (~1–3 minutes each) because setup_backend was class-scoped and tore down workers after every test class. Alternating models (e.g. Qwen vs gpt-oss) made reuse impossible without reordering and pooling.

Solution

Session-scoped worker pooling for non-PD local backends: workers are keyed by (model_id, engine, connection_mode, count) and reused across classes; each class still gets a fresh gateway for isolated router state. LRU eviction (default pool size 1) frees GPUs before starting a new worker set; the pool is drained before PD setups to avoid GPU overlap with prefill/decode. Stable collection sort clusters tests by the same key so consecutive classes are more likely to hit the pool. pytest_sessionfinish stops any remaining pooled workers at session end.

Changes

  • e2e_test/fixtures/setup_backend.py: worker pool (_WorkerKey, _pool_get / _pool_put / _pool_make_room_for / _pool_drop), shutdown_worker_pool(), _setup_local reuses workers and always tears down the gateway; gateway failure paths discard or evict suspect workers; PD path drains the pool before worker launch.
  • e2e_test/fixtures/hooks.py: _backend_sort_key, extend pytest_collection_modifyitems with stable sort after env filtering; pytest_sessionfinish calls shutdown_worker_pool().
  • e2e_test/fixtures/__init__.py, e2e_test/conftest.py: export/import pytest_sessionfinish; docstring updates for setup_backend.
  • Env toggles: E2E_DISABLE_WORKER_POOL=1 (legacy per-class behavior), E2E_WORKER_POOL_SIZE (default 1), E2E_DISABLE_TEST_SORT=1 (skip clustering sort).

Test Plan

  • Compare log lines Starting … grpc worker vs Reusing pooled workers for. After the change there should be far fewer cold starts when multiple classes share the same model/mode/worker count.
Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

Release Notes

  • Chores

    • Improved test infrastructure with enhanced resource pooling mechanisms and better session cleanup to optimize test efficiency.
  • Tests

    • Refined test collection and ordering logic to better organize backend test scenarios and manage resource utilization.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@MohanKumar21 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 53 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 27d47812-466a-46e8-b644-4e1721cc8de9

📥 Commits

Reviewing files that changed from the base of the PR and between f4f0a6f and a56c558.

📒 Files selected for processing (4)
  • e2e_test/conftest.py
  • e2e_test/fixtures/__init__.py
  • e2e_test/fixtures/hooks.py
  • e2e_test/fixtures/setup_backend.py
📝 Walkthrough

Walkthrough

The PR introduces session-scoped worker pooling to reuse warmed test workers across test classes sharing the same configuration, with LRU eviction and session-end cleanup. It also adds test ordering by backend config to group compatible tests together, improving resource utilization and test efficiency.

Changes

Session-scoped Worker Pooling and Test Optimization

Layer / File(s) Summary
Worker pool infrastructure and shutdown
e2e_test/fixtures/setup_backend.py, e2e_test/fixtures/hooks.py
setup_backend.py implements _WorkerKey, an OrderedDict LRU cache, and shutdown_worker_pool() to manage session-scoped pooled workers; hooks.py imports the shutdown function and registers pytest_sessionfinish to flush the pool at session end.
Hook discovery and fixture re-exports
e2e_test/conftest.py, e2e_test/fixtures/__init__.py
Conftest and fixtures package import and re-export pytest_sessionfinish via __all__ for pytest discovery; setup_backend fixture docstring documents pooling behavior including the (model_id, engine, mode, count) key and E2E_DISABLE_WORKER_POOL opt-out.
Local backend pool integration
e2e_test/fixtures/setup_backend.py
_setup_local derives a pool key from worker configuration, reuses cached workers on hits, evicts LRU entries to free GPU capacity before launching fresh workers, and conditionally returns workers to the pool on gateway success or discards them on failure.
PD backend pool drain
e2e_test/fixtures/setup_backend.py
Before launching PD prefill/decode workers, the fixture drains any cached regular workers to prevent GPU collisions.
Test ordering by backend configuration
e2e_test/fixtures/hooks.py
New _backend_sort_key() computes a stable sort order from test markers (model, workers) and backend parameters; pytest_collection_modifyitems conditionally filters tests by environment variables and sorts by backend config unless E2E_DISABLE_TEST_SORT is set.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightseekorg/smg#556: Both PRs modify test collection and ordering logic in e2e_test/fixtures/hooks.py via new reordering/sort mechanisms.
  • lightseekorg/smg#587: Both PRs adjust worker lifecycle and teardown logic in e2e_test/fixtures/setup_backend.py.
  • lightseekorg/smg#525: Both PRs modify worker lifecycle and selection in e2e_test/fixtures/setup_backend.py for multi-worker configuration handling.

Suggested labels

tests

Suggested reviewers

  • key4ng
  • slin1237
  • XinyueZhang369

Poem

🐰 The pool fills with workers, toasty and warm,
Shared across tests in a reused form,
Session-long caching, GPU rooms freed,
Tests clustering close by the config they need!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf(e2e): pool local workers across classes and cluster tests by config' accurately reflects the main objectives: introducing worker pooling and test clustering for performance optimization in E2E tests.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4f0a6f509

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +132 to +133
while len(_worker_pool) >= _POOL_MAX:
evict_key = next(iter(_worker_pool))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Evict conflicting pooled workers before cold-starting

When E2E_WORKER_POOL_SIZE is set above 1, this miss path can keep an existing pooled entry alive while starting a new worker set, because eviction only runs when len(_worker_pool) >= _POOL_MAX. Regular workers are launched from GPU offset 0 by default (start_workers(..., gpu_offset=0)), so two different pooled configs can be started concurrently on overlapping GPU IDs and fail intermittently with resource collisions. This makes the advertised multi-entry pool setting unsafe for runs that try to keep more than one config warm.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e_test/fixtures/hooks.py`:
- Around line 133-139: The sort key currently uses per-item callspec.params
(callspec, params, backend) which fragments class-level mixed parametrizations;
instead, derive the backend string from aggregated parametrization markers on
the test class/collection: inspect item.cls or its pytestmark markers to collect
parametrize markers that define "setup_backend" or "backend_router", coalesce
those values into a single backend component and use that for backend (falling
back to the existing per-item params only if no class-level markers are found).
Update the backend construction logic around the backend variable to read
aggregated markers from item.cls/pytestmark rather than relying solely on
callspec.params so class-level parametrizations are grouped together for
clustering.

In `@e2e_test/fixtures/setup_backend.py`:
- Around line 89-90: The current module-level cast for E2E_WORKER_POOL_SIZE can
raise on import if the env var is malformed; change the parsing that sets
_POOL_MAX to be defensive: read os.environ.get("E2E_WORKER_POOL_SIZE", "1") into
a local var, attempt to int() it inside a try/except ValueError (and TypeError
if desired), and on exception fall back to 1 (then apply max(1, parsed)). Update
the assignment of _POOL_MAX accordingly (referencing the _POOL_MAX symbol and
the E2E_WORKER_POOL_SIZE environment variable) so import-time failures are
avoided; optionally emit a warning/log when falling back.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: abb8f1cb-5c76-47c7-8950-74923be8f3e5

📥 Commits

Reviewing files that changed from the base of the PR and between c5d7afd and f4f0a6f.

📒 Files selected for processing (4)
  • e2e_test/conftest.py
  • e2e_test/fixtures/__init__.py
  • e2e_test/fixtures/hooks.py
  • e2e_test/fixtures/setup_backend.py

Comment thread e2e_test/fixtures/hooks.py Outdated
Comment thread e2e_test/fixtures/setup_backend.py Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a session-scoped worker pool for E2E tests to amortize cold-start costs by reusing workers across test classes with identical configurations. It includes a stable sorting mechanism for test collection and logic to manage worker lifecycle, including eviction and session-end cleanup. I have reviewed the implementation and provided a suggestion to improve the robustness of the pool size configuration.

Comment thread e2e_test/fixtures/setup_backend.py Outdated
@MohanKumar21 MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from f4f0a6f to 125f99d Compare May 13, 2026 11:45
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 125f99d08f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread e2e_test/fixtures/hooks.py
@MohanKumar21 MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from 125f99d to 0c368ac Compare May 13, 2026 11:54
@MohanKumar21
Copy link
Copy Markdown
Contributor Author

@slin1237 @CatherineSue Small PR for optimizing the GPU Workflows

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>
@MohanKumar21 MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from 0c368ac to a56c558 Compare May 14, 2026 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant