perf(e2e): pool local workers across classes and cluster tests by config by MohanKumar21 · Pull Request #1487 · lightseekorg/smg

MohanKumar21 · 2026-05-13T11:33:58Z

Description

Problem

GPU E2E jobs (especially e2e_test/responses on 2×H100) were hitting the 25-minute pytest step timeout even though tests passed. Most wall time went to repeated cold starts of SGLang/vLLM/TRT-LLM workers (~1–3 minutes each) because setup_backend was class-scoped and tore down workers after every test class. Alternating models (e.g. Qwen vs gpt-oss) made reuse impossible without reordering and pooling.

Solution

Session-scoped worker pooling for non-PD local backends: workers are keyed by (model_id, engine, connection_mode, count) and reused across classes; each class still gets a fresh gateway for isolated router state. LRU eviction (default pool size 1) frees GPUs before starting a new worker set; the pool is drained before PD setups to avoid GPU overlap with prefill/decode. Stable collection sort clusters tests by the same key so consecutive classes are more likely to hit the pool. pytest_sessionfinish stops any remaining pooled workers at session end.

Changes

e2e_test/fixtures/setup_backend.py: worker pool (_WorkerKey, _pool_get / _pool_put / _pool_make_room_for / _pool_drop), shutdown_worker_pool(), _setup_local reuses workers and always tears down the gateway; gateway failure paths discard or evict suspect workers; PD path drains the pool before worker launch.
e2e_test/fixtures/hooks.py: _backend_sort_key, extend pytest_collection_modifyitems with stable sort after env filtering; pytest_sessionfinish calls shutdown_worker_pool().
e2e_test/fixtures/__init__.py, e2e_test/conftest.py: export/import pytest_sessionfinish; docstring updates for setup_backend.
Env toggles: E2E_DISABLE_WORKER_POOL=1 (legacy per-class behavior), E2E_WORKER_POOL_SIZE (default 1), E2E_DISABLE_TEST_SORT=1 (skip clustering sort).

Test Plan

Compare log lines Starting … grpc worker vs Reusing pooled workers for. After the change there should be far fewer cold starts when multiple classes share the same model/mode/worker count.

Checklist

cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

Release Notes

Chores
- Improved test infrastructure with enhanced resource pooling mechanisms and better session cleanup to optimize test efficiency.
Tests
- Refined test collection and ordering logic to better organize backend test scenarios and manage resource utilization.

coderabbitai · 2026-05-13T11:34:12Z

Warning

Rate limit exceeded

@MohanKumar21 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 53 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 27d47812-466a-46e8-b644-4e1721cc8de9

📥 Commits

Reviewing files that changed from the base of the PR and between f4f0a6f and a56c558.

📒 Files selected for processing (4)

e2e_test/conftest.py
e2e_test/fixtures/__init__.py
e2e_test/fixtures/hooks.py
e2e_test/fixtures/setup_backend.py

📝 Walkthrough

Walkthrough

The PR introduces session-scoped worker pooling to reuse warmed test workers across test classes sharing the same configuration, with LRU eviction and session-end cleanup. It also adds test ordering by backend config to group compatible tests together, improving resource utilization and test efficiency.

Changes

Session-scoped Worker Pooling and Test Optimization

Layer / File(s)	Summary
Worker pool infrastructure and shutdown `e2e_test/fixtures/setup_backend.py`, `e2e_test/fixtures/hooks.py`	`setup_backend.py` implements `_WorkerKey`, an `OrderedDict` LRU cache, and `shutdown_worker_pool()` to manage session-scoped pooled workers; `hooks.py` imports the shutdown function and registers `pytest_sessionfinish` to flush the pool at session end.
Hook discovery and fixture re-exports `e2e_test/conftest.py`, `e2e_test/fixtures/__init__.py`	Conftest and fixtures package import and re-export `pytest_sessionfinish` via `__all__` for pytest discovery; setup_backend fixture docstring documents pooling behavior including the `(model_id, engine, mode, count)` key and `E2E_DISABLE_WORKER_POOL` opt-out.
Local backend pool integration `e2e_test/fixtures/setup_backend.py`	`_setup_local` derives a pool key from worker configuration, reuses cached workers on hits, evicts LRU entries to free GPU capacity before launching fresh workers, and conditionally returns workers to the pool on gateway success or discards them on failure.
PD backend pool drain `e2e_test/fixtures/setup_backend.py`	Before launching PD prefill/decode workers, the fixture drains any cached regular workers to prevent GPU collisions.
Test ordering by backend configuration `e2e_test/fixtures/hooks.py`	New `_backend_sort_key()` computes a stable sort order from test markers (model, workers) and backend parameters; `pytest_collection_modifyitems` conditionally filters tests by environment variables and sorts by backend config unless `E2E_DISABLE_TEST_SORT` is set.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

lightseekorg/smg#556: Both PRs modify test collection and ordering logic in e2e_test/fixtures/hooks.py via new reordering/sort mechanisms.
lightseekorg/smg#587: Both PRs adjust worker lifecycle and teardown logic in e2e_test/fixtures/setup_backend.py.
lightseekorg/smg#525: Both PRs modify worker lifecycle and selection in e2e_test/fixtures/setup_backend.py for multi-worker configuration handling.

Suggested labels

tests

Suggested reviewers

key4ng
slin1237
XinyueZhang369

Poem

🐰 The pool fills with workers, toasty and warm,
Shared across tests in a reused form,
Session-long caching, GPU rooms freed,
Tests clustering close by the config they need!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'perf(e2e): pool local workers across classes and cluster tests by config' accurately reflects the main objectives: introducing worker pooling and test clustering for performance optimization in E2E tests.
Docstring Coverage	✅ Passed	Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4f0a6f509

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T11:36:45Z

+    while len(_worker_pool) >= _POOL_MAX:
+        evict_key = next(iter(_worker_pool))


Evict conflicting pooled workers before cold-starting

When E2E_WORKER_POOL_SIZE is set above 1, this miss path can keep an existing pooled entry alive while starting a new worker set, because eviction only runs when len(_worker_pool) >= _POOL_MAX. Regular workers are launched from GPU offset 0 by default (start_workers(..., gpu_offset=0)), so two different pooled configs can be started concurrently on overlapping GPU IDs and fail intermittently with resource collisions. This makes the advertised multi-entry pool setting unsafe for runs that try to keep more than one config warm.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e_test/fixtures/hooks.py`:
- Around line 133-139: The sort key currently uses per-item callspec.params
(callspec, params, backend) which fragments class-level mixed parametrizations;
instead, derive the backend string from aggregated parametrization markers on
the test class/collection: inspect item.cls or its pytestmark markers to collect
parametrize markers that define "setup_backend" or "backend_router", coalesce
those values into a single backend component and use that for backend (falling
back to the existing per-item params only if no class-level markers are found).
Update the backend construction logic around the backend variable to read
aggregated markers from item.cls/pytestmark rather than relying solely on
callspec.params so class-level parametrizations are grouped together for
clustering.

In `@e2e_test/fixtures/setup_backend.py`:
- Around line 89-90: The current module-level cast for E2E_WORKER_POOL_SIZE can
raise on import if the env var is malformed; change the parsing that sets
_POOL_MAX to be defensive: read os.environ.get("E2E_WORKER_POOL_SIZE", "1") into
a local var, attempt to int() it inside a try/except ValueError (and TypeError
if desired), and on exception fall back to 1 (then apply max(1, parsed)). Update
the assignment of _POOL_MAX accordingly (referencing the _POOL_MAX symbol and
the E2E_WORKER_POOL_SIZE environment variable) so import-time failures are
avoided; optionally emit a warning/log when falling back.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: abb8f1cb-5c76-47c7-8950-74923be8f3e5

📥 Commits

Reviewing files that changed from the base of the PR and between c5d7afd and f4f0a6f.

📒 Files selected for processing (4)

e2e_test/conftest.py
e2e_test/fixtures/__init__.py
e2e_test/fixtures/hooks.py
e2e_test/fixtures/setup_backend.py

gemini-code-assist

Code Review

This pull request introduces a session-scoped worker pool for E2E tests to amortize cold-start costs by reusing workers across test classes with identical configurations. It includes a stable sorting mechanism for test collection and logic to manage worker lifecycle, including eviction and session-end cleanup. I have reviewed the implementation and provided a suggestion to improve the robustness of the pool size configuration.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 125f99d08f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

MohanKumar21 · 2026-05-13T12:52:45Z

@slin1237 @CatherineSue Small PR for optimizing the GPU Workflows

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

MohanKumar21 requested review from CatherineSue, XinyueZhang369, key4ng and slin1237 as code owners May 13, 2026 11:33

github-actions Bot added the tests Test changes label May 13, 2026

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

coderabbitai Bot requested changes May 13, 2026

View reviewed changes

Comment thread e2e_test/fixtures/hooks.py Outdated

Comment thread e2e_test/fixtures/setup_backend.py Outdated

gemini-code-assist Bot reviewed May 13, 2026

View reviewed changes

Comment thread e2e_test/fixtures/setup_backend.py Outdated

MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from f4f0a6f to 125f99d Compare May 13, 2026 11:45

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

Comment thread e2e_test/fixtures/hooks.py

MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from 125f99d to 0c368ac Compare May 13, 2026 11:54

coderabbitai Bot approved these changes May 13, 2026

View reviewed changes

perf(e2e): pool local workers across classes and cluster tests by config

a56c558

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

MohanKumar21 force-pushed the ksmkumar/fix-e2e-gpu-workflow branch from 0c368ac to a56c558 Compare May 14, 2026 07:44

		while len(_worker_pool) >= _POOL_MAX:
		evict_key = next(iter(_worker_pool))

Conversation

MohanKumar21 commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem

Solution

Changes

Test Plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

MohanKumar21 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MohanKumar21 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading