perf(e2e): pool local workers across classes and cluster tests by config#1487
perf(e2e): pool local workers across classes and cluster tests by config#1487MohanKumar21 wants to merge 1 commit into
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThe PR introduces session-scoped worker pooling to reuse warmed test workers across test classes sharing the same configuration, with LRU eviction and session-end cleanup. It also adds test ordering by backend config to group compatible tests together, improving resource utilization and test efficiency. ChangesSession-scoped Worker Pooling and Test Optimization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4f0a6f509
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| while len(_worker_pool) >= _POOL_MAX: | ||
| evict_key = next(iter(_worker_pool)) |
There was a problem hiding this comment.
Evict conflicting pooled workers before cold-starting
When E2E_WORKER_POOL_SIZE is set above 1, this miss path can keep an existing pooled entry alive while starting a new worker set, because eviction only runs when len(_worker_pool) >= _POOL_MAX. Regular workers are launched from GPU offset 0 by default (start_workers(..., gpu_offset=0)), so two different pooled configs can be started concurrently on overlapping GPU IDs and fail intermittently with resource collisions. This makes the advertised multi-entry pool setting unsafe for runs that try to keep more than one config warm.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e_test/fixtures/hooks.py`:
- Around line 133-139: The sort key currently uses per-item callspec.params
(callspec, params, backend) which fragments class-level mixed parametrizations;
instead, derive the backend string from aggregated parametrization markers on
the test class/collection: inspect item.cls or its pytestmark markers to collect
parametrize markers that define "setup_backend" or "backend_router", coalesce
those values into a single backend component and use that for backend (falling
back to the existing per-item params only if no class-level markers are found).
Update the backend construction logic around the backend variable to read
aggregated markers from item.cls/pytestmark rather than relying solely on
callspec.params so class-level parametrizations are grouped together for
clustering.
In `@e2e_test/fixtures/setup_backend.py`:
- Around line 89-90: The current module-level cast for E2E_WORKER_POOL_SIZE can
raise on import if the env var is malformed; change the parsing that sets
_POOL_MAX to be defensive: read os.environ.get("E2E_WORKER_POOL_SIZE", "1") into
a local var, attempt to int() it inside a try/except ValueError (and TypeError
if desired), and on exception fall back to 1 (then apply max(1, parsed)). Update
the assignment of _POOL_MAX accordingly (referencing the _POOL_MAX symbol and
the E2E_WORKER_POOL_SIZE environment variable) so import-time failures are
avoided; optionally emit a warning/log when falling back.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: abb8f1cb-5c76-47c7-8950-74923be8f3e5
📒 Files selected for processing (4)
e2e_test/conftest.pye2e_test/fixtures/__init__.pye2e_test/fixtures/hooks.pye2e_test/fixtures/setup_backend.py
There was a problem hiding this comment.
Code Review
This pull request introduces a session-scoped worker pool for E2E tests to amortize cold-start costs by reusing workers across test classes with identical configurations. It includes a stable sorting mechanism for test collection and logic to manage worker lifecycle, including eviction and session-end cleanup. I have reviewed the implementation and provided a suggestion to improve the robustness of the pool size configuration.
f4f0a6f to
125f99d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 125f99d08f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
125f99d to
0c368ac
Compare
|
@slin1237 @CatherineSue Small PR for optimizing the GPU Workflows |
Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>
0c368ac to
a56c558
Compare
Description
Problem
GPU E2E jobs (especially e2e_test/responses on 2×H100) were hitting the 25-minute pytest step timeout even though tests passed. Most wall time went to repeated cold starts of SGLang/vLLM/TRT-LLM workers (~1–3 minutes each) because setup_backend was class-scoped and tore down workers after every test class. Alternating models (e.g. Qwen vs gpt-oss) made reuse impossible without reordering and pooling.
Solution
Session-scoped worker pooling for non-PD local backends: workers are keyed by (model_id, engine, connection_mode, count) and reused across classes; each class still gets a fresh gateway for isolated router state. LRU eviction (default pool size 1) frees GPUs before starting a new worker set; the pool is drained before PD setups to avoid GPU overlap with prefill/decode. Stable collection sort clusters tests by the same key so consecutive classes are more likely to hit the pool. pytest_sessionfinish stops any remaining pooled workers at session end.
Changes
e2e_test/fixtures/setup_backend.py: worker pool (_WorkerKey,_pool_get/_pool_put/_pool_make_room_for/_pool_drop),shutdown_worker_pool(),_setup_localreuses workers and always tears down the gateway; gateway failure paths discard or evict suspect workers; PD path drains the pool before worker launch.e2e_test/fixtures/hooks.py:_backend_sort_key, extendpytest_collection_modifyitemswith stable sort after env filtering;pytest_sessionfinishcallsshutdown_worker_pool().e2e_test/fixtures/__init__.py,e2e_test/conftest.py: export/importpytest_sessionfinish; docstring updates forsetup_backend.E2E_DISABLE_WORKER_POOL=1(legacy per-class behavior),E2E_WORKER_POOL_SIZE(default1),E2E_DISABLE_TEST_SORT=1(skip clustering sort).Test Plan
Starting … grpc workervsReusing pooled workers for. After the change there should be far fewer cold starts when multiple classes share the same model/mode/worker count.Checklist
cargo +nightly fmtpassescargo clippy --all-targets --all-features -- -D warningspassesSummary by CodeRabbit
Release Notes
Chores
Tests