sglang disagg: opt-in RDMA device restriction + benchmark harness fixes & metrics display #200
Open
atnair-amd wants to merge 3 commits into
…ured HCAs only)

A privileged container exposes every host RDMA device under /dev/infiniband regardless of --device, so NCCL_IB_HCA only limits usage, not discovery. Add an opt-in path that launches the inference container unprivileged with only the configured HCAs' /dev/infiniband/uverbsN nodes exposed, so ibv_devinfo inside the container is restricted to that set.

- docker_lib.launch_docker_container: add backward-compatible privileged=True and extra_run_args='' kwargs; defaults preserve prior behavior for all suites.
- linux_utils.get_uverbs_devices_for_hcas: resolve ibdev names to per-node /dev/infiniband/uverbsN (+ rdma_cm) via /sys/class/infiniband_verbs.
- sglang_llama_70b_distributed: when config restrict_rdma_devices is set, resolve the configured HCAs per node and launch unprivileged with an explicit device list.

Signed-off-by: Atul Nair <Atul.Nair@amd.com>
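Below is a minimal sketch of the HCA-to-uverbs resolution described in this commit. It assumes the standard sysfs layout, in which each `/sys/class/infiniband_verbs/uverbsN/ibdev` file holds the ibdev name (for example `rocep28s0`). The names `uverbs_devices_for_hcas` and `docker_device_args` are hypothetical. The sketch reads the local node only, and the `sysfs_root` parameter exists just to make it testable. The PR's `linux_utils.get_uverbs_devices_for_hcas` resolves per node, and its real signature is not shown here.

```python
import os
from typing import Dict, Iterable, List


def uverbs_devices_for_hcas(
    hcas: Iterable[str], sysfs_root: str = "/sys/class/infiniband_verbs"
) -> List[str]:
    """Hypothetical helper: map ibdev names to /dev/infiniband/uverbsN paths.

    Local-node sketch only; not the PR's linux_utils implementation.
    """
    # Build ibdev name -> "uverbsN" from each uverbsN/ibdev file
    # (skips non-device entries such as abi_version).
    ibdev_to_uverbs: Dict[str, str] = {}
    for entry in sorted(os.listdir(sysfs_root)):
        ibdev_file = os.path.join(sysfs_root, entry, "ibdev")
        if entry.startswith("uverbs") and os.path.isfile(ibdev_file):
            with open(ibdev_file) as f:
                ibdev_to_uverbs[f.read().strip()] = entry

    wanted = list(hcas)
    missing = [h for h in wanted if h not in ibdev_to_uverbs]
    if missing:
        raise ValueError(f"HCAs not found under {sysfs_root}: {missing}")

    devices = [f"/dev/infiniband/{ibdev_to_uverbs[h]}" for h in wanted]
    # Also expose the RDMA connection-manager node ("+ rdma_cm" in the commit).
    devices.append("/dev/infiniband/rdma_cm")
    return devices


def docker_device_args(devices: List[str]) -> str:
    """Render one --device flag per node for an unprivileged `docker run`."""
    return " ".join(f"--device={d}" for d in devices)
```

The resulting string is the kind of value that could be passed as `extra_run_args` together with `privileged=False`. Whether the real `launch_docker_container` expects exactly this format is an assumption.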
Benchmark harness fixes (gsm8k + bench_serv), independent of the RDMA work:

- target proxy_router_node instead of 0.0.0.0 (the benchmark client runs on the benchmark node while the router runs on another node, so connections to localhost were refused).
- bench_serv: PYTHONPATH=/sgl-workspace/sglang/python so sglang.bench_serving imports on images whose editable-install finder is stale.
- bench_serv: optional --dataset-path (config bench_dataset_path) for a pre-staged corpus under HF_HUB_OFFLINE; optional --max-concurrency (config max_concurrency) so it does not flood the deployment.
- exec_nic_setup_scripts (thor2/broadcom): run ibv_devinfo after copying the bnxt_re driver and verify that devices enumerate, instead of matching a fixed bnxt_ name prefix (HCAs may enumerate as rocepXXs0).

Structured metrics display:

- parse_bench_serv_metrics parses the full Serving Benchmark Result block, fixing the unescaped-paren guards (median/p99 TTFT/TPOT) and the E2EL vs E2E Latency mismatch, and adding previously-unparsed fields.
- gsm8k parses accuracy/invalid/latency/tokens_per_sec into the results dict.
- log a uniform per-node metrics table with per-threshold PASS/FAIL verdicts.
- capture per-item detail (gsm8k per-question via --raw-result-file, bench_serv per-request via --output-file/--output-details) and log a compact table.
- docs/specs/inference_metrics_display.md documents the design.

Signed-off-by: Atul Nair <Atul.Nair@amd.com>
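The unescaped-paren bug is easy to reproduce. In a pattern such as `Median TTFT (ms):`, `(ms)` is a capture group that matches the bare text `ms`, so the pattern never matches the literal `(ms)` in the benchmark output. Below is a minimal sketch of label-driven parsing with `re.escape`, plus a per-threshold PASS/FAIL verdict. The label strings are assumptions about the `sglang.bench_serving` result block. The `LABELS` table and the `thresholds` shape are hypothetical and are not copied from the PR.

```python
import re
from typing import Dict, Mapping, Tuple

# Assumed labels from the "Serving Benchmark Result" block -> result keys.
LABELS: Dict[str, str] = {
    "Successful requests": "completed",
    "Output token throughput (tok/s)": "output_throughput",
    "Mean E2E Latency (ms)": "mean_e2e_latency_ms",
    "Median TTFT (ms)": "median_ttft_ms",
    "P99 TTFT (ms)": "p99_ttft_ms",
    "Median TPOT (ms)": "median_tpot_ms",
    "P99 TPOT (ms)": "p99_tpot_ms",
}

_NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_serving_result(text: str) -> Dict[str, float]:
    """Parse 'Label:   value' lines; labels are matched literally."""
    metrics: Dict[str, float] = {}
    for label, key in LABELS.items():
        # re.escape turns "(ms)" into "\(ms\)", so the parentheses are literal
        # instead of opening a capture group.
        pattern = rf"^[ \t]*{re.escape(label)}:[ \t]+{_NUMBER}[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            metrics[key] = float(match.group(1))
    return metrics


def threshold_verdicts(
    metrics: Mapping[str, float],
    thresholds: Mapping[str, Tuple[str, float]],
) -> Dict[str, str]:
    """Hypothetical config shape: key -> ("min" | "max", limit)."""
    verdicts: Dict[str, str] = {}
    for key, (kind, limit) in thresholds.items():
        value = metrics.get(key)
        if value is None:
            verdicts[key] = "MISSING"
        elif kind == "min":
            verdicts[key] = "PASS" if value >= limit else "FAIL"
        else:
            verdicts[key] = "PASS" if value <= limit else "FAIL"
    return verdicts
```

With a single label table, a naming mismatch like E2EL vs E2E Latency becomes a one-line label change instead of another hand-written regex.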
Drop docs/specs/inference_metrics_display.md; the metrics parsing/display behavior is described in the code and the PR description.

Signed-off-by: Atul Nair <Atul.Nair@amd.com>
Summary
Three related changes to the SGLang disaggregated-PD inference suite, developed and validated together on a 2-node MI300 + Thor2 RoCE setup.
1. Opt-in RDMA device restriction (`restrict_rdma_devices`)

A `--privileged` container exposes every host RDMA device under `/dev/infiniband` regardless of `--device`, so `NCCL_IB_HCA` only limits usage, not discovery. New opt-in path launches the inference container unprivileged with only the configured HCAs' `/dev/infiniband/uverbsN` nodes exposed, so `ibv_devinfo` inside the container is restricted to that set.

- `docker_lib.launch_docker_container`: backward-compatible `privileged=True` / `extra_run_args=''` kwargs (defaults unchanged for all other suites).
- `linux_utils.get_uverbs_devices_for_hcas`: resolve ibdev names to per-node `/dev/infiniband/uverbsN` (+ `rdma_cm`) via `/sys/class/infiniband_verbs`.
- `sglang_llama_70b_distributed`: when `restrict_rdma_devices` is set, resolve the configured HCAs per node and launch unprivileged.

2. Benchmark harness fixes (pre-existing bugs, unrelated to the workload)
- gsm8k and bench_serv targeted `0.0.0.0` (localhost on the benchmark node) instead of the router node, so connections were refused; they now target `proxy_router_node`.
- bench_serv: `PYTHONPATH=/sgl-workspace/sglang/python` so `sglang.bench_serving` imports on images whose editable-install finder is stale.
- bench_serv: optional `--dataset-path` (config `bench_dataset_path`) for a pre-staged corpus under `HF_HUB_OFFLINE`; optional `--max-concurrency` (config `max_concurrency`) so it doesn't flood the deployment.
- `exec_nic_setup_scripts` (thor2/broadcom): run `ibv_devinfo` after copying the bnxt_re driver and verify that devices enumerate, instead of matching a fixed `bnxt_` name (HCAs may enumerate as `rocepXXs0`).

3. Structured metrics display
- `parse_bench_serv_metrics` parses the full `Serving Benchmark Result` block, fixing the unescaped-paren guards (median/p99 TTFT/TPOT) and the `E2EL` vs `E2E Latency` mismatch, and adding previously-unparsed fields (~28 total).
- Per-item detail: gsm8k per-question (`--raw-result-file`) and bench_serv per-request (`--output-file`/`--output-details`), logged as a compact table.

Out of scope
Test plan
- `make fmt-check && make lint && make test`: all pass.
- `sglang_llama_70b_distributed` suite, 2-node disagg (1 prefill/router + 1 decode/bench), unprivileged restricted-RDMA containers: green (10/10).
- `ibv_devinfo` inside the container shows exactly the 8 configured HCAs (4 excluded); gsm8k 1017 tok/s (941/1000 correct); bench_serv 300/300 successful, 883 tok/s, with full per-item tables.
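The test-plan check that `ibv_devinfo` inside the container shows exactly the configured HCAs can be sketched as a set comparison. It runs over the `hca_id:` line that plain `ibv_devinfo` prints at the start of each device's section. The helper names below are hypothetical, and this is not the suite's actual verification code.

```python
import re
import subprocess
from typing import Iterable, List, Set, Tuple


def enumerated_hcas(ibv_devinfo_output: str) -> List[str]:
    """Extract device names from the per-device 'hca_id: <name>' lines."""
    return re.findall(r"^hca_id:\s*(\S+)", ibv_devinfo_output, re.MULTILINE)


def check_restricted(
    configured: Iterable[str], ibv_devinfo_output: str
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) HCAs relative to the configured set."""
    seen = set(enumerated_hcas(ibv_devinfo_output))
    wanted = set(configured)
    return wanted - seen, seen - wanted


def run_ibv_devinfo() -> str:
    """Run ibv_devinfo in the current environment (e.g. inside the container).

    Raises subprocess.CalledProcessError if the tool exits non-zero.
    """
    result = subprocess.run(
        ["ibv_devinfo"], check=True, capture_output=True, text=True
    )
    return result.stdout
```

A restricted container passes when both returned sets are empty. A non-empty "unexpected" set means a device was still visible, as it would be in a `--privileged` container.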