Skip to content

Conversation

@dongbo910220
Copy link
Contributor

@dongbo910220 dongbo910220 commented Oct 20, 2025

Part of #26900

This PR continues the effort to clean up vllm.utils by extracting NCCL-related helper functions into a new, dedicated module vllm/utils/nccl.py.

The following functions have been moved:

  • vllm.utils.find_nccl_library -> vllm.utils.nccl.find_nccl_library
  • vllm.utils.find_nccl_include_paths -> vllm.utils.nccl.find_nccl_include_paths

CC: @DarkLight1337

Test Plan

Test Result

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively separates the NCCL utility functions from vllm.utils into a new vllm.utils.nccl module, which improves code organization and modularity. The changes are clean and follow the refactoring goal. I have one suggestion to improve error handling in the new nccl.py module.

Comment on lines 80 to 56
except Exception:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a broad except Exception: pass can hide important errors. If an unexpected issue occurs while searching for the nvidia.nccl package's include paths, it will be silently ignored. This could lead to downstream compilation failures that are difficult to debug. It's better to at least log the exception to aid in troubleshooting.

Suggested change
except Exception:
pass
except Exception as e:
logger.debug("Failed to find nccl include path from nvidia.nccl package: %s", e)

@dongbo910220 dongbo910220 force-pushed the feature/refactor-nccl-utils branch from aad29b5 to d6273aa Compare October 20, 2025 14:19
Move NCCL library discovery functions to vllm/utils/nccl.py:
- find_nccl_library: NCCL/RCCL library resolver
- find_nccl_include_paths: NCCL header path resolver

This reduces vllm/utils/__init__.py complexity and improves
modularity for distributed computing utilities.

Contributes to vllm-project#26900

Signed-off-by: dongbo910220 <1275604947@qq.com>
@dongbo910220 dongbo910220 force-pushed the feature/refactor-nccl-utils branch from d6273aa to 14ee2bf Compare October 20, 2025 14:29
…ils.nccl

- Replace legacy imports in pynccl_* files with vllm.utils.nccl.
- Drop re-exports from vllm.utils.__init__ (no back-compat).

Signed-off-by: dongbo910220 <1275604947@qq.com>
@dongbo910220 dongbo910220 force-pushed the feature/refactor-nccl-utils branch from 57b1314 to 97f234a Compare October 20, 2025 20:34
@DarkLight1337
Copy link
Member

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively refactors NCCL-related utility functions out of vllm.utils into a new, dedicated module, vllm.utils.nccl. This improves modularity and code organization. The changes are well-executed: imports are correctly updated, and the new module includes improved docstrings and more informative logging for previously silent errors. The overall code quality is excellent, and I have no high or critical severity concerns with this change.

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gemini didn't flag any missing imports so LGTM, see if the tests pass now

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 21, 2025 01:12
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 21, 2025
@dongbo910220
Copy link
Contributor Author

Hi @DarkLight1337 , The failing test tests/v1/distributed/test_async_llm_dp.py::test_load[True-ray-RequestOutputKind.DELTA]
is failing due to a validation error in vllm/engine/arg_utils.py:1510: ValueError: Currently, async scheduling only supports mp or uni distributed executor backend, but choose ray.
However, this PR does not touch arg_utils.py at all. This refactor only moves NCCL utility functions:

  • From: vllm/utils/__init__.py
  • To: vllm/utils/nccl.py

The validation logic that rejects async + ray exists in the main branch and is unrelated to
this utils refactoring. So this failure has nothing to do with our changes..

Note: The test with mp backend (test_load[True-mp-...]) passed successfully,
confirming our NCCL refactor works correctly.

@vllm-bot vllm-bot merged commit 6c728f7 into vllm-project:main Oct 21, 2025
51 of 53 checks passed
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Oct 21, 2025
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Chenyaaang pushed a commit to Chenyaaang/vllm that referenced this pull request Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants