
@kip-cxj commented Oct 21, 2025

Modification overview

Add support for broadcast mode on Huawei Ascend NPUs; P2P mode is still being adapted.

Environment

| Software | Version |
| --- | --- |
| npu-driver | 25.3.rc1 |
| CANN | 8.3.RC1 |
| Python | 3.11 |
| torch | 2.7.1 |
| torch_npu | 2.7.1dev20251016 |
| vllm | 0.11.0 |
| vllm-ascend | 0.11.0rc0 |

Checklist

  • Code has been self-tested.
    We tested this PR on Ascend NPUs; the test setup was 8× Atlas 800T A2. We don't have GPUs, so no GPU-related testing was done.

@weixiao-huang (Collaborator) commented:
Please fix the pre-commit lint error.

@MoonshotAI deleted a comment from specture724 Oct 22, 2025
Inline review comment on the following diff:

```python
    device_uuid = current_platform.get_device_uuid(self.device.index)
elif current_platform.device_type == "npu":
    device_uuid = (
        f"NPU-{current_platform.get_device_name(self.device.index)!s}-{self.device.index}"
    )
```
Collaborator:
Is this UUID unique for each device? With CUDA, we can set CUDA_VISIBLE_DEVICES to override the device index, so two different processes may end up with the same device_uuid. I'm not sure whether NPU has this problem.
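
A quick illustration of the CUDA-side concern (illustrative; requires a CUDA build of torch): each process renumbers its visible devices from 0, so processes pinned to different physical GPUs can report the same index.

```python
import os
import torch

# Launched as: CUDA_VISIBLE_DEVICES=3 python check_index.py
# torch renumbers the visible devices from 0, so a process pinned to
# physical GPU 3 still reports device index 0. Two such processes on
# different GPUs would derive the same index-based device_uuid.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # "3"
print(torch.cuda.current_device())             # 0
```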

Author:

That's a valid concern. torch_npu has not implemented get_device_uuid. I don't think NPU has this problem, so we are currently using the global rank ID as the UUID.
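
A minimal sketch of that rank-based fallback, assuming the default process group is already initialized; the identifier format here is illustrative, not this PR's actual code (the diff above builds the UUID from device name plus index instead).

```python
import torch.distributed as dist

# Rank-based fallback when torch_npu provides no get_device_uuid.
# Only safe if ranks are unique across every process that will
# compare these UUIDs (the concern raised in the next comment).
device_uuid = f"NPU-{dist.get_rank()}"
```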

Collaborator:

But the global rank may differ between inference and the ps. If a machine has 8 NPU devices, the ps will have ranks 0 through 7. But if the inference engine uses TP=1, there may be 8 independent inference engines, each of which sees rank 0 and gets the same device_uuid. I think this may cause a potential bug.

Author:

I attempted to set a UUID on the Ascend device. Without native API support, I haven't found an ideal approach yet, only two suboptimal options:

  1. Use an environment variable to obtain the rank ID on the ps, while using torch.distributed.get_rank() to get the rank ID in vLLM. Under default configurations, I think these two should be consistent.

  2. Use subprocess to query npu-smi info (the NPU equivalent of nvidia-smi for GPUs) and match against the PID to locate the physical device ID. This physical ID can then be combined with the server IP to form a UUID. However, this approach incurs significant time overhead and is not concise (see the sketch after this list).
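
A rough sketch of option 2, for illustration only: npu_device_uuid is a hypothetical helper (not part of this PR), and the npu-smi parsing is an assumption, since the table layout varies across driver versions.

```python
import os
import socket
import subprocess

def npu_device_uuid(pid: int | None = None) -> str:
    """Hypothetical: map a PID to a physical NPU ID via `npu-smi info`,
    then combine it with the host IP to form a pseudo-UUID."""
    pid = pid if pid is not None else os.getpid()
    out = subprocess.run(
        ["npu-smi", "info"], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        fields = line.split()
        # Assumed layout: a process-table row lists the NPU ID first and
        # contains the PID somewhere; adjust to your driver's real output.
        if str(pid) in fields:
            host_ip = socket.gethostbyname(socket.gethostname())
            return f"NPU-{host_ip}-{fields[0]}"
    raise RuntimeError(f"PID {pid} not found in npu-smi output")
```

As noted above, shelling out to npu-smi on every lookup is slow; caching the result once per process would amortize the cost.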

@ZSL98 commented Oct 24, 2025

Hi, I can't pass the test_update.py test.
