refactor allgather/mc2-related fused_experts #2369
Conversation
Code Review
This pull request refactors the Mixture-of-Experts (MoE) token dispatching logic by introducing a base class `MoETokenDispatcher` and several implementations for different strategies. It also adds a new test suite for the `UnquantizedTokenDispatcherWithMC2` class. While the refactoring improves structure, I've identified several critical issues in both the new implementation and the tests that must be addressed. The tests contain incorrect mock paths and will fail due to a `KeyError`. The implementation has critical bugs such as overwriting initialized variables with `None`, using undefined attributes, and incorrect usage of `super()`. These issues impact correctness and maintainability.
```python
self.need_param = {}  # Replace with actual parameters if needed
self.dispatcher = UnquantizedTokenDispatcherWithMC2(need_param=self.need_param)
```
The `need_param` dictionary is initialized as empty. However, the `UnquantizedTokenDispatcherWithMC2` class (via its parent `MoETokenDispatcher`) expects `top_k` and `num_experts` keys to be present in this dictionary during initialization. This will lead to a `KeyError` when running the tests. Please provide the necessary parameters.
```diff
-self.need_param = {}  # Replace with actual parameters if needed
-self.dispatcher = UnquantizedTokenDispatcherWithMC2(need_param=self.need_param)
+self.need_param = {"top_k": 2, "num_experts": 8}  # Example values
+self.dispatcher = UnquantizedTokenDispatcherWithMC2(need_param=self.need_param)
```
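For context, the `KeyError` arises because the base-class initializer reads these keys from the dictionary. A rough sketch of the shape of `MoETokenDispatcher.__init__` implied by this review (a guess for illustration, not the actual code from the PR):

```python
class MoETokenDispatcher:
    def __init__(self, need_param):
        # Direct indexing is what raises KeyError when need_param is empty.
        self.top_k = need_param["top_k"]
        self.num_experts = need_param["num_experts"]
```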
```python
self.patcher_mc2_group = mock.patch('your_module.get_mc2_group', mock_get_mc2_group)
self.patcher_mc2_group.start()

# Mock ascend config
mock_ascend_config = mock.Mock()
mock_ascend_config.torchair_graph_config.enabled = False
self.patcher_ascend_config = mock.patch('your_module.get_ascend_config', return_value=mock_ascend_config)
self.patcher_ascend_config.start()

# Mock ascend soc version
self.patcher_ascend_version = mock.patch('your_module.get_ascend_soc_version', return_value=AscendSocVersion.A3)
self.patcher_ascend_version.start()

# Mock forward context
mock_forward_context = mock.Mock()
mock_forward_context.mc2_mask = torch.tensor([1, 0, 1])  # Example mask
self.patcher_forward_context = mock.patch('your_module.get_forward_context', return_value=mock_forward_context)
```
The mock paths for `get_mc2_group`, `get_ascend_config`, `get_ascend_soc_version`, and `get_forward_context` are incorrect: they use the placeholder `'your_module'`. The patch target must be the location where the name is looked up; in this case, these functions are imported and used within the `vllm_ascend.ops.moe_dispatcher.token_dispatcher` module. As written, the tests will not patch the real objects and will likely fail. Please correct these paths. For example, `mock.patch('your_module.get_mc2_group', ...)` should be `mock.patch('vllm_ascend.ops.moe_dispatcher.token_dispatcher.get_mc2_group', ...)`. A similar issue exists in `test_a3_extra_args_handling`.
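To illustrate, a minimal sketch of the corrected patch targets inside `setUp`, reusing the mock objects defined earlier in the method (the `_MOD` constant and the `addCleanup` calls are illustrative additions, not part of the PR):

```python
# Patch the names where they are looked up: the token_dispatcher module.
_MOD = 'vllm_ascend.ops.moe_dispatcher.token_dispatcher'

self.patcher_mc2_group = mock.patch(f'{_MOD}.get_mc2_group', mock_get_mc2_group)
self.patcher_ascend_config = mock.patch(f'{_MOD}.get_ascend_config',
                                        return_value=mock_ascend_config)
self.patcher_ascend_version = mock.patch(f'{_MOD}.get_ascend_soc_version',
                                         return_value=AscendSocVersion.A3)
self.patcher_forward_context = mock.patch(f'{_MOD}.get_forward_context',
                                          return_value=mock_forward_context)
for patcher in (self.patcher_mc2_group, self.patcher_ascend_config,
                self.patcher_ascend_version, self.patcher_forward_context):
    patcher.start()
    self.addCleanup(patcher.stop)  # ensure mocks are removed after each test
```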
```python
self.ep_rank_id = None
self.ep_world_size = None
```
```python
self.bsz = None
device_group = get_ep_group().device_group
self.ep_size = device_group.world_size
self.local_num_experts = self.global_num_experts // self.ep_size
```
The attribute `self.global_num_experts` is used here, but it is not defined in this class or its parent `MoETokenDispatcher`. This will raise an `AttributeError`. It seems you intended to use `self.num_experts`, which is available from the parent class.
```diff
-self.local_num_experts = self.global_num_experts // self.ep_size
+self.local_num_experts = self.num_experts // self.ep_size
```
```python
class UnquantizedTokenDispatcherWithMC2(MoETokenDispatcher):
    def __init__(self, need_param):
        super(MoETokenDispatcher, self).__init__(need_param=need_param)
```
The `super()` call `super(MoETokenDispatcher, self)` is incorrect. It should be `super().__init__(need_param=need_param)` for modern Python, or `super(UnquantizedTokenDispatcherWithMC2, self).__init__(need_param=need_param)`. Passing the base class itself to `super()` starts the method resolution order lookup *after* `MoETokenDispatcher`, so its `__init__` is never called and the call falls through to `object.__init__`. This pattern is repeated in other dispatcher classes in this file (`QuantizedTokenDispatcherWithMC2`, `QuantizedTokenDispatcherWithAllGather`, `UnquantizedTokenDispatcherWithFusedExpertsMoge`).
```diff
-super(MoETokenDispatcher, self).__init__(need_param=need_param)
+super().__init__(need_param=need_param)
```
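As a standalone illustration (hypothetical `Base`/`Broken`/`Fixed` classes, not code from this PR), naming the base class in `super()` starts the MRO lookup after that class, so its `__init__` is skipped:

```python
class Base:
    def __init__(self, need_param):
        self.top_k = need_param["top_k"]

class Broken(Base):
    def __init__(self, need_param):
        # MRO is [Broken, Base, object]; super(Base, self) starts the lookup
        # after Base, so this resolves to object.__init__ and Base.__init__
        # never runs.
        super(Base, self).__init__(need_param=need_param)

class Fixed(Base):
    def __init__(self, need_param):
        super().__init__(need_param=need_param)  # resolves to Base.__init__

Fixed({"top_k": 2})      # ok: Base.__init__ runs and sets self.top_k
# Broken({"top_k": 2})   # would raise TypeError: object.__init__() takes exactly one argument
```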
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
6aff3e9 to 891282f
You can ignore the other UT failures, but please make sure the tests related to this PR pass:
```python
class TestTokenDispatcherWithMC2(unittest.TestCase):

    def setUp(self):
        # Mock get_mc2_group() 返回固定值 (return a fixed value)
```
Remove the Chinese comments in this file.
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
What this PR does / why we need it?
refactor allgather/mc2-related fused_experts
Does this PR introduce any user-facing change?
How was this patch tested?