
[Dist Dialect] Add MoE-related api in PIR dist dialect #66462


Merged: 5 commits merged into PaddlePaddle:develop on Jul 30, 2024

Conversation

pkuzyc
Contributor

@pkuzyc pkuzyc commented Jul 24, 2024

PR Category

Auto Parallel

PR Types

New features

Description

Pcard-67164

Add the corresponding dist_ops for the MoE APIs (#63904) in the PIR dist dialect, to be used in MoE models as shown below:
[figure: MoE model diagram]

  • local_tensors_from_dtensor: splits a dist tensor on the global mesh into a list of dist tensors on its sub-meshes, e.g. for a dist tensor with mesh=[0,1], placements=[Shard(0)], it returns the sub-mesh list [DistTensor(mesh=[0], placements=[Replicate()]), DistTensor(mesh=[1], placements=[Replicate()])].
  • dtensor_from_local_tensors: the inverse of local_tensors_from_dtensor, which builds the global-mesh dist tensor from sub-mesh dist tensors, e.g. for the sub-mesh list [DistTensor(mesh=[0], placements=[Replicate()]), DistTensor(mesh=[1], placements=[Replicate()])], it returns the global-mesh dist tensor with mesh=[0,1], placements=[Shard(0)] (see the sketch below).
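For intuition, here is a minimal single-process sketch of the semantics described above, assuming mesh=[0,1] and placements=[Shard(0)]. It uses plain paddle tensors and split/concat to stand in for DistTensors and the two new dist_ops, so it runs without a distributed launch; it illustrates the data layout only and is not the API added in this PR.

```python
import paddle

# Stand-in for a dist tensor on mesh=[0, 1] with placements=[Shard(0)]:
# dim 0 is split across the two ranks of the global mesh.
global_tensor = paddle.arange(16, dtype="float32").reshape([8, 2])

# local_tensors_from_dtensor: one tensor per sub-mesh ([0] and [1]), each
# fully replicated on its own single-rank sub-mesh; with Shard(0) this is
# modeled here as a plain split along dim 0.
sub_mesh_tensors = paddle.split(global_tensor, num_or_sections=2, axis=0)

# ... per-expert (per-sub-mesh) computation would happen here in an MoE model ...

# dtensor_from_local_tensors: the inverse op, rebuilding the global-mesh
# dist tensor with placements=[Shard(0)] from the sub-mesh tensors,
# modeled here as a concat along dim 0.
rebuilt = paddle.concat(sub_mesh_tensors, axis=0)

assert bool(paddle.allclose(global_tensor, rebuilt))
```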


paddle-bot bot commented Jul 24, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@@ -40,5 +40,24 @@ pir::Value reshard(
pir::Value reshard(const pir::Value& x,
const TensorDistAttribute& tensor_dist_attr);

std::vector<pir::Value> local_tensors_from_dist(
Contributor


Are these two ops kept only at the IR representation level, or are they also handed to the execution layer for the executor to run?
The dist2dense pass removes all representation-level information in the dist dialect and gives the executor a pure local dense program; will it remove these two ops?

Contributor Author


Only at the representation level; at execution time, share_data is used to reuse the current rank's local data. The next PR will add the replacement logic in remove_other_rank_op_pass.

Contributor


This replacement logic should not be implemented in remove_other_rank_op_pass; it seems more reasonable to implement it in the reshard pass, which parses a representation-level reshard op and replaces it with the actual collective operation ops.

Contributor Author


dtensor_from_local_tensors may also need a reshard, so the replacement was put in remove_other_rank_op_pass; we can also try putting it at the end of the reshard pass.

Contributor

@zhiqiu zhiqiu left a comment


LGTM

Contributor

@xiaoguoguo626807 xiaoguoguo626807 left a comment


LGTM for backward

@zhiqiu zhiqiu merged commit 8718d78 into PaddlePaddle:develop Jul 30, 2024
30 of 31 checks passed
Lans1ot pushed a commit to Lans1ot/Paddle that referenced this pull request Aug 5, 2024
…66462)

* add two MoE api in distributed dialect

* polish the dist_op and add unit test

* remove simple_net_ep unit test

* remove redundant print

* bug fix, replace platform::errors with phi::errors