DeepSpeed MoE #1310

awan-10 · 2021-08-17T04:54:04Z

This PR introduces DeepSpeed Mixture of Experts (MoE) support. The code has been written in collaboration with many contributors at Microsoft including the Z-code team.

Co-authored-by: Alex Muzio <Alex.Muzio@microsoft.com>
Co-authored-by: Alex Muzio <alferre@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Felipe Cruz Salinas <Andres.Cruz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Young Jin Kim <youki@microsoft.com>
Co-authored-by: alexandremuzio <ax.muzio@gmail.com>
Co-authored-by: bapatra <bapatra@microsoft.com>

tjruwase · 2024-05-06T14:09:56Z

csrc/adam/cpu_adam.cpp

@@ -23,7 +23,8 @@ void Adam_Optimizer::Step(float* _params,
                          float* _exp_avg,
                          float* _exp_avg_sq,
                          size_t _param_size,
-                          __half* dev_params)
+                          __half* dev_params,
+                          bool half_precision)


@awan-10, @jeffra, @RezaYazdaniAminabadi, sorry, I realize this is almost 3 years old, but I need to understand the introduction of half_precision in this PR. Who can I talk to? Thanks!

For context, this affects a PR currently under review: #5409

Hi @tjruwase

I can chat with you on this.
I think this was mostly added here to make sure the right AVX operation is selected for FP32 vs. FP16. However, as I see it, this is now templated in the new PR.
Thanks,
Reza

@RezaYazdaniAminabadi, thanks for the response. Your explanation was my guess as well. I think the entire code can be greatly simplified by using templates and the improved type support in torch. Can you please help review the new PR and also engage in the conversation there?
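
To illustrate the trade-off discussed in this thread, here is a minimal, self-contained sketch contrasting a runtime `half_precision`-style flag with a templated element type. It is not DeepSpeed code: the `BF16` struct, the `load_as_float`/`store_from_float` helpers, and the `scaled_update*` functions are hypothetical names invented for this example, the update is a plain scaled-gradient step rather than Adam, and scalar loops stand in for the AVX paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-bit storage type (bfloat16 by truncation). It only stands in
// for a reduced-precision parameter type; it is not DeepSpeed's actual type.
struct BF16 {
    std::uint16_t bits;
};

// Widen a stored element to float for arithmetic.
inline float load_as_float(float v) { return v; }
inline float load_as_float(BF16 v)
{
    std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Narrow a float result back to the storage type (BF16 truncates, no rounding).
inline void store_from_float(float& dst, float v) { dst = v; }
inline void store_from_float(BF16& dst, float v)
{
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof(u));
    dst.bits = static_cast<std::uint16_t>(u >> 16);
}

// Runtime-flag style: the buffer is untyped and the flag picks the code path.
void scaled_update_flag(void* params, const float* grads, std::size_t n, float lr,
                        bool half_precision)
{
    assert(params != nullptr && grads != nullptr);
    if (half_precision) {
        BF16* p = static_cast<BF16*>(params);
        for (std::size_t i = 0; i < n; ++i) {
            store_from_float(p[i], load_as_float(p[i]) - lr * grads[i]);
        }
    } else {
        float* p = static_cast<float*>(params);
        for (std::size_t i = 0; i < n; ++i) { p[i] -= lr * grads[i]; }
    }
}

// Templated style: the element type selects the load/store overloads at
// compile time, so no flag or cast is needed.
template <typename T>
void scaled_update(T* params, const float* grads, std::size_t n, float lr)
{
    for (std::size_t i = 0; i < n; ++i) {
        store_from_float(params[i], load_as_float(params[i]) - lr * grads[i]);
    }
}
```

With this shape, a caller writes `scaled_update<float>(p, g, n, lr)` or `scaled_update<BF16>(p, g, n, lr)`, and supporting another storage type only requires a new pair of load/store overloads rather than another boolean parameter.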

jeffra and others added 20 commits August 16, 2021 19:58

Add assertion for batch size and min_capacity. (#238)

b7e0211

tutorial changes

b1ec41d

assert for fp16_master_params_and_grads (#240)

fb28255

Add some groups stuff.

6cde38c

minor format.

c0cc16c

add table1

338ab54

fix table a bit more

f2f2c3c

finalize table 1.

12e275c

check some api

40b9163

cleanup

ab5e5fd

add some example.

b1bf4ac

simply.

f287dac

some more changes.

2601e24

check format

427dded

add zero stuff

03f4b8d

fix

410eb9b

add rts

fe149cc

remove paper link

1d78b51

cleanup

3f2d9b2

awan-10 requested review from cli99, conglongli, eltonzheng, jeffra, minjiaz, niumanar, RezaYazdaniAminabadi, samyam, ShadenSmith and tjruwase as code owners August 17, 2021 04:54

jeffra added 2 commits August 16, 2021 21:54

bump to 0.5.0

53daa42

formatting

acfd030

awan-10 enabled auto-merge (squash) August 17, 2021 05:00

jeffra approved these changes Aug 17, 2021

awan-10 merged commit f284324 into master Aug 17, 2021

awan-10 deleted the staging-moe-zero-v3 branch September 15, 2021 19:10

tjruwase reviewed May 6, 2024

tjruwase mentioned this pull request May 8, 2024

CPUAdam fp16 and bf16 support #5409

Merged
