[example] gpt, shard init on all processes #2366

feifeibear · 2023-01-06T06:55:52Z

What's new

Provide a memory-efficient way to initialize the model in a distributed manner.
During `ColoInitContext`, each process initializes only 1/N of each parameter, where N is the number of processes.
Then `tensor_parallelize` changes each parameter's process group and distspec to the tensor-parallel layout.
As a result, the initialization phase needs only one copy of the model's memory in total, summed over all processes, rather than one full copy per process.
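The memory argument above can be checked with a small, framework-free simulation. This is a hypothetical sketch, not ColossalAI code: `shard_bounds`, `per_rank_elements`, and `reshard` are illustrative names. It assumes parameters are flattened 1-D lists split as evenly as possible, whereas real tensor parallelism splits along a specific tensor dimension. It also assumes the re-layout step handles one parameter at a time, so its transient cost is at most one full parameter.

```python
# Hypothetical simulation of sharded init followed by a tensor-parallel re-layout.
# No ColossalAI APIs are used; all names here are illustrative only.
from typing import List, Tuple


def shard_bounds(numel: int, num_shards: int, rank: int) -> Tuple[int, int]:
    """[start, end) of the contiguous slice owned by `rank` when a flattened
    tensor of `numel` elements is split as evenly as possible."""
    base, extra = divmod(numel, num_shards)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def per_rank_elements(param_sizes: List[int], world_size: int, sharded: bool) -> List[int]:
    """Number of elements each rank holds right after initialization."""
    if not sharded:
        # Naive init: every process materializes the full model.
        return [sum(param_sizes)] * world_size
    # Sharded init: each rank holds only its 1/N slice of every parameter.
    return [
        sum(e - s for s, e in (shard_bounds(n, world_size, rank) for n in param_sizes))
        for rank in range(world_size)
    ]


def reshard(shards: List[List[float]], new_num_shards: int) -> List[List[float]]:
    """Stand-in for changing ONE parameter's layout: gather the world-group
    shards (an all-gather in a real run), then re-split for a smaller group."""
    full = [x for shard in shards for x in shard]
    return [full[slice(*shard_bounds(len(full), new_num_shards, r))]
            for r in range(new_num_shards)]


if __name__ == "__main__":
    sizes = [1000, 10, 4097]  # hypothetical parameter sizes (numel)
    world = 4
    print(sum(per_rank_elements(sizes, world, sharded=False)))  # 20428: four model copies
    print(sum(per_rank_elements(sizes, world, sharded=True)))   # 5107: one model copy

    param = [float(i) for i in range(10)]
    world_shards = [param[slice(*shard_bounds(10, world, r))] for r in range(world)]
    print(reshard(world_shards, 2))  # [[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]
```

Because the shards of each parameter partition it exactly, the sharded total always equals one model copy, independent of N. The naive total grows linearly with N.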


feifeibear added 6 commits January 6, 2023 11:30

[example] add google doc for benchmark results of GPT

8d2e33d

add Tencent doc

b3199df

[example] gpt, shard init on all processes

41ad66d

Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into d…ev0106

594e7d1

polish comments

9d945af

polish code

29c25d9

feifeibear merged commit 1aaeb59 into hpcaitech:main Jan 6, 2023

feifeibear deleted the dev0106_1 branch January 6, 2023 07:44
