title | sneak_preview | tags | excerpt |
---|---|---|---|
ZeRO stage 1 with reduced communication | true | training ZeRO English | Partition-aware ZeRO with up to 2x reduction in communication time! |
- Partition-aware approach replaces the initial implementation, which used a global collective (all-reduce)
- Total communication volume reduced from 1.5x to 1x that of data parallelism
- Up to 2x reduction in communication time compared to all-reduce
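
The ratios in the list above can be checked with a little arithmetic. The sketch below is an illustration, not the actual implementation. It assumes ring-based collectives, and it assumes the partition-aware step is a reduce-scatter of gradients followed by an all-gather of updated parameters; the post itself does not name the collectives. The names `ring_volume`, `SCHEMES`, and `relative_volumes` are hypothetical.

```python
from fractions import Fraction

def ring_volume(op: str, num_elements: int, world_size: int) -> Fraction:
    """Elements each rank sends for one ring-based collective (assumed cost model)."""
    if world_size < 2:
        raise ValueError("world_size must be at least 2")
    share = Fraction(num_elements * (world_size - 1), world_size)
    if op == "all_reduce":
        # A ring all-reduce is a reduce-scatter phase plus an all-gather phase.
        return 2 * share
    if op in ("reduce_scatter", "all_gather"):
        return share
    raise ValueError(f"unknown collective: {op}")

# Collectives issued per training step under each scheme (assumption, see lead-in).
SCHEMES = {
    "data_parallel": ["all_reduce"],
    "zero1_all_reduce": ["all_reduce", "all_gather"],
    "zero1_partition_aware": ["reduce_scatter", "all_gather"],
}

def relative_volumes(num_elements: int, world_size: int) -> dict:
    """Per-step communication volume of each scheme relative to data parallelism."""
    totals = {
        name: sum(ring_volume(op, num_elements, world_size) for op in ops)
        for name, ops in SCHEMES.items()
    }
    base = totals["data_parallel"]
    return {name: total / base for name, total in totals.items()}

if __name__ == "__main__":
    for name, ratio in relative_volumes(1_000_000, 8).items():
        print(f"{name}: {float(ratio):.2f}x")
```

Under this cost model the gradient reduction alone moves half as much data with reduce-scatter as with all-reduce, which is consistent with the "up to 2x" figure for communication time.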