
[feat] Support TPUs for intra-layer model parallel training #83

Open
myleott opened this issue Sep 14, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@myleott
Contributor

myleott commented Sep 14, 2020

🚀 Feature

The current implementation is based on Megatron and only supports GPUs. Now that we’re migrating fairseq to this implementation, we should add TPU support here as well.

Motivation

Several fairseq users and internal projects would benefit from TPU support. For example, see facebookresearch/fairseq#2503.

Pitch

Replace all the CUDA-specific calls with device-agnostic versions that are compatible with PyTorch/XLA.
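
As a rough illustration (not the actual fairscale change), the idea is to replace hard-coded calls like `.cuda()` or `torch.cuda.current_device()` with a device-selection helper, so the same model-parallel code can run on an XLA (TPU) device when PyTorch/XLA is installed. The helper name below is hypothetical:

```python
# Minimal sketch of device-agnostic placement, assuming torch_xla is installed
# in TPU environments and absent otherwise.
import torch

def get_model_parallel_device():
    """Return the device to place model-parallel tensors on.

    Prefers an XLA (TPU) device when torch_xla is available, otherwise falls
    back to the current CUDA device, and finally to CPU.
    """
    try:
        import torch_xla.core.xla_model as xm  # only present in TPU environments
        return xm.xla_device()
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")

# Instead of hard-coding CUDA, e.g.:
#   weight = torch.empty(out_features, in_features).cuda()
# allocate on whichever device is selected:
weight = torch.empty(1024, 1024, device=get_model_parallel_device())
```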

Alternatives

Do not support TPUs.

Additional context

I have a preliminary version of the needed changes, but they are based on an old version of Megatron, so they would need to be rebased onto the (newer) Megatron fork in fairscale.

@myleott myleott changed the title Support TPUs (PyTorch/XLA) for model parallel training Support TPUs (PyTorch/XLA) for intra-layer model parallel training Sep 14, 2020
@myleott myleott changed the title Support TPUs (PyTorch/XLA) for intra-layer model parallel training Support TPUs for intra-layer model parallel training Sep 14, 2020
@msbaines msbaines added the enhancement New feature or request label Sep 23, 2020
@min-xu-ai min-xu-ai changed the title Support TPUs for intra-layer model parallel training [feat] Support TPUs for intra-layer model parallel training Dec 16, 2020
myleott pushed a commit that referenced this issue Feb 22, 2021
* two small changes

- link to deepspeed in the doc
- added a quick test in the init to catch a common user error

* addressed comments

* remove an overly strong assert

* addressed comments
@kaushikb11

Hi @myleott, any updates on this?
