
[feat] Support TPUs for intra-layer model parallel training #83

Open
myleott opened this issue Sep 14, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@myleott
Contributor

myleott commented Sep 14, 2020

🚀 Feature

The current implementation is based on Megatron and only supports GPUs. Now that we’re migrating fairseq to this implementation, we should add TPU support here as well.

Motivation

Several fairseq users and internal projects would benefit from TPU support. For example, see facebookresearch/fairseq#2503.

Pitch

Replace all the CUDA-specific calls with device-agnostic versions that are compatible with PyTorch/XLA.
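
As a rough illustration (not the actual fairscale change), the idea is to replace hard-coded calls like `.cuda()` or `torch.cuda.current_device()` with a device-selection helper, so the same model-parallel code can run on an XLA (TPU) device when PyTorch/XLA is installed. The helper name below is hypothetical:

```python
# Minimal sketch of device-agnostic placement, assuming torch_xla is installed
# in TPU environments and absent otherwise.
import torch

def get_model_parallel_device():
    """Return the device to place model-parallel tensors on.

    Prefers an XLA (TPU) device when torch_xla is available, otherwise falls
    back to the current CUDA device, and finally to CPU.
    """
    try:
        import torch_xla.core.xla_model as xm  # only present in TPU environments
        return xm.xla_device()
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")

# Instead of hard-coding CUDA, e.g.:
#   weight = torch.empty(out_features, in_features).cuda()
# allocate on whichever device is selected:
weight = torch.empty(1024, 1024, device=get_model_parallel_device())
```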

Alternatives

Do not support TPUs.

Additional context

I have a preliminary version of the needed changes, but they are based on an old version of Megatron, so they would need to be rebased onto the (newer) Megatron fork in fairscale.

@myleott myleott changed the title Support TPUs (PyTorch/XLA) for model parallel training Support TPUs (PyTorch/XLA) for intra-layer model parallel training Sep 14, 2020
@myleott myleott changed the title Support TPUs (PyTorch/XLA) for intra-layer model parallel training Support TPUs for intra-layer model parallel training Sep 14, 2020
@msbaines msbaines added the enhancement New feature or request label Sep 23, 2020
@min-xu-ai min-xu-ai changed the title Support TPUs for intra-layer model parallel training [feat] Support TPUs for intra-layer model parallel training Dec 16, 2020
myleott pushed a commit that referenced this issue Feb 22, 2021
* two small changes

- link to deepspeed in the doc
- added a quick test in the init to catch a common user error

* addressed comments

* remove an overly strong assert

* addressed comments
@kaushikb11

Hi @myleott, any updates on this?
