
Integrating pytorch XLA when using multiple GPUs #16130

Open

Mohamed-Dhouib opened this issue Dec 20, 2022 · 11 comments
Labels
accelerator: cuda (Compute Unified Device Architecture GPU) · feature (Is an improvement or enhancement) · help wanted (Open to be worked on) · strategy: xla
Milestone

Comments

@Mohamed-Dhouib

Mohamed-Dhouib commented Dec 20, 2022

Description & Motivation

I've experimented with PyTorch XLA using multiple NVIDIA A100 GPUs and observed that in most cases training is faster. So it would be really nice to have the option to use XLA for training in PyTorch Lightning.

The main advantage is faster training.

Additional context

Here is a code link to train a simple CNN on the MNIST dataset using XLA (on 2 GPUs): https://github.com/Dhouib-med/Test-XLA/blob/17e5b6bd6c77fffa67818462856277a57877ff3b/test_xla.py. The main parts were taken from https://github.com/pytorch/xla.
This wheel needs to be installed along with matching PyTorch and torchvision versions (1.13 and 0.14): https://storage.googleapis.com/tpu-pytorch/wheels/cuda/112/torch_xla-1.13-cp37-cp37m-linux_x86_64.whl
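
In case the link rots, here is a minimal sketch of the same approach, adapted from the pytorch/xla MNIST example rather than copied from the linked script; the model and hyperparameters are illustrative, and it assumes `GPU_NUM_DEVICES=2` is set in the environment so the XLA runtime sees both GPUs:

```python
# Minimal sketch of multi-GPU training with torch_xla, adapted from the
# pytorch/xla MNIST example (not the exact script from the link above).
# Assumes GPU_NUM_DEVICES=2 is set so the XLA runtime sees both GPUs.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp
from torchvision import datasets, transforms


def _mp_fn(index):
    device = xm.xla_device()  # one XLA device per spawned process
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    dataset = datasets.MNIST(".", train=True, download=True,
                             transform=transforms.ToTensor())
    sampler = torch.utils.data.distributed.DistributedSampler(
        dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal())
    loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=sampler)
    device_loader = pl.MpDeviceLoader(loader, device)  # moves batches to the XLA device

    model.train()
    for data, target in device_loader:
        optimizer.zero_grad()
        loss = F.nll_loss(F.log_softmax(model(data), dim=1), target)
        loss.backward()
        xm.optimizer_step(optimizer)  # all-reduces gradients across replicas


if __name__ == "__main__":
    xmp.spawn(_mp_fn, nprocs=2)  # one process per GPU
```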

cc @Borda @justusschock @awaelchli @carmocca @JackCaoG @steventk-g @Liyang90

@Mohamed-Dhouib Mohamed-Dhouib added the needs triage Waiting to be triaged by maintainers label Dec 20, 2022
@justusschock justusschock added feature Is an improvement or enhancement accelerator: cuda Compute Unified Device Architecture GPU and removed needs triage Waiting to be triaged by maintainers labels Dec 20, 2022
@Mohamed-Dhouib Mohamed-Dhouib changed the title Integrating pytorch XLA in models training using GPU Integrating pytorch XLA when using multiple GPUs Dec 20, 2022
@carmocca carmocca added this to the future milestone Dec 22, 2022
@carmocca
Contributor

Let's do it! To be clear, this would be enabled with: Trainer(accelerator='cuda'|'gpu', strategy='xla')
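
For illustration, a hypothetical usage sketch of that proposed combination (not supported yet):

```python
# Hypothetical usage of the proposed accelerator/strategy combination;
# "xla" does not yet pair with CUDA accelerators.
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=2, strategy="xla")
trainer.fit(model)  # `model` is any LightningModule
```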

@justusschock
Member

@carmocca I assume we could reuse a lot of our current XLA strategy for TPUs.

@carmocca
Contributor

carmocca commented Dec 22, 2022

That would be part of the goal

@awaelchli
Contributor

I like it, and I think it won't even be that hard! The abstractions of strategy and accelerator are already in place and are meant to support exactly this kind of relationship between a communication layer (XLA) and an accelerator (GPU/TPU).
The first step will be to simply rename our TPUSpawnStrategy to XLAStrategy (which we already planned to do, and have already done in lightning_lite).
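
For context, a sketch of that pairing as it exists today on TPUs (class name as of this writing; after the planned rename the same class, as XLAStrategy, would pair with the GPU accelerator):

```python
# Current TPU usage of the XLA-based strategy. After the rename, the same
# class (as XLAStrategy) would be reusable with accelerator="gpu".
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import TPUSpawnStrategy

trainer = Trainer(accelerator="tpu", devices=8, strategy=TPUSpawnStrategy())
```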

@Borda
Member

Borda commented Jan 31, 2023

This is great! 🐰

@qipengh

qipengh commented Feb 15, 2023

Hello, this is very wonderful work! I would like to know when this will be finished so that Trainer(accelerator='cuda'|'gpu', strategy='xla') can work normally.

@awaelchli
Contributor

@qipengh We haven't started working on it. The feature is up for grabs if you or anyone from the community has interest in contributing and testing it out.

@carmocca
Contributor

This should become very easy once we add support for XLA's PJRT runtime: https://github.com/pytorch/xla/blob/master/docs/pjrt.md#gpu
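
A minimal sketch of what that runtime selection looks like, using the environment variables from the linked pjrt.md (the device count here is illustrative):

```python
# Sketch: selecting the PJRT runtime on GPU per the pjrt.md doc linked above.
# The env vars must be set before torch_xla initializes its runtime.
import os
os.environ["PJRT_DEVICE"] = "GPU"
os.environ["GPU_NUM_DEVICES"] = "2"

import torch_xla.core.xla_model as xm

device = xm.xla_device()  # an XLA device backed by a CUDA GPU via PJRT
```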

@JackCaoG

FYI @Liyang90 has a PR to add PJRT support: #17352

@carmocca
Contributor

In addition, we need to land

  • CI setup on CPU and/or CUDA workflows
  • Connector support for the combination
  • Docs

@stellarpower

Is there an example model showing how to use XLA with (a single) CUDA GPU? The link above has 404ed since it was posted, and I am struggling to find one anywhere; currently everything I come across is for TPUs only.

Roughly how much work do folks think is still needed in order to implement this FR?

@carmocca carmocca removed their assignment Jul 30, 2024