
Add device information to the accelerator config message #17355

Open
@carmocca


Description & Motivation

Revamp

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

To

GPU available: M1, using 1 devices
TPU available: v4-8, using 0 devices
IPU available: False, using 0 devices
HPU available: False, using 0 devices
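
The proposed lines could be rendered by a small helper like this (a sketch; the function name and signature are hypothetical, not part of Lightning):

```python
from typing import Optional


def accelerator_status(kind: str, device_name: Optional[str], num_devices: int) -> str:
    """Render one line of the proposed config message.

    ``device_name`` is the human-readable name (e.g. "M1", "v4-8") when the
    accelerator is available, or None when it is not.
    """
    available = device_name if device_name is not None else "False"
    return f"{kind} available: {available}, using {num_devices} devices"


print(accelerator_status("GPU", "M1", 1))    # GPU available: M1, using 1 devices
print(accelerator_status("TPU", "v4-8", 0))  # TPU available: v4-8, using 0 devices
print(accelerator_status("IPU", None, 0))    # IPU available: False, using 0 devices
```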

The relevant code is: https://github.com/Lightning-AI/lightning/blob/f14ee9edbc8269054e12daf30b8681d530e73369/src/lightning/pytorch/trainer/setup.py#L145-L171

Pitch

If the accelerator is available, "True" changes to the actual name of the accelerator in use.
If it's unavailable, we still show "False".

For GPUs, the cuda|mps field is dropped, since the backend should be clear from the device name.

I also propose that the GPU line show the number of devices instead of the used: True boolean.

We can get this info via

# CUDA
import torch
torch.cuda.get_device_name()

# TPU
from torch_xla.experimental import tpu
import torch_xla.core.xla_env_vars as xenv

# wrapped in try-except since this sends a request to the TPU environment
try:
    tpu.get_tpu_env()[xenv.ACCELERATOR_TYPE]
except Exception:
    pass

For MPS, HPU, and IPU we would need to find out whether we can get this information. In the meantime, we can still fall back to "True" for them.

This could be done by introducing an Accelerator.device_name(device) staticmethod.
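
A minimal sketch of that hook, assuming a base-class fallback for accelerators that cannot report a name yet (the class names and fallback behavior here are assumptions, not the actual Lightning API):

```python
class Accelerator:
    @staticmethod
    def device_name(device=None) -> str:
        # Fallback for accelerators that cannot report a name yet (MPS, HPU, IPU)
        return "True"


class CUDAAccelerator(Accelerator):
    @staticmethod
    def device_name(device=None) -> str:
        import torch  # local import so the sketch runs without a CUDA stack

        if not torch.cuda.is_available():
            return "False"
        return torch.cuda.get_device_name(device)
```

The setup message would then call accelerator.device_name(root_device) and print whatever string comes back.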

Alternatives

One caveat is that this might be misleading with heterogeneous devices, as only rank zero prints this information.

Additional context

No response

cc @Borda @justusschock @awaelchli
