RichProgressBar in v1.6 is slower than v1.5 #13937

@akihironitta

🐛 Bug

Similarly to (but independently of) #13179, the rich progress bar regressed in speed between 1.5 and 1.6: the reproduction script in "To Reproduce" takes roughly 6x longer on 1.6.0 and later than on 1.5.10.

time: 2.059369374997914   # v1.5.10
time: 13.186531708983239  # v1.6.0
time: 11.988152374979109  # v1.6.5
time: 12.897805292042904  # master (aefb9ab43f9a8e6704558a346dbae1a00044bb45)
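
To put the numbers above in perspective, here is a minimal sketch that computes the slowdown of each version relative to v1.5.10. The timings are copied from the output above; the `timings` and `slowdown` names are only illustrative.

```python
# Wall-clock timings (seconds) copied from the measurements above.
timings = {
    "v1.5.10": 2.059369374997914,
    "v1.6.0": 13.186531708983239,
    "v1.6.5": 11.988152374979109,
    "master": 12.897805292042904,
}

baseline = timings["v1.5.10"]

# Slowdown factor of each version relative to the v1.5.10 baseline.
slowdown = {version: seconds / baseline for version, seconds in timings.items()}

for version, factor in slowdown.items():
    print(f"{version}: {factor:.2f}x")
```

These are single runs on one machine, so the factors should be read as rough indications rather than precise benchmarks.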

To Reproduce

from time import monotonic
import torch
from torch.utils.data import DataLoader, Dataset
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import RichProgressBar

class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]

def main():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    model = BoringModel()
    trainer = Trainer(
        max_epochs=100,
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
        benchmark=False,  # True by default in 1.6.{0-3}.
        callbacks=RichProgressBar(),
    )
    t0 = monotonic()
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    print("time:", monotonic() - t0)

if __name__ == "__main__":
    main()
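
One possible way to narrow down where the extra time goes is to run the script under the standard-library profiler on both versions and compare the top entries. The sketch below is only an illustration: `profile_call` is a hypothetical helper, not part of Lightning or rich, and it makes no assumption about which functions will show up in the report.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=15, **kwargs):
    """Run `fn` under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

With the reproduction script above, this could be used as `_, report = profile_call(main)` followed by `print(report)`, once per installed version, to see whether the progress bar callback dominates the cumulative time.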

Expected behavior

There should be no regression unless there's a reasonable explanation.

Environment

  • Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): n/a
  • PyTorch Lightning Version (e.g., 1.5.0): master (aefb9ab), 1.6.5, 1.6.0, 1.5.10
  • PyTorch Version (e.g., 1.10): 1.12
  • Python version (e.g., 3.9): 3.10
  • OS (e.g., Linux): macOS
  • CUDA/cuDNN version: n/a
  • GPU models and configuration: Apple Silicon M1
  • How you installed PyTorch (conda, pip, source): pip
  • If compiling from source, the output of torch.__config__.show():
  • Running environment of LightningApp (e.g. local, cloud): n/a
  • Any other relevant information: rich==12.5.1

Additional context

Investigating!

https://github.com/Lightning-AI/lightning/pulls?q=is%3Apr+label%3A%22progress+bar%3A+rich%22+milestone%3A%22pl%3A1.6%22+

cc @tchaton @rohitgr7 @kaushikb11 @Borda @akihironitta
