Bug description
`Trainer.predict(model, datamodule)` on sufficiently large data causes CPU out-of-memory, because results are appended to a list during prediction (this happens even with `return_predictions=False`): https://github.com/Lightning-AI/lightning/blob/4e8cf85b0cd5128adcec3f3ad0f2254f417ae1ee/src/pytorch_lightning/loops/dataloader/prediction_loop.py#L103
What is the correct way of running prediction on a dataset that is orders of magnitude larger than CPU memory?
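A commonly documented pattern for prediction sets that do not fit in memory is to stream each batch's output to disk with a `BasePredictionWriter` callback and pass `return_predictions=False` (though, per this report, an internal list still grows). A minimal sketch, with an illustrative callback name and output path:

import os
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import BasePredictionWriter

class OnDiskWriter(BasePredictionWriter):
    def __init__(self, output_dir):
        # write_interval="batch" writes after every predict_step instead of
        # accumulating an epoch-sized list of results
        super().__init__(write_interval="batch")
        self.output_dir = output_dir

    def write_on_batch_end(self, trainer, pl_module, prediction, batch_indices, batch, batch_idx, dataloader_idx):
        torch.save(prediction, os.path.join(self.output_dir, f"{dataloader_idx}_{batch_idx}.pt"))

# `model` and `datamodule` are the objects from the report above
trainer = Trainer(callbacks=[OnDiskWriter("predictions/")])
trainer.predict(model, datamodule=datamodule, return_predictions=False)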
How to reproduce the bug
# Just always return `None` in `predict_step` and track your memory usage:
import objgraph

def predict_step(self, batch, batch_idx):
    # print the object types whose instance counts grew since the last call
    objgraph.show_growth(limit=3)
    return None
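For a self-contained run, the `predict_step` above can be dropped onto the `BoringModel` demo class that ships with pytorch_lightning; everything here besides that `predict_step` is an assumed harness:

import objgraph
from torch.utils.data import DataLoader
from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel, RandomDataset

class LeakProbe(BoringModel):
    def predict_step(self, batch, batch_idx):
        # Print the three object types whose instance counts grew the most
        # since the previous call; `list` keeps growing even though we return nothing
        objgraph.show_growth(limit=3)
        return None

if __name__ == "__main__":
    loader = DataLoader(RandomDataset(32, 10000), batch_size=2)
    trainer = Trainer(logger=False, enable_checkpointing=False)
    trainer.predict(LeakProbe(), dataloaders=loader, return_predictions=False)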
Error messages and logs
# You will see the instance count for the `list` type increment at every prediction step, like below:
list 11320 +1
Environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response