Properly support Iterables as an alternative to DataLoaders in Trainer.{fit, validate, ...} #10696

Closed · awaelchli opened this issue Nov 23, 2021 · 6 comments · Fixed by #16726
Labels: data handling · design · feature · trainer: fit

awaelchli (Contributor) commented Nov 23, 2021

🚀 Feature

  1. Support passing dataloaders of type Iterable to Trainer.{fit, validate, test, predict}
  2. Support returning dataloaders of type Iterable from LightningModule.*_dataloader() and LightningDataModule.*_dataloader() (see the sketch just below)
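
To illustrate point 2, here is a minimal sketch (assuming nothing beyond the public LightningModule API) of a train_dataloader hook that returns a plain generator instead of a DataLoader:

import torch
import pytorch_lightning as pl

class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        # a plain generator, i.e. an Iterable, not a torch DataLoader
        return (torch.randn(4, 32) for _ in range(10))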

Motivation

Lightning today supports iterating over a DataLoader (by creating an iterator from it), as well as over multiple dataloaders. It has been claimed in the past that the Trainer also supports arbitrary Iterables, but this is not true today: a trivial example (such as the usage example in the Pitch below) fails. In any case, neither the Trainer methods nor the Lightning{Data}Module hooks reflect this in their signature types.
Some applications need to operate directly on an instance of Iterable, and wrapping it inside a torch DataLoader/Dataset would be undesirable or infeasible. An example from the medical domain: https://github.com/MIC-DKFZ/batchgenerators

Furthermore, supporting Iterables would most likely facilitate the integration of DataPipes (from pytorch/data), which are soon to be included in PyTorch.
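
For context, a DataPipe is itself an Iterable, so this feature would let one be passed to the Trainer directly. A minimal sketch, assuming torchdata's prototype IterableWrapper API:

from torchdata.datapipes.iter import IterableWrapper

# DataPipes compose lazily and iterate like any other Iterable
pipe = IterableWrapper(range(100)).map(lambda x: x * 2).batch(4)
# with this feature in place, the pipe could be passed directly:
# trainer.fit(model, train_dataloaders=pipe)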

Pitch

Support the Iterable type.

Usage example:

import torch
from pytorch_lightning import Trainer

trainer = Trainer()
model = Model()  # any LightningModule
data = [torch.randn(4, 32) for _ in range(3)]  # a plain list of batches, i.e. an Iterable
trainer.fit(model, train_dataloaders=data)

Several places inside data_loading.py would have to be updated with branching logic that skips the setup steps normally done for DataLoaders (see the limitations section below). Roughly:
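
The following is a hypothetical sketch of that branching, not actual Lightning source; prepare_dataloader stands in for the existing DataLoader setup logic:

from typing import Iterable, Union
from torch.utils.data import DataLoader

def process_dataloader(obj: Union[DataLoader, Iterable]) -> Union[DataLoader, Iterable]:
    if isinstance(obj, DataLoader):
        # existing path: replace samplers, re-instantiate with Trainer settings, ...
        return prepare_dataloader(obj)  # hypothetical stand-in for the current logic
    # plain Iterable: pass through untouched, skipping DataLoader-specific setup
    return obj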

Limitations and Challenges

Some features in Lightning will not work for Iterables. Most notably (see also the discussion below):

  • Automatic sampler replacement, e.g., injecting a DistributedSampler for distributed training, which Lightning can only do for DataLoader instances.
  • Fault-tolerant training, which relies on capturing and restoring DataLoader/sampler state.

Alternatives

  • Do not support Iterable. Users would have to wrap their Iterables in an IterableDataset and compose them into a DataLoader (see the sketch after this list).
  • Add support only in Lite. Drawback: users converting from Lite to Lightning will have questions.
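
For reference, the wrapping workaround from the first alternative might look like this (a minimal sketch; my_iterable is a placeholder for any Iterable of batches):

from torch.utils.data import DataLoader, IterableDataset

class WrapIterable(IterableDataset):
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        return iter(self.iterable)

# batch_size=None hands items through without additional collation
loader = DataLoader(WrapIterable(my_iterable), batch_size=None)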

Additional context

#10279 attempted to add support for Iterable types in Lite, but it was later reverted due to incompleteness.


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @Borda @justusschock @awaelchli @ninginthecloud @tchaton @rohitgr7

awaelchli added the feature, data handling, trainer: fit, and design labels on Nov 23, 2021
justusschock (Member) commented

+1 for this.

Regarding the limitations:
Back when we still supported this, things like patching the sampler were only ever done for DataLoaders and their subclasses. Using other iterables required taking care of such things manually, and I think that is the only way we can actually support this, given the arbitrary implementations such iterables can have.

Same goes for fault tolerance, IMO.

Regarding the ambiguity of the loader types:
One option would be to only allow a single loader (which could then be a combined loader).
While this would be more explicit (and would also let us remove some arguments from the Trainer's init), it would be a step back in terms of UX. For dicts you could explicitly check for mappings, and for sequences maybe inspect the type of the first element (sketched below)?
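
(A rough illustration of that disambiguation idea, using a hypothetical helper; this is not actual Lightning code:)

from collections.abc import Mapping, Sequence
from torch.utils.data import DataLoader

def looks_like_multiple_loaders(obj):
    # heuristic: is obj a collection of dataloaders rather than one iterable of batches?
    if isinstance(obj, Mapping):
        return True  # e.g. {"a": loader_a, "b": loader_b}
    if isinstance(obj, Sequence) and len(obj) > 0:
        # inspect the first element to tell a sequence of loaders
        # apart from a sequence of batches
        return isinstance(obj[0], DataLoader)
    return False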

tchaton (Contributor) commented Nov 24, 2021

Hey @awaelchli,

Have you seen demand for such a feature from the community?

justusschock (Member) commented

@tchaton we once had support for it, and as I mentioned, I know a few people who don't use PyTorch dataloaders (sometimes including myself). I'm not sure when the regression happened, since we don't have tests for it, but I think this is a pretty strong restriction to impose.

stale bot commented Dec 26, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label on Dec 26, 2021
awaelchli modified the milestones: 1.6, 1.7 on Dec 26, 2021
stale bot closed this as completed on Jan 9, 2022
awaelchli (Contributor, Author) commented

@Stale you blind??? This had a MILESTONE!!!

awaelchli removed the won't fix label on Jan 16, 2022
awaelchli reopened this on Jan 16, 2022
alessiamarcolini (Contributor) commented

Hey maintainers :) I'm really looking forward to this feature 🚀
