Properly support Iterables as an alternative to DataLoaders in Trainer.{fit, validate, ...}
#10696
Comments
+1 for this. Regarding the limitations: the same goes for fault tolerance, IMO. Regarding the ambiguity of the types for the loader:
Hey @awaelchli, did you find demand for such a feature from the community?
@tchaton we once had support for it, and as I mentioned, I know a few people who don't use PyTorch dataloaders (sometimes including myself). Not sure when the regression happened, since we don't have tests for it, but I think this is a pretty strong restriction to make.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
@Stale you blind??? This had a MILESTONE!!!
Hey maintainers :) I'm really looking forward to this feature 🚀
🚀 Feature
Support passing an Iterable to Trainer.{fit, validate, test, predict}
Support returning an Iterable from LightningModule.*_dataloader() and LightningDataModule.*_dataloader()
Motivation
Lightning today supports iterating over a DataLoader (by creating an iterator from it) or over multiple dataloaders. Claims have been made in the past that the Trainer also supports Iterable, but this is not true today: a trivial example demonstrates that it fails. In any case, neither our Trainer methods nor the Lightning{Data}Module methods reflect this in their signature types.
Some applications require operating directly on an instance of Iterable, and wrapping it inside a torch DataLoader/Dataset would not be desirable or feasible. An example from the medical domain: https://github.com/MIC-DKFZ/batchgenerators
Furthermore, supporting Iterables would most likely facilitate the integration of torch/data DataPipes, which are soon to be included in PyTorch.
Pitch
Support the Iterable type.
Usage example:
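Below is a minimal sketch of the proposed usage; the model and generator are placeholder stand-ins, and train_dataloaders is the existing argument name on Trainer.fit:

```python
from typing import Iterator, Tuple

import torch
from pytorch_lightning import LightningModule, Trainer


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def batch_generator() -> Iterator[Tuple[torch.Tensor, torch.Tensor]]:
    # A plain Python generator: an Iterable that is not a DataLoader
    for _ in range(100):
        yield torch.randn(8, 32), torch.randn(8, 2)


trainer = Trainer(max_steps=100)
# Proposed usage: pass the Iterable directly, no DataLoader wrapping needed
trainer.fit(BoringModel(), train_dataloaders=batch_generator())
```

Note that a plain generator is exhausted after a single pass, so multi-epoch training would require an Iterable whose __iter__ produces a fresh iterator each epoch.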
Several places inside data_loading.py would have to be updated with branching logic that excludes setup steps normally done for DataLoaders (see limitations section below).
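For instance, the dispatch could look roughly like this (a hypothetical sketch, not the actual Lightning internals; _setup_dataloader stands in for the existing DataLoader setup):

```python
from collections.abc import Iterable

from torch.utils.data import DataLoader


def _setup_dataloader(loader: DataLoader) -> DataLoader:
    # Stand-in for the existing DataLoader setup: sampler injection,
    # worker checks, fault-tolerance state capture, etc.
    return loader


def _process_loader(loader):
    # DataLoader must be checked first: every DataLoader is also an Iterable
    if isinstance(loader, DataLoader):
        return _setup_dataloader(loader)
    if isinstance(loader, Iterable):
        # Plain Iterable: skip the DataLoader-specific setup and use as-is
        return loader
    raise TypeError(f"Unsupported loader type: {type(loader).__name__}")
```

The check order is the key design point: since DataLoader itself satisfies the Iterable check, the DataLoader-specific branch has to come first.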
Limitations and Challenges
Some features in Lightning will not work for Iterables. Most notably:
Trainer.fit(): the .fit() method and its sibling methods currently support these types:
https://github.com/PyTorchLightning/pytorch-lightning/blob/48cf1adfd3ad9c7e659083a4afc334dafb331f28/pytorch_lightning/utilities/types.py#L36-L43
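For context, at that commit the union is defined roughly as follows (reproduced from memory; consult the link above for the exact definition):

```python
from typing import Dict, Sequence, Union

from torch.utils.data import DataLoader

TRAIN_DATALOADERS = Union[
    DataLoader,
    Sequence[DataLoader],
    Sequence[Sequence[DataLoader]],
    Sequence[Dict[str, DataLoader]],
    Dict[str, DataLoader],
    Dict[str, Dict[str, DataLoader]],
    Dict[str, Sequence[DataLoader]],
]
```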
However, if we now add Iterable to this union, we run into an ambiguous case, because Dict and Sequence are themselves subtypes of Iterable.
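A small illustration of the ambiguity: a dict or list of DataLoaders (the existing multi-dataloader formats) already passes an Iterable check, so runtime isinstance dispatch cannot tell a collection of loaders apart from a user-provided Iterable of batches.

```python
from collections.abc import Iterable

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(4, 2))

# A dict of dataloaders is itself an Iterable (over its keys),
# and so is a list of dataloaders:
print(isinstance({"a": DataLoader(dataset)}, Iterable))  # True
print(isinstance([DataLoader(dataset)], Iterable))       # True

# ... so it is ambiguous whether such an object means "many loaders"
# or "one Iterable that yields batches directly".
```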
Alternatives
Additional context
#10279 attempted to add support for Iterable types in Lite, but it was later reverted due to incompleteness.
cc @Borda @justusschock @awaelchli @ninginthecloud @tchaton @rohitgr7