Description
🐛 Bug
Hello! I'm excited about using litdata.optimize to solve some long-standing data loading issues I've encountered many times before.
While trying to get it working with my dataset, I ran through the getting-started example, which writes binary files for a fake dataset of random-noise images, and pretty quickly hit an error that should've been caught.
Running `litdata.optimize()` completes without issue, and `streaming.py` also technically runs without issue. But if I try to do anything with the DataLoader defined at the end of `streaming.py`, I get a `default_collate` error because it is incapable of batching `PIL.Image` objects.
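For context, here's a minimal workaround sketch I'm using in the meantime: a custom `collate_fn` (the name `pil_collate` is mine, not a litdata API) that converts any `PIL.Image` in the sample to a tensor before delegating to `default_collate`. It assumes RGB images and dict-style samples like the getting-started example produces.

```python
import numpy as np
import torch
from torch.utils.data import default_collate
from PIL import Image


def pil_collate(batch):
    """Convert PIL images to CHW float tensors, then batch as usual.

    default_collate raises on PIL.Image, so we decode first.
    Assumes RGB (3-channel) images.
    """
    def to_tensor(x):
        if isinstance(x, Image.Image):
            arr = np.asarray(x, dtype=np.float32) / 255.0  # HWC in [0, 1]
            return torch.from_numpy(arr).permute(2, 0, 1)  # -> CHW
        return x

    if isinstance(batch[0], dict):
        batch = [{k: to_tensor(v) for k, v in sample.items()} for sample in batch]
    else:
        batch = [to_tensor(sample) for sample in batch]
    return default_collate(batch)
```

Passing this as `DataLoader(dataset, batch_size=4, collate_fn=pil_collate)` gets me past the error, but it feels like something the example itself should handle.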
Given that pretty much all of the examples I could find in the documentation rely heavily on PIL.Image, I'm wondering:
A. Can we update the getting-started example so that it at least defines a DataLoader that won't immediately throw an error?
B. Can we get some more explanation/exploration of whether it's best to store images as `PIL.Image` within the optimized files, or better to convert them to `torch.Tensor`/`np.ndarray` in advance so they aren't re-decoded every epoch? (My intuition leans toward the latter.)
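To illustrate option B, here's a sketch of a worker function that builds each sample as a tensor up front, so the stored sample batches cleanly with `default_collate`. The `make_sample` name and the image size are made up; the `ld.optimize(...)` call in the comment follows the signature shown in the getting-started example, so please correct me if that's not the recommended pattern.

```python
import numpy as np
import torch


def make_sample(index):
    """Hypothetical optimize() worker fn: store the image pre-decoded as a
    uint8 CHW tensor instead of a PIL.Image, so no per-epoch conversion is
    needed and default_collate can batch it directly."""
    img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)  # fake HWC image
    return {
        "index": index,
        "image": torch.from_numpy(img).permute(2, 0, 1).contiguous(),
    }


# Sketch only (not run here), assuming the getting-started example's API:
# import litdata as ld
# ld.optimize(
#     fn=make_sample,
#     inputs=list(range(1000)),
#     output_dir="fast_data",
#     chunk_bytes="64MB",
# )
```

The trade-off I'd love documented is storage size (raw tensors vs. encoded images) versus the per-epoch decode cost.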
To Reproduce
If desired, I can provide more details on the environment I used to produce this error, but since it's based directly on the minimal working example from the docs, it shouldn't require much imagination to reproduce.