Bug: Inconsistent Behavior in StreamingDataLoader After Loading States (Specific to CombinedStreamingDataset)
Description:
The StreamingDataLoader exhibits inconsistent behavior when handling loaded states across different scenarios. Specifically, issues arise when iterating over the dataloader after loading states with a complete or partial first epoch.
This bug is an extension of #316 for CombinedStreamingDataset.
Traceback (most recent call last):
  File "/Users/bhimrajyadav/litdata/test_combined_dataset.py", line 10, in <module>
    dataloader.load_state_dict(dataloader.state_dict())
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bhimrajyadav/litdata/venv/lib/python3.12/site-packages/litdata/streaming/dataloader.py", line 668, in state_dict
    num_samples_yieled = [0 for _ in range(len(list(self._num_samples_yielded_combined.values())[0]))]
                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
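
The failing expression in the traceback above takes the first element of a list built from a dictionary's values. The following is a minimal sketch in plain Python, assuming the dictionary is still empty because nothing has been yielded yet. Both helper functions and the `num_datasets` fallback are hypothetical illustrations, not litdata's code or its actual fix.

```python
def zero_counters_unguarded(num_samples_yielded_combined):
    # Same shape as the expression in the traceback: take the first
    # stored list and build a zeroed list of the same length.
    first = list(num_samples_yielded_combined.values())[0]
    return [0 for _ in range(len(first))]


def zero_counters_guarded(num_samples_yielded_combined, num_datasets):
    # Hypothetical guard: fall back to a known number of datasets
    # when nothing has been recorded yet.
    values = list(num_samples_yielded_combined.values())
    if not values:
        return [0] * num_datasets
    return [0 for _ in range(len(values[0]))]


# An empty dictionary reproduces the IndexError from the traceback.
try:
    zero_counters_unguarded({})
    raised_index_error = False
except IndexError:
    raised_index_error = True
```

With an empty dictionary the unguarded version raises `IndexError`, while the guarded version returns one zero per dataset.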

  File "/Users/bhimrajyadav/itdata/venv/lib/python3.12/site-packages/litdata/streaming/combined.py", line 160, in __iter__
    self._iterator = _CombinedDatasetIterator(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bhimrajyadav/litdata/venv/lib/python3.12/site-packages/litdata/streaming/combined.py", line 208, in __init__
    self._dataset_iters = [iter(dataset) for dataset in datasets]
                           ^^^^^^^^^^^^^
  File "/Users/bhimrajyadav/litdata/venv/lib/python3.12/site-packages/litdata/streaming/dataset.py", line 223, in __iter__
    self._validate_state_dict()
  File "/Users/bhimrajyadav/litdata/venv/lib/python3.12/site-packages/litdata/streaming/dataset.py", line 479, in _validate_state_dict
    raise ValueError(
ValueError: The provided `num_samples_yielded` state is greater than the dataset length. Found `51` instead of `50`.
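
The error message above suggests a bounds check on the restored sample counter. As a rough illustration only, the sketch below uses a hypothetical stand-in function, `validate_num_samples_yielded`, which is not the code in `dataset.py`. It shows that a restored counter of 51 against a dataset length of 50 fails such a check, while a counter equal to the length passes.

```python
def validate_num_samples_yielded(num_samples_yielded, dataset_length):
    # Hypothetical bounds check modeled on the error message in the traceback.
    if num_samples_yielded > dataset_length:
        raise ValueError(
            "The provided `num_samples_yielded` state is greater than the dataset length. "
            f"Found `{num_samples_yielded}` instead of `{dataset_length}`."
        )


# A counter one past the dataset length is rejected.
try:
    validate_num_samples_yielded(51, 50)
    error_message = None
except ValueError as error:
    error_message = str(error)
```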
After loading the dataloader state with a partially completed first epoch, the dataloader does not reset correctly upon completing the epoch.
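
To make the expected behavior concrete, here is a toy sketch of a resumable loader, assuming that "resetting correctly" means the restored epoch yields only the remaining samples and the following epoch starts again from the first sample. `ToyResumableLoader` is a hypothetical class written for this illustration and does not reflect how `StreamingDataLoader` is implemented.

```python
class ToyResumableLoader:
    """Toy loader over range(length) that can save and restore its position."""

    def __init__(self, length):
        self.length = length  # assumed to be greater than zero
        self.num_samples_yielded = 0

    def state_dict(self):
        return {"num_samples_yielded": self.num_samples_yielded}

    def load_state_dict(self, state):
        self.num_samples_yielded = state["num_samples_yielded"]

    def __iter__(self):
        # Resume from the saved position; a fully completed epoch wraps to 0.
        start = self.num_samples_yielded % self.length
        self.num_samples_yielded = start
        for index in range(start, self.length):
            self.num_samples_yielded += 1
            yield index
        # Reset at the end of the epoch so the next one starts from scratch.
        self.num_samples_yielded = 0


# Consume two samples, save the state, and restore it in a fresh loader.
loader = ToyResumableLoader(5)
iterator = iter(loader)
first_two = [next(iterator), next(iterator)]
state = loader.state_dict()

restored = ToyResumableLoader(5)
restored.load_state_dict(state)
rest_of_epoch = list(restored)  # remaining samples of the interrupted epoch
next_epoch = list(restored)     # a full epoch after the reset
```

In this sketch the restored loader finishes the interrupted epoch with the three remaining samples and then yields all five samples in the next epoch.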
Additional details will be added.
Environment
PyTorch Version (e.g., 1.0): 2.4.0
OS (e.g., Linux): Mac OS
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source):
Python version: 3.12.4
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:
Additional context
bhimrazy changed the title from "Bug: Inconsistent Behavior with StreamingDataloader loading states (specific for CombinedStreamingDataset)" to "Bug: Inconsistent Behavior with StreamingDataloader loading states (specific with CombinedStreamingDataset)" on Aug 14, 2024.
bhimrazy changed the title from "Bug: Inconsistent Behavior with StreamingDataloader loading states (specific with CombinedStreamingDataset)" to "Bug: Inconsistent Behavior with StreamingDataloader loading states (specific to CombinedStreamingDataset)" on Aug 14, 2024.
Bugs
1. An IndexError is raised when loading the dataloader state without prior iteration.
2. After loading the dataloader state following the completion of the first epoch, a ValueError is thrown (previously an IndexError; see the clearer example in issue #363, "Failed to Resume Training w/ CombinedStreamingDataset").
3. After loading the dataloader state with a partially completed first epoch, the dataloader does not reset correctly upon completing the epoch.