Description
Grain's documentation notes that DataLoader will try using a data source as a context manager:
Open file handles should be closed after use. Data sources typically open underlying files in order to read records from them. We recommend implementing data sources as context managers that close their open file handles within the exit method. When opening a data source, the DataLoader will first attempt to use the data source as a context manager. If the data source doesn’t implement the context manager protocol, it will be used as-is, without a with statement.
However, from my testing, it seems that DataLoader doesn't do this. See the following code snippet:
```python
import grain.python as pygrain
from grain.sources import RandomAccessDataSource


class ContextDataset(RandomAccessDataSource):
    data: list

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

    def __enter__(self):
        print('entering')
        self.data = [1, 2, 3, 4, 5, 6]
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('exiting')
        return False


if __name__ == '__main__':
    index_sampler = pygrain.IndexSampler(
        num_records=6,
        num_epochs=1,
        shard_options=pygrain.NoSharding(),
        shuffle=False,
        seed=0)
    transformations = [pygrain.Batch(batch_size=1, drop_remainder=True)]
    data_source = ContextDataset()
    dataloader = pygrain.DataLoader(
        data_source=data_source,
        operations=transformations,
        sampler=index_sampler,
        worker_count=0)
    for data in dataloader:  # errors here due to data not being initialized
        print(data)
```
In this example, if the data source were used as a context manager, __enter__ would run and populate data. Instead, iterating over the dataloader raises AttributeError: 'ContextDataset' object has no attribute 'data' from __getitem__, and entering is never printed, so __enter__ is never called.
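For reference, here is a minimal plain-Python sketch of the behavior I expected based on the documentation: use the source in a with statement if it implements the context manager protocol, otherwise use it as-is. This is not Grain's implementation. open_data_source, read_all, PlainSource, and ContextSource are hypothetical names made up for illustration, and the sketch does not depend on Grain at all.

```python
import contextlib


class PlainSource:
    """Stand-in data source without context manager support."""

    def __init__(self):
        self.data = [1, 2, 3]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class ContextSource(PlainSource):
    """Stand-in data source that only populates its data in __enter__."""

    def __init__(self):
        # Deliberately does not set self.data; it is created in __enter__.
        self.events = []  # records enter/exit calls for inspection

    def __enter__(self):
        self.events.append('entering')
        self.data = [1, 2, 3, 4, 5, 6]
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.events.append('exiting')
        return False


def open_data_source(source):
    """Hypothetical helper (not a Grain API).

    Returns the source itself if it implements the context manager
    protocol; otherwise wraps it in a no-op context manager so that
    callers can always use a with statement.
    """
    cls = type(source)
    if hasattr(cls, '__enter__') and hasattr(cls, '__exit__'):
        return source
    return contextlib.nullcontext(source)


def read_all(source):
    """Reads every record, entering the source first when supported."""
    with open_data_source(source) as opened:
        return [opened[i] for i in range(len(opened))]
```

With a loader that behaved like read_all, ContextSource would have its __enter__ called before the first __getitem__, so data would exist by the time records are read, and PlainSource would keep working unchanged.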
I tested with the latest stable version of grain (0.2.11) on Python 3.12.7.