Open
Description
Feature request
Currently using a map
call processes and duplicates the whole dataset, which takes both time and disk space.
The solution is to allow lazy mapping, which is essentially a saved chain of transforms that are applied on the fly whenever a slice of the dataset is requested.
The API should look like map
, as set_transform
changes the current dataset while map
returns another dataset.
Motivation
Lazy processing allows lower disk usage and faster experimentation.
Your contribution
_