In the long term, we'd like most of the `src/datasets` code to move to MLDatasets.jl. To make this happen, MLDatasets.jl needs to be refactored to be more extensible and to build on top of LearnBase.jl. Below is the structure envisioned for MLDatasets.jl:
- Low-level API: structs for different types of I/O (e.g. `FileDataset`) that support reading from the underlying I/O via `getobs` and `nobs` from LearnBase.jl
- High-level API: specific datasets (e.g. CIFAR10) implemented on top of the low-level API
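To make the low-level/high-level split concrete, here is a minimal Julia sketch. It defines local stand-ins for LearnBase.jl's `getobs`/`nobs` so it runs without extra packages. The `FileDataset` below (a container of file paths found recursively under a directory) and `FolderLabelDataset` (labels taken from each file's parent folder) are hypothetical illustrations, not the MLDatasets.jl or FastAI.jl implementations.

```julia
# Local stand-ins for the LearnBase.jl data container interface
# (assumption: in the real packages these would be LearnBase.getobs / LearnBase.nobs).
function getobs end
function nobs end

# Low-level API (hypothetical): a container over all files below a directory.
# Each observation is a file path; actually loading the file is left to callers.
struct FileDataset
    paths::Vector{String}
end

function FileDataset(dir::AbstractString)
    paths = String[]
    for (root, _, files) in walkdir(dir)
        for f in files
            push!(paths, joinpath(root, f))
        end
    end
    return FileDataset(sort(paths))  # sort for a deterministic observation order
end

nobs(ds::FileDataset) = length(ds.paths)
getobs(ds::FileDataset, i::Integer) = ds.paths[i]

# High-level API (hypothetical): a specific dataset built only from the
# low-level container, deriving each label from the parent folder name.
struct FolderLabelDataset
    files::FileDataset
end

nobs(ds::FolderLabelDataset) = nobs(ds.files)

function getobs(ds::FolderLabelDataset, i::Integer)
    path = getobs(ds.files, i)
    return (path, basename(dirname(path)))  # (input, label) pair
end
```

The high-level type never touches the file system directly; it only calls `getobs`/`nobs` on the low-level container, which is the layering the structure above describes.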
To achieve this goal, we need to complete the following stages:
- Move data containers (e.g. `FileDataset`) to MLDatasets.jl
- Move data container transformations (e.g. `mapobs`, `groupobs`, etc.) to MLDataPattern.jl (these transformations apply generically to any iterator of observations, not just data containers)
- Refactor existing datasets in MLDatasets.jl to use the low-level APIs
- Move FastAI.jl datasets to MLDatasets.jl
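The point that transformations such as `mapobs` and `groupobs` are not tied to any particular container can be illustrated with a small sketch. Again assuming local stand-ins for `getobs`/`nobs`, it treats plain vectors as containers and implements lazy, hypothetical versions of `mapobs` and `groupobs` (plus a helper `ObsView` type); the actual MLDataPattern.jl names and signatures may differ.

```julia
# Local stand-ins for the LearnBase.jl interface (assumption for this sketch).
function getobs end
function nobs end

# Treat any vector as a data container.
nobs(v::AbstractVector) = length(v)
getobs(v::AbstractVector, i::Integer) = v[i]

# Hypothetical mapobs: lazily apply `f` to each observation of `data`.
struct MappedData{F,D}
    f::F
    data::D
end

mapobs(f, data) = MappedData(f, data)
nobs(m::MappedData) = nobs(m.data)
getobs(m::MappedData, i::Integer) = m.f(getobs(m.data, i))

# Hypothetical helper: a view onto a subset of observations by index.
struct ObsView{D}
    data::D
    idxs::Vector{Int}
end

nobs(v::ObsView) = length(v.idxs)
getobs(v::ObsView, i::Integer) = getobs(v.data, v.idxs[i])

# Hypothetical groupobs: split `data` into views keyed by `f(observation)`.
function groupobs(f, data)
    groups = Dict{Any,Vector{Int}}()
    for i in 1:nobs(data)
        push!(get!(groups, f(getobs(data, i)), Int[]), i)
    end
    return Dict(k => ObsView(data, idxs) for (k, idxs) in groups)
end
```

Because both wrappers only call `getobs`/`nobs` on whatever they wrap, they compose: `mapobs(f, groupobs(g, data)[k])` works without either wrapper knowing the other's type.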