Move datasets to MLDatasets.jl #22

Closed
@darsnack

Description

In the long term, we'd like most of the src/datasets code to move to MLDatasets.jl. To make this happen, MLDatasets.jl needs to be refactored so that it is more extensible and builds on top of LearnBase.jl. Below is the structure envisioned for MLDatasets.jl:

  1. Low-level API: structs for different types of I/O (e.g. FileDataset) that support reading from the underlying I/O via getobs and nobs from LearnBase.jl
  2. High-level API: specific datasets (e.g. CIFAR10) implemented using the low-level API
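
To make the two layers concrete, here is a minimal sketch of how they could fit together. It is an illustration only, not the actual FastAI.jl or MLDatasets.jl code: the fields of `FileDataset`, the `loadfn` argument, and the `ToyTextDataset` type are all hypothetical, and `getobs`/`nobs` are declared locally as stand-ins so the snippet runs without dependencies (real code would extend `LearnBase.getobs` and `LearnBase.nobs` instead).

```julia
# Stand-ins for the LearnBase.jl interface (assumption: real code would
# extend `LearnBase.getobs` and `LearnBase.nobs` rather than define these).
function getobs end
function nobs end

# Low-level API: a hypothetical container with one observation per file.
struct FileDataset{F}
    loadfn::F               # function that reads a single file
    paths::Vector{String}   # paths of the files that make up the dataset
end

# Convenience constructor: collect all regular files in a directory.
FileDataset(loadfn, dir::AbstractString) =
    FileDataset(loadfn, sort(filter(isfile, readdir(dir; join = true))))

nobs(data::FileDataset) = length(data.paths)
getobs(data::FileDataset, i::Integer) = data.loadfn(data.paths[i])

# High-level API: a hypothetical named dataset that only wires up the
# low-level container and forwards the interface to it.
struct ToyTextDataset
    files::FileDataset
end

ToyTextDataset(dir::AbstractString) =
    ToyTextDataset(FileDataset(path -> read(path, String), dir))

nobs(data::ToyTextDataset) = nobs(data.files)
getobs(data::ToyTextDataset, i::Integer) = getobs(data.files, i)

# Usage: build a tiny dataset in a temporary directory and read it back.
dir = mktempdir()
write(joinpath(dir, "a.txt"), "first")
write(joinpath(dir, "b.txt"), "second")
data = ToyTextDataset(dir)
println(nobs(data), " observations; the first is ", repr(getobs(data, 1)))
```

The point of the split is that a dataset like CIFAR10 would only need to describe where its files live and how to decode them, while indexing and counting come from the shared container.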

To achieve this goal, we need to complete the following stages:

  • Move data containers (e.g. FileDataset) to MLDatasets.jl
  • Move data container transformations (e.g. mapobs, groupobs, etc.) to MLDataPattern.jl (these transformations apply generically to any iterator of observations, not just data containers)
  • Refactor existing data sets in MLDatasets.jl to utilize the low-level APIs
  • Move FastAI.jl datasets to MLDatasets.jl
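
As a rough illustration of why a transformation such as mapobs is generic enough to live outside any one dataset package, the sketch below wraps an arbitrary container and applies a function lazily on access. The `MappedData` type and its fields are hypothetical, the `getobs`/`nobs` functions are local stand-ins for the LearnBase.jl interface, and the real mapobs may differ in both signature and behavior.

```julia
# Stand-ins for the LearnBase.jl interface (assumption, see lead-in).
function getobs end
function nobs end

# Treat plain vectors as the simplest possible data container.
getobs(data::AbstractVector, i::Integer) = data[i]
nobs(data::AbstractVector) = length(data)

# Hypothetical lazy wrapper: stores the function and the wrapped container,
# and applies the function only when an observation is requested.
struct MappedData{F,D}
    f::F
    data::D
end

mapobs(f, data) = MappedData(f, data)

nobs(m::MappedData) = nobs(m.data)
getobs(m::MappedData, i::Integer) = m.f(getobs(m.data, i))

# Usage: wrappers compose because they only rely on getobs and nobs.
squares = mapobs(x -> x^2, [1, 2, 3])
labels = mapobs(string, squares)
println(getobs(squares, 2), " ", getobs(labels, 3))
```

Because the wrapper depends only on `getobs` and `nobs`, it works unchanged on a file-backed container, a vector, or another wrapper.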

    Labels

    good first issue (Good for newcomers), gsoc-proposal (Good issues to tackle for GSoC proposals), help wanted (Contributions welcome!)
