Add FastAI datasets #56

Draft
wants to merge 1 commit into
base: master

@lorenzoh commented Feb 25, 2021
Contributor

This adds datadeps for all datasets from the FastAI dataset collection, as proposed in DLDatasets.jl#1.

The basic functionality:

using MLDatasets.FastAIDatasets
using MLDatasets.FastAIDatasets:
    datasetpath,             # download a dataset and return its directory
    DATASETS,                # list of all datasets
    loaddataclassification,  # load an image classification dataset into a data container with observations `(image, class)`
    loaddatasegmentation     # load an image segmentation dataset into a data container with observations `(image, mask)`

@lorenzoh marked this pull request as draft February 25, 2021 19:58
@CarloLucibello
Member

What's the reason for having loaddatasegmentation / loaddataclassification and not just loaddata?

@lorenzoh
Contributor Author

They're supposed to work on any folder containing a dataset in the right format, not just the included datasets. Also, some datasets can be used for multiple different tasks, so there is no 1-to-1 mapping between datasets and tasks.
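
To illustrate what "any folder in the right format" could mean for classification, here is a rough sketch that uses only Julia's Base functions. It is not the code in this PR: the layout `root/<class name>/<image file>` is an assumption, and the names `IMAGE_EXTENSIONS`, `isimagefile`, and `classificationobservations` are hypothetical. It returns file paths rather than loaded images to stay dependency-free.

```julia
# Hypothetical sketch, not the PR's implementation.
# Assumed layout: root/<class name>/<image file>

# Assumed set of image file extensions (compared case-insensitively).
const IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

# True if the path ends in one of the assumed image extensions.
isimagefile(path) = lowercase(splitext(path)[2]) in IMAGE_EXTENSIONS

# Collect `(image path, class name)` pairs, where the class name is the
# name of the subfolder that contains the image.
function classificationobservations(root::AbstractString)
    observations = Tuple{String,String}[]
    for class in readdir(root)
        classdir = joinpath(root, class)
        isdir(classdir) || continue          # skip stray files in the root
        for file in readdir(classdir)
            path = joinpath(classdir, file)
            if isfile(path) && isimagefile(path)
                push!(observations, (path, class))
            end
        end
    end
    return observations
end
```

Because the function only looks at the folder structure, it works the same on a downloaded dataset directory and on a user-supplied one. A segmentation variant would pair image files with mask files instead, which is one reason a single `loaddata` does not map cleanly onto every dataset.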

@MariusDrulea

Is this PR still desired/feasible?

@lorenzoh
Contributor Author

I will leave the decision on whether this is desirable to more active maintainers of this repository, but will note that a lot has changed on the implementation side since this PR was first opened. The largest change is that LearnBase.jl and MLDataPattern.jl have been superseded by MLUtils.jl.
From the FastAI.jl side, I am always happy to take stuff out and move it into a more canonical package, as would be the case with these datasets.

@CarloLucibello
Member

I would be in favor of moving these datasets here, provided we manage to make the interface consistent with the other datasets here. I don't know how hard that would be.
