[RFC] New datasets to torchvision #3562

Open
5 of 17 tasks
oke-aditya opened this issue Mar 12, 2021 · 12 comments

Comments

@oke-aditya
Contributor

oke-aditya commented Mar 12, 2021

🚀 Feature

This is a proposal to add more highly cited datasets. Thanks to the Papers with Code datasets listing, which made this search easy.

Motivation

These datasets are used quite frequently and would benefit both researchers and people who work in computer vision. I'm not sure how reliable the citation metric is, but we can verify the paper counts.

Pitch

The following datasets can be considered. Paper counts are taken from the last 5 years on Papers with Code. They may be inaccurate, so feel free to edit. I'm also adding previously approved or proposed ones.

See #5108

We should probably consider and add these one by one. We should also support downloading the dataset, not just loading it.
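
To illustrate the difference between loading and downloading, here is a minimal sketch of a dataset class that fetches its file when asked and checks it before loading. It only loosely mirrors the `download=True` convention of existing torchvision datasets; the class name `TinyDownloadableDataset`, its parameters, and the one-sample-per-line file format are all hypothetical, and it uses only the Python standard library rather than any torchvision helper.

```python
import hashlib
import urllib.request
from pathlib import Path


class TinyDownloadableDataset:
    """Hypothetical dataset backed by one text file, one sample per line."""

    def __init__(self, root, url, md5, filename="data.txt", download=False):
        self.path = Path(root) / filename
        self.url = url
        self.md5 = md5
        if download:
            self.download()
        # Loading is refused unless the file exists and matches the checksum.
        if not self._check_integrity():
            raise RuntimeError(
                "Dataset not found or corrupted. Pass download=True to fetch it."
            )
        self.samples = self.path.read_text().splitlines()

    def _check_integrity(self):
        if not self.path.is_file():
            return False
        digest = hashlib.md5(self.path.read_bytes()).hexdigest()
        return digest == self.md5

    def download(self):
        if self._check_integrity():
            return  # already present and verified, nothing to do
        self.path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(self.url, str(self.path))

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)
```

With a layout like this, the library only points at the upstream URL; the data itself is fetched from the original host into the user's `root` directory.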

Additional context

Please feel free to discuss datasets before opening PRs!

cc @pmeier

@fmassa
Member

fmassa commented Mar 15, 2021

Hi,

This is exactly our current idea, thanks for bringing it up.

I agree with all the aforementioned proposals. One thing to mention as well is that there is an ongoing effort to provide new dataset abstractions in PyTorch via DataPipes pytorch/pytorch#49440.

While this doesn't block us providing new datasets, it is good to keep in mind that we might in the future revisit the way we implement datasets.

@seyeeet

seyeeet commented Mar 24, 2021

Related to this issue, it would also be useful if PyTorch could store these datasets on their own storage and provide links to download them.
E.g., there are a lot of issues with downloading ImageNet and other large datasets. I'm not sure if licensing can be problematic, but it would be super useful.

@pmeier
Collaborator

pmeier commented Mar 25, 2021

@seyeeet

I'm not sure if licensing can be problematic

Yes, it is and thus

PyTorch could store these datasets on their own storage and provide links to download them

will never happen.

Also see this section in our README:

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

@harishsdev
Contributor

Judging from what is in "torchvision/datasets/", the datasets below still need to be added. Please update the pitch.

LFW Labeled Faces in Wild
Market-1501 492 papers
MPII Human Pose
VGGFace2 Requested earlier in #1193 and #2910. Here is the tar.gz file. Hopefully we can add it.
MovingMNIST Previously approved in #2676 and #2690.
iNaturalist #3292
LVIS

@pmeier
Collaborator

pmeier commented May 3, 2021

Hey @harishsdev, I'm not sure what you mean. From the original pitch, only KITTI was added, and it is correctly marked as such. Your list leaves out CUB-200-2011, which is not supported yet. We do feature the Caltech(101|256) datasets, but they are unrelated to it apart from coming from the same university.

@ABD-01
Contributor

ABD-01 commented Aug 8, 2021

Hi @harishsdev, I have created a PR for the LFW dataset. Can you guide me on any further changes?

@jgbradley1
Contributor

The link provided for VGGFace2 is not correct; that link points to the first VGGFace dataset (which is available from this page).

@oke-aditya
Contributor Author

oke-aditya commented Aug 23, 2021

Actually, the tar.gz has been down for many months. I don't know what happened to VGG Face.

https://www.robots.ox.ac.uk/~vgg/data/vgg_face2

Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz

@jgbradley1
Contributor

Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz

Respectfully, that is the wrong URL. The link you've provided is for the first version of VGGFace. The original pitch asked for VGGFace2, which cannot be provided at this time.

@yassineAlouini
Contributor

@oke-aditya Can we add the SmallNORB dataset to the list, as introduced in this PR: #492? Thanks in advance. :)

@yassineAlouini
Contributor

@oke-aditya Should we add the FGVC-Aircraft dataset (as implemented in this PR)?

@pmeier
Collaborator

pmeier commented Jun 27, 2022

@yassineAlouini We already have FGVC-Aircraft in the current API:

class FGVCAircraft(VisionDataset):

as well as #5354 to track progress on porting it to the prototype API.

9 participants