Add iNaturalist dataset #4123
Hey @dgenzel2 and thanks for the PR! While INaturalist is on the list of potential new datasets in #3562, I don't recall any decision on this. Did I miss something?
Thanks a lot for the quick PR Dmitriy!
@pmeier I discussed this dataset with Dmitriy as a good onboarding task. We've decided to only provide the labels for now, and not the bounding boxes.
I've done an initial pass and the PR looks good to me.
I made a few minor comments, but I'll let @pmeier do a more thorough review.
Looking good so far. I have some comments inline. Plus, could you add the dataset to the documentation?
It turned out that the format for earlier years was different, so I had to make some changes. Download is now supported, and I verified it manually.
Checking image / video folders for integrity is not feasible, so we normally go another way: we skip the integrity check completely and bail out if we encounter already extracted folders together with download=True:
vision/torchvision/datasets/kinetics.py
Lines 166 to 170 in a83b9a1
    if path.exists(self.split_folder):
        raise RuntimeError(
            f"The directory {self.split_folder} already exists. "
            f"If you want to re-download or re-extract the images, delete the directory."
        )
IMO we should adopt the same approach here, to avoid accidentally downloading again.
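A minimal sketch of how that guard could look in isolation, assuming a standalone function rather than the actual dataset class. The name `prepare_split_folder` and the "not found" message are hypothetical and not taken from torchvision; the download and extraction step is replaced by a placeholder that only creates the folder.

```python
import os


def prepare_split_folder(split_folder: str, download: bool) -> str:
    """Hypothetical helper: bail out instead of re-downloading into an existing folder."""
    if download:
        if os.path.exists(split_folder):
            # Same idea as the Kinetics snippet: no integrity check, just refuse.
            raise RuntimeError(
                f"The directory {split_folder} already exists. "
                "If you want to re-download or re-extract the images, delete the directory."
            )
        # Placeholder for the real download and extraction step (assumption).
        os.makedirs(split_folder)
    elif not os.path.exists(split_folder):
        # Hypothetical message for the download=False case.
        raise RuntimeError("Dataset not found. You can use download=True to download it.")
    return split_folder
```

With this shape, a second call with download=True on the same folder raises instead of silently fetching the archive again, while download=False on an existing folder simply returns it.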
I think this is looking pretty good, thanks a lot Dmitriy!
I've left a minor comment that can be addressed in follow-up PRs. @pmeier I'm merging this PR, but let us know if you have further comments and we can address them in a follow-up PR.
ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
    target_type=("kingdom", "full", "genus", ["kingdom", "phylum", "class", "order", "family", "genus", "full"]),
    version=("2021_train",),
For the future, it would be good to also test the other years, as they take different code paths in the initialization phase.
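To illustrate what extending the grid to other years would mean, here is a rough sketch. It assumes `combinations_grid` builds one config dict per element of the Cartesian product of its keyword arguments; the function below is a standalone stand-in, not the test utility itself, and the `"2017"` version and the shortened list-valued `target_type` are only illustrative.

```python
import itertools


def combinations_grid(**kwargs):
    """Stand-in (assumed behavior): one dict per combination of the keyword values."""
    return [dict(zip(kwargs.keys(), values)) for values in itertools.product(*kwargs.values())]


configs = combinations_grid(
    # The last entry is a list, i.e. several target types requested at once.
    target_type=("kingdom", "full", "genus", ["kingdom", "full"]),
    # "2017" is added here purely for illustration of a second year.
    version=("2021_train", "2017"),
)

for config in configs:
    print(config)
```

Each additional version multiplies the number of test configurations by the number of `target_type` entries, so four target types and two versions give eight configs.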
Failures are unrelated, merging.
Hey @fmassa! You merged this PR, but no labels were added.
Summary:
* Add iNaturalist dataset
* Add download support
* address comments
Reviewed By: fmassa
Differential Revision: D29659493
fbshipit-source-id: 9bdb53c24aeb6fdba9cf0604f1f824ed506d3c89
Co-authored-by: dgenzel <dgenzel@fb.com>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
The torchvision iNaturalist dataset code does not allow loading the test split, e.g. the 2017 or 2018 test split. What is the suggested way to use the torchvision code when one also needs the test split?
Unfortunately, there is none at the moment. We are working on revamping our datasets API after which all splits will be supported. But this is not ready yet. We could introduce the test splits on the current API by returning
Adding iNaturalist dataset from https://github.com/visipedia/inat_comp
This relies on the data files only, not using annotations.
Resolves #3292