-
Notifications
You must be signed in to change notification settings - Fork 27
Add VGGFace2 Dataset #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add VGGFace2 Dataset #238
Conversation
|
@cregouby This is what has been implemented ds <- vggface2_dataset(download = TRUE)
item <- ds[1]
item$x # image array
item$y # integer label
ds$classes[item$y] # list(name=..., gender=...)We can also implement (not yet implemented) a method to give more details, like to the ds$classes but one major problem would be this data is available only for select pictures so it would have How can we proceed from here ? |
|
Sure @cregouby I'll make the necessary changes. I had exams so wasn't available for sometime. Will work on it asap. |
cregouby
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise thanks for switching to clear and readable resourse definition.
todo The dataset is for instance segmentation task and as such requires an instance segmentation output, not a classification output.
|
|
||
| cli_inform("Downloading {.cls {class(self)[[1]]}}...") | ||
|
|
||
| for (i in seq_len(nrow(self$resources))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question Do we need to download 36GB of data (train-set) when the end-user request train=FALSE ?
suggestion I would save user time and disk space by limiting the download to the requested split
|
|
||
| for (i in seq_len(nrow(self$resources))) { | ||
| row <- self$resources[i, ] | ||
| archive <- download_and_cache(row$url, prefix = row$split) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo could we prepend class(self)[[1]] to row$split for prefix= ? (this is to avoid 5 different files, hard to identify as being part of vggface2, spread in the root cache folder)
| #' ds$classes[item$y] # list(name=..., gender=...) | ||
| #' } | ||
| #' | ||
| #' @family segmentation_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question this is (the first) instance segmentation dataset in the repo. Shall we mix it with object segmentation datasets @family segmentation_dataset or shall we define a new family ? Defining a new family would require an update of the website (see dataset categories in the _pkgdown.yml file) and a test/update for proper management by downstream functions (draw_segmentation_mask, ...)
suggestion let it like this but create an issue "vggface2 is an instance segmentation dataset" with a todo list on that.
| expect_length(vgg, 169396) | ||
| first_item <- vgg[1] | ||
| expect_named(first_item, c("x", "y")) | ||
| expect_type(first_item$x, "double") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo missing as you know the first item, it would be nice to add a check on dim() of the first_item$x object. This gives a hit of image size to folks.
| #' | ||
| #' @return A torch dataset object `vggface2_dataset`: | ||
| #' - `x`: RGB image array. | ||
| #' - `y`: Integer label (1…N) for the identity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo blocking y integer label is the output of a classification dataset, not an instance segmentation dataset. Either change one or the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo missing as there may be impact on draw_segmentation_mask could we add a check in one of the tests for drawing segmentation mask of the item ?
| if (!is.null(self$target_transform)) { | ||
| y <- self$target_transform(y) | ||
| } | ||
| list(x = x, y = y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo please add the segmentation class to the output
This PR Adds support to load the VGGFace2 Dataset
It also included proper documentation and tests for the dataset
Closes #224