@DerrickUnleashed
Contributor

This PR adds support for loading the VGGFace2 dataset.
It also includes proper documentation and tests for the dataset.

Closes #224

@DerrickUnleashed
Contributor Author

@cregouby This is what has been implemented:

ds <- vggface2_dataset(download = TRUE)
item <- ds[1]
item$x      # image array
item$y      # integer label
ds$classes[item$y]  # list(name=..., gender=...)

We could also implement (not yet implemented) a method that adds more per-image attributes to ds$classes, such as:

black_hair
brown_hair
gray_hair
blond_hair
long_hair
mustache_or_beard
wearing_hat
eyeglasses
sunglasses
mouth_open

One major problem is that this data is available only for selected pictures, so it would be NA for most items in the dataset.

How should we proceed from here?

@DerrickUnleashed marked this pull request as ready for review August 27, 2025 09:57
@DerrickUnleashed
Contributor Author

Sure @cregouby, I'll make the necessary changes. I had exams, so I wasn't available for some time. I will work on it ASAP.

Collaborator

@cregouby left a comment

praise: Thanks for switching to a clear and readable resource definition.
todo: The dataset is for an instance segmentation task and as such requires an instance segmentation output, not a classification output.


cli_inform("Downloading {.cls {class(self)[[1]]}}...")

for (i in seq_len(nrow(self$resources))) {
Collaborator

question: Do we need to download 36 GB of data (the train set) when the end user requests train=FALSE?
suggestion: I would save user time and disk space by limiting the download to the requested split.
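
A minimal sketch of this suggestion, assuming `self$resources` is a data frame with `url` and `split` columns (as the loop implies) and that the split values are named "train" and "test". The helper `select_resources()` and the URLs are hypothetical and not part of the PR:

```r
# Hypothetical resource table; the real URLs and split names may differ.
resources <- data.frame(
  url = c("https://example.org/train.tar.gz", "https://example.org/test.tar.gz"),
  split = c("train", "test"),
  stringsAsFactors = FALSE
)

# Keep only the rows that belong to the requested split.
select_resources <- function(resources, train = TRUE) {
  wanted <- if (train) "train" else "test"
  resources[resources$split == wanted, , drop = FALSE]
}

to_download <- select_resources(resources, train = FALSE)
for (i in seq_len(nrow(to_download))) {
  row <- to_download[i, ]
  # In the dataset, download_and_cache(row$url, prefix = row$split) would go here.
  message("Would download: ", row$url)
}
```

With this filter in place, the existing download loop would iterate over `to_download` instead of the full `self$resources`.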


for (i in seq_len(nrow(self$resources))) {
row <- self$resources[i, ]
archive <- download_and_cache(row$url, prefix = row$split)
Collaborator

todo: Could we prepend class(self)[[1]] to row$split for prefix=? (This avoids having 5 different files, hard to identify as part of vggface2, spread across the root cache folder.)
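
One possible way to build such a prefix, as a sketch only. The helper `cache_prefix()`, the underscore separator, and the stand-in object are assumptions for illustration:

```r
# Build a cache prefix from the dataset's first class name and the split.
cache_prefix <- function(dataset, split) {
  paste(class(dataset)[[1]], split, sep = "_")
}

# Stand-in object; inside the dataset this would be `self`.
fake_self <- structure(list(), class = c("vggface2", "dataset"))
prefix <- cache_prefix(fake_self, "train")
print(prefix)
```

In the loop, the call would then read `download_and_cache(row$url, prefix = cache_prefix(self, row$split))`.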

#' ds$classes[item$y] # list(name=..., gender=...)
#' }
#'
#' @family segmentation_dataset
Collaborator

question: This is the first instance segmentation dataset in the repo. Shall we mix it with the object segmentation datasets under @family segmentation_dataset, or shall we define a new family? Defining a new family would require an update of the website (see the dataset categories in the _pkgdown.yml file) and a test/update for proper handling by downstream functions (draw_segmentation_mask, ...).
suggestion: Leave it like this, but create an issue "vggface2 is an instance segmentation dataset" with a todo list for that.

expect_length(vgg, 169396)
first_item <- vgg[1]
expect_named(first_item, c("x", "y"))
expect_type(first_item$x, "double")
Collaborator

todo (missing): Since you know the first item, it would be nice to add a check on dim() of the first_item$x object. This gives folks a hint of the image size.
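
A sketch of what such a check could look like, using a stand-in array instead of the real dataset. The height x width x channel layout and the 224 x 224 size are assumptions; the actual dimensions of `vgg[1]$x` would need to be read from the data:

```r
library(testthat)

# Stand-in for vgg[1]: a height x width x channel array of doubles.
# The 224 x 224 size is a placeholder, not the dataset's actual image size.
first_item <- list(x = array(0, dim = c(224, 224, 3)), y = 1L)

test_that("first item is a 3-channel image array", {
  expect_type(first_item$x, "double")
  expect_length(dim(first_item$x), 3)
  expect_equal(dim(first_item$x)[3], 3)
})
```

Once the real size of the first image is known, the last expectation could be replaced by an exact `expect_equal(dim(first_item$x), c(...))` check.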

#'
#' @return A torch dataset object `vggface2_dataset`:
#' - `x`: RGB image array.
#' - `y`: Integer label (1…N) for the identity.
Collaborator

todo (blocking): An integer label y is the output of a classification dataset, not of an instance segmentation dataset. Either change one or the other.

Collaborator

todo (missing): As there may be an impact on draw_segmentation_mask, could we add a check in one of the tests that draws the segmentation mask of the item?

if (!is.null(self$target_transform)) {
y <- self$target_transform(y)
}
list(x = x, y = y)
Collaborator

todo: Please add the segmentation class to the output.

Development

Successfully merging this pull request may close these issues.

[Instance Segmentation Dataset] Add VGGFace2 Dataset