[GSK-2310, GSK-2537] Create a dataset300wlp using tensorflow datasets #13

Conversation

Inokinoki
Member

Add an optional DataLoader300WLP dataloader


linear bot commented Dec 22, 2023

GSK-2310 Create a Dataset300WLP using TensorFlow datasets

linked to: GSK-2251

https://www.tensorflow.org/datasets/catalog/the300w_lp

import tensorflow_datasets as tfds

splits = tfds.load("the300w_lp")
ds = splits["train"]
datarows = ds.take(1)
for r in datarows:
  print(r["landmarks_2d"])
  print(r["image"])

Image format: shape=(1, 450, 450, 3), dtype=uint8

Landmark format: shape=(1, 68, 2), dtype=float32
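
For quick inspection, a hedged sketch (not part of the original issue): tfds.as_numpy converts the records to NumPy arrays; the per-example shapes should match the formats above, with the leading 1 presumably coming from batching.

# Hypothetical inspection snippet, assuming the tfds "the300w_lp" catalog entry.
import tensorflow_datasets as tfds

splits = tfds.load("the300w_lp")
for r in tfds.as_numpy(splits["train"].take(1)):
    print(r["image"].shape, r["image"].dtype)                # expected: (450, 450, 3) uint8
    print(r["landmarks_2d"].shape, r["landmarks_2d"].dtype)  # expected: (68, 2) float32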

(Resolved review comments on loreal_poc/dataloaders/loaders.py)

self.splits, self.info = tfds.load("the300w_lp", with_info=True)
self.split_name = "train"  # the catalog only provides a "train" split
self.ds = self.splits[self.split_name]
Contributor

No need to put too much time on it, but is there a way to pick an image from a specific source dataset? The TensorFlow page mentions:

With 300W, 300W-LP adopt the proposed face profiling to generate 61,225 samples across large poses (1,786 from IBUG, 5,207 from AFW, 16,556 from LFPW and 37,676 from HELEN, XM2VTS is not used).

But they probably combine everything... it's not really a bottleneck, but it would be nice.

Member Author


Loading is composed of the following steps:

import tensorflow_datasets as tfds

builder = tfds.builder('the300w_lp')
# 1. Create the tfrecord files (no-op if they already exist)
builder.download_and_prepare()
# 2. Load the `tf.data.Dataset`
ds = builder.as_dataset(split='train', shuffle_files=True)

The file names are already dropped at step 1; instead, the JPEG image data are written directly into a set of TFRecord binary files.

I tried to get the file name from the builder:

>>> g = builder._generate_examples("/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/")
<generator object Builder._generate_examples at 0x2810035a0>
>>> next(g)
('IBUG_image_100_03_1.jpg', {'image': '/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/IBUG/IBUG_image_100_03_1.jpg', 'landmarks_origin': array(...

It works, but there is no easy way to integrate it with tfds operations.

I think we can write a custom loader that builds the dataset directly from their ZIP file, keeping the source dataset name and flip flag as metadata. That way, we can drop the tensorflow and tfds dependency requirements. WDYT? @rabah-khalek

Their ZIP file URL is hardcoded in the source code:

_DATASET_URL = "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"

Contributor

@rabah-khalek rabah-khalek left a comment


We recently made the index_sampler property abstract. We need to know the len of the data; do you think it's possible to retrieve this information?
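
If the loader keeps the DatasetInfo returned by tfds.load(..., with_info=True) (as in the snippet above), the split size is available without iterating the data; a minimal sketch:

import tensorflow_datasets as tfds

splits, info = tfds.load("the300w_lp", with_info=True)
# DatasetInfo records the number of examples per split, so a __len__ can be
# implemented without scanning the records.
num_examples = info.splits["train"].num_examples  # 61,225 samples according to the catalog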

(Several resolved review comments on loreal_poc/dataloaders/loaders.py)
@github-actions github-actions bot removed the Lockfile label Jan 12, 2024
@Inokinoki
Member Author

It's working on an M2 Mac!
(screenshot)

@rabah-khalek rabah-khalek changed the title [GSK-2310] Create a dataset300wlp using tensorflow datasets [GSK-2310, GSK-2537] Create a dataset300wlp using tensorflow datasets Jan 12, 2024

linear bot commented Jan 12, 2024

Contributor

@rabah-khalek rabah-khalek left a comment


great work @Inokinoki!

@rabah-khalek rabah-khalek enabled auto-merge (squash) January 12, 2024 18:25
@rabah-khalek rabah-khalek merged commit 7b139bc into main Jan 12, 2024
5 checks passed
@rabah-khalek rabah-khalek deleted the feature/gsk-2310-create-a-dataset300wlp-using-tensorflow-datasets branch January 12, 2024 18:30