[GSK-2310, GSK-2537] Create a dataset300wlp using tensorflow datasets #13

Conversation

Inokinoki
Member

Add an optional DataLoader300WLP dataloader


linear bot commented Dec 22, 2023

GSK-2310 Create a Dataset300WLP using TensorFlow datasets

linked to: GSK-2251

https://www.tensorflow.org/datasets/catalog/the300w_lp

import tensorflow_datasets as tfds

splits = tfds.load("the300w_lp")
ds = splits["train"]
datarows = ds.take(1)
for r in datarows:
  print(r["landmarks_2d"])
  print(r["image"])

Image format: shape=(1, 450, 450, 3), dtype=uint8

Landmark format: shape=(1, 68, 2), dtype=float32
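
For quick inspection, a hedged sketch (not part of the original issue): tfds.as_numpy converts the records to NumPy arrays; the per-example shapes should match the formats above, with the leading 1 presumably coming from batching.

# Hypothetical inspection snippet, assuming the tfds "the300w_lp" catalog entry.
import tensorflow_datasets as tfds

splits = tfds.load("the300w_lp")
for r in tfds.as_numpy(splits["train"].take(1)):
    print(r["image"].shape, r["image"].dtype)                # expected: (450, 450, 3) uint8
    print(r["landmarks_2d"].shape, r["landmarks_2d"].dtype)  # expected: (68, 2) float32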

(Resolved review comments on loreal_poc/dataloaders/loaders.py)

self.splits, self.info = tfds.load("the300w_lp", with_info=True)
self.split_name = "train"  # the catalog only provides a "train" split
self.ds = self.splits[self.split_name]
Contributor

No need to put too much time on it, but is there a way to pick an image from a specific source dataset? The TensorFlow page mentions:

With 300W, 300W-LP adopt the proposed face profiling to generate 61,225 samples across large poses (1,786 from IBUG, 5,207 from AFW, 16,556 from LFPW and 37,676 from HELEN, XM2VTS is not used).

But they probably combine everything... it's not really a bottleneck, but it would be nice.

Member Author


Loading is composed of the following steps:

import tensorflow_datasets as tfds

builder = tfds.builder('the300w_lp')
# 1. Create the tfrecord files (no-op if they already exist)
builder.download_and_prepare()
# 2. Load the `tf.data.Dataset`
ds = builder.as_dataset(split='train', shuffle_files=True)

The file names are already dropped at step 1; instead, the JPEG image data are written directly into a set of TFRecord binary files.

I tried to get the file name from the builder:

>>> g = builder._generate_examples("/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/")
<generator object Builder._generate_examples at 0x2810035a0>
>>> next(g)
('IBUG_image_100_03_1.jpg', {'image': '/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/IBUG/IBUG_image_100_03_1.jpg', 'landmarks_origin': array(...

It works, but there is no easy way to integrate it with tfds operations.

I think we can write a custom loader that builds the dataset directly from their ZIP file, keeping the source dataset name and flip flag as metadata. That way, we can drop the tensorflow and tfds dependency requirements. WDYT? @rabah-khalek

Their ZIP file URL is hardcoded in the source code:

_DATASET_URL = "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"

Contributor

@rabah-khalek rabah-khalek left a comment


We recently made the index_sampler property abstract. We need to know the len of the data; do you think it's possible to retrieve this information?
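
If the loader keeps the DatasetInfo returned by tfds.load(..., with_info=True) (as in the snippet above), the split size is available without iterating the data; a minimal sketch:

import tensorflow_datasets as tfds

splits, info = tfds.load("the300w_lp", with_info=True)
# DatasetInfo records the number of examples per split, so a __len__ can be
# implemented without scanning the records.
num_examples = info.splits["train"].num_examples  # 61,225 samples according to the catalog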

(Several resolved review comments on loreal_poc/dataloaders/loaders.py)
@github-actions github-actions bot removed the Lockfile label Jan 12, 2024
@Inokinoki
Member Author

It's working on an M2 Mac!
(screenshot)

@rabah-khalek rabah-khalek changed the title [GSK-2310] Create a dataset300wlp using tensorflow datasets [GSK-2310, GSK-2537] Create a dataset300wlp using tensorflow datasets Jan 12, 2024

linear bot commented Jan 12, 2024

Contributor

@rabah-khalek rabah-khalek left a comment


great work @Inokinoki!

@rabah-khalek rabah-khalek enabled auto-merge (squash) January 12, 2024 18:25
@rabah-khalek rabah-khalek merged commit 7b139bc into main Jan 12, 2024
5 checks passed
@rabah-khalek rabah-khalek deleted the feature/gsk-2310-create-a-dataset300wlp-using-tensorflow-datasets branch January 12, 2024 18:30