[GSK-2310, GSK-2537] Create a dataset300wlp using tensorflow datasets #13
Conversation
GSK-2310 Create a Dataset300WLP using TensorFlow datasets
linked to: GSK-2251 https://www.tensorflow.org/datasets/catalog/the300w_lp
Image format:
Landmark format:
loreal_poc/dataloaders/loaders.py
self.splits, self.info = tfds.load("the300w_lp", with_info=True)
self.split_name = "train"  # Only this one
self.ds = self.splits[self.split_name]
No need to put too much time on it, but is there a way to pick an image from a specific source dataset? The TensorFlow page mentions:
With 300W, 300W-LP adopt the proposed face profiling to generate 61,225 samples across large poses (1,786 from IBUG, 5,207 from AFW, 16,556 from LFPW and 37,676 from HELEN, XM2VTS is not used).
But they probably combine everything... it's not really a bottleneck, but it would be nice.
Loading is composed of the following steps:
builder = tfds.builder('the300w_lp')
# 1. Create the tfrecord files (no-op if already exists)
builder.download_and_prepare()
# 2. Load the `tf.data.Dataset`
ds = builder.as_dataset(split='train', shuffle_files=True)
The file names are already dropped at step 1. Instead, the JPEG image data are embedded directly into a group of TFRecord binary files.
I tried to get the file name from the builder:
>>> g = builder._generate_examples("/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/")
<generator object Builder._generate_examples at 0x2810035a0>
>>> next(g)
('IBUG_image_100_03_1.jpg', {'image': '/Users/inoki/Downloads/tensorflow_datasets/downloads/extracted/ZIP.ucexport_download_id_0B7OEHD3T4eCkVGs0TkhUWFN6JQw2bEF61I9yUitin_g9uysqV5RYA61KUwppa7axPuc/300W_LP/IBUG/IBUG_image_100_03_1.jpg', 'landmarks_origin': array(...
It works, but there is no easy way to integrate this with tfds operations.
I think we can write a custom loader that builds the dataset directly from their ZIP file, with the source dataset name and the flip flag as metadata. That way, we can drop the tensorflow and tfds dependency requirements. WDYT? @rabah-khalek
Their zip file url is hardcoded in the source code:
_DATASET_URL = "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"
We recently made the index_sampler property abstract. We need to know the length of the data; do you think it's possible to retrieve this information?
Great work @Inokinoki!
Add an optional DataLoader300WLP dataloader