This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

global shuffling #14489

Open
grez72 opened this issue Mar 21, 2019 · 8 comments
Labels
Backend (Issues related to the backend of MXNet), Data-loading, Feature request

Comments

@grez72

grez72 commented Mar 21, 2019

I'm using MXNet 1.4.0 with ImageRecordIter or ImageRecordIter_v1, and I'm wondering whether it is possible to shuffle images globally rather than only within batches. I'm training a model where the structure left by batch-wise shuffling dramatically impairs learning, so I need a way to shuffle globally.

Thanks!
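To make the difference between the two kinds of shuffling concrete, here is a small pure-Python sketch (not MXNet code). It roughly models shuffling without an index as "permute whole chunks, then permute records inside each chunk", and global shuffling as one permutation over every record. The chunk size, batch size and function names are hypothetical illustration choices, not MXNet internals.

```python
import random

def chunk_shuffle(n, chunk_size, seed=0):
    """Hypothetical model of chunk-level shuffling: permute the order of
    fixed-size chunks, then permute records inside each chunk.
    Records from the same chunk stay next to each other."""
    rng = random.Random(seed)
    chunks = [list(range(i, min(i + chunk_size, n))) for i in range(0, n, chunk_size)]
    rng.shuffle(chunks)
    order = []
    for chunk in chunks:
        rng.shuffle(chunk)
        order.extend(chunk)
    return order

def global_shuffle(n, seed=0):
    """A single random permutation over all n record indices."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order

def chunks_per_batch(order, batch_size, chunk_size):
    """Average number of distinct source chunks that contribute to each batch."""
    counts = []
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        counts.append(len({i // chunk_size for i in batch}))
    return sum(counts) / len(counts)

if __name__ == "__main__":
    n, chunk, bs = 10_000, 1_000, 32
    # Chunk shuffling: almost every batch is drawn from a single chunk.
    print(chunks_per_batch(chunk_shuffle(n, chunk), bs, chunk))
    # Global shuffling: each batch mixes records from most of the 10 chunks.
    print(chunks_per_batch(global_shuffle(n), bs, chunk))
```

The point of the comparison is that a batch built from one chunk inherits whatever ordering the records had when the file was written (for example, all images of one class), which is the kind of structure described above.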

@zachgk
Contributor

zachgk commented Mar 22, 2019

Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it.

@mxnet-label-bot add [Feature request, Backend, Data-loading]

@marcoabreu marcoabreu added the Backend, Data-loading, and Feature request labels Mar 22, 2019
@ptrendx
Member

ptrendx commented Mar 22, 2019

Adding an .idx file (via path_imgidx) to ImageRecordIter should result in globally shuffled images.
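The reason an index file matters is random access: it maps each record key to a byte offset in the .rec file, so a reader can seek straight to any record in any order instead of scanning the file front to back. The sketch below illustrates that idea in pure Python with a simplified, hypothetical length-prefixed format; it is not MXNet's actual RecordIO layout (for real files, `mx.recordio.MXIndexedRecordIO` provides indexed reads via `read_idx`).

```python
import io
import random
import struct

def write_records(payloads):
    """Write length-prefixed records (toy format) and return
    (buffer, index), where index maps record key -> byte offset."""
    buf = io.BytesIO()
    index = {}
    for key, payload in enumerate(payloads):
        index[key] = buf.tell()                     # where record `key` starts
        buf.write(struct.pack("<I", len(payload)))  # 4-byte little-endian length
        buf.write(payload)
    return buf, index

def read_at(buf, offset):
    """Seek to a byte offset and read exactly one record."""
    buf.seek(offset)
    (length,) = struct.unpack("<I", buf.read(4))
    return buf.read(length)

def globally_shuffled(buf, index, seed=0):
    """Yield (key, payload) pairs in one random permutation over all keys.
    This is only possible because the index allows seeking to any record."""
    keys = list(index)
    random.Random(seed).shuffle(keys)
    for key in keys:
        yield key, read_at(buf, index[key])

if __name__ == "__main__":
    payloads = [f"image-{i}".encode() for i in range(8)]
    buf, index = write_records(payloads)
    for key, payload in globally_shuffled(buf, index):
        print(key, payload.decode())
```

Without the `index` dictionary, a reader could only walk the buffer sequentially, so any shuffling would be limited to whatever it holds in memory at once.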

@grez72
Author

grez72 commented Mar 22, 2019

Thanks for the quick response. I added the idx file, and I still don't seem to be getting a global shuffle.

Here's how I'm setting up ImageRecordIter_v1. I'm using v1 so that I can access the image index directly; ImageRecordIter seems to return only zeros for the index.

```python
import mxnet as mx

# path_imglist, rec_file and idx_file are paths to the .lst, .rec and .idx files
train_data = mx.io.ImageRecordIter_v1(
    path_imglist    = path_imglist,
    path_imgrec     = rec_file,
    path_imgidx     = idx_file,
    data_shape      = (3, 224, 224),
    batch_size      = 32,
    shuffle_chunk_seed = 12,
    shuffle         = True,
    prefetch_buffer = 4,
    preprocess_threads = 8,
)
```

Does the global shuffle only work with ImageRecordIter? If so, is there any way to get ImageRecordIter to return the correct indices?

Thanks!

@ptrendx
Member

ptrendx commented Mar 22, 2019

I'm not sure what you mean by returning the index. The v1 version of ImageRecordIter is quite old and deprecated, and I'm not even sure it does anything with the index file, let alone use it for global shuffling.

@grez72
Author

grez72 commented Mar 22, 2019

Sorry, I wasn't clear about that. By index I mean the value in the first column of the .lst file used to generate the .rec and .idx files. Each batch returned by the iterator has the fields batch.data, batch.label, and batch.index.

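```python
batch = data_iter.next()
data = batch.data[0]      # first (and only) data array in the batch
labels = batch.label[0]   # labels for the images in the batch
indexes = batch.index     # per-image indices (first column of the .lst file)
```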

With ImageRecordIter_v1, batch.index contains the index of each image, matching the first column of the .lst file. With ImageRecordIter, however, batch.index contains only zeros!
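Because batch.index is meant to carry the first column of the .lst file, it can be mapped back to the original images. A minimal sketch, assuming the usual tab-separated .lst layout produced by im2rec (integer index, one or more label columns, relative image path last); `parse_lst`, `paths_for_batch` and `all_indices_identical` are hypothetical helper names, not MXNet functions.

```python
def parse_lst(lines):
    """Map the integer index (first column) of each .lst line to (labels, path).

    Assumes the tab-separated im2rec layout: index, label(s), relative path.
    Blank lines are skipped.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        idx = int(fields[0])
        labels = [float(x) for x in fields[1:-1]]
        table[idx] = (labels, fields[-1])
    return table

def paths_for_batch(indices, table):
    """Translate a batch's image indices (e.g. taken from batch.index) into paths."""
    return [table[int(i)][1] for i in indices]

def all_indices_identical(indices):
    """True when every index in the batch is the same, as in the all-zeros
    symptom above (trivially true for a batch of size 1)."""
    return len({int(i) for i in indices}) <= 1

if __name__ == "__main__":
    lst = ["0\t3.0\tcats/a.jpg", "1\t5.0\tdogs/b.jpg", "2\t3.0\tcats/c.jpg"]
    table = parse_lst(lst)
    print(paths_for_batch([2, 0], table))  # ['cats/c.jpg', 'cats/a.jpg']
    print(all_indices_identical([0, 0, 0]))  # True
```

If the iterator reports zeros for every image, every lookup lands on the same .lst entry, which is why correct indices are needed for anything that ties a sample back to its source.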

I was using ImageRecordIter_v1 because it returns the correct index values, which I need to train my model (it also seemed to load images faster).

Is it possible to have ImageRecordIter return the correct image index (first column of .lst file)?

Thanks!

@ptrendx
Member

ptrendx commented Mar 22, 2019

Hmmm, seems like a bug in ImageRecordIter then.

@grez72
Author

grez72 commented Mar 25, 2019

I can confirm that ImageRecordIter only returns zeros for the index values. Should I submit a separate issue for this?

@ptrendx
Member

ptrendx commented Mar 25, 2019

Please do. Did you confirm that you get global shuffling with ImageRecordIter when the .idx file is provided?
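One way to check for global shuffling is to record the image indices an iterator emits over an epoch (for example from batch.index, once those values are populated) and look at how far apart consecutive indices are. A pure-Python sketch of such a heuristic: for a uniform random permutation the mean gap between neighbors is about one third of the index range, while shuffling only within chunks keeps it near a third of the chunk size. The function names and the 0.25 threshold are arbitrary illustration choices.

```python
import random

def mean_adjacent_gap_ratio(order):
    """Mean |a - b| over consecutive indices, divided by the index range.

    Roughly 1/3 for a uniform random permutation; much smaller when the
    stream is only shuffled within local chunks or windows.
    """
    if len(order) < 2:
        raise ValueError("need at least two indices")
    span = max(order) - min(order) + 1
    gaps = [abs(a - b) for a, b in zip(order, order[1:])]
    return sum(gaps) / len(gaps) / span

def looks_globally_shuffled(order, threshold=0.25):
    """Heuristic check; the threshold is an arbitrary illustration value."""
    return mean_adjacent_gap_ratio(order) > threshold

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    globally = list(range(n))
    rng.shuffle(globally)
    chunked = []
    for start in range(0, n, 1_000):
        chunk = list(range(start, start + 1_000))
        rng.shuffle(chunk)  # shuffle only inside each chunk
        chunked.extend(chunk)
    print(looks_globally_shuffled(globally))  # True
    print(looks_globally_shuffled(chunked))   # False
```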
