Skip to content
This repository has been archived by the owner on May 7, 2020. It is now read-only.

mnist and mnistm dataloaders #20

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

arikanev
Copy link

@arikanev arikanev commented Dec 25, 2017

Mnist segmentations are useful for quick training when testing models

@pep8speaks
Copy link

pep8speaks commented Dec 25, 2017

Hello @arikanev! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 15, 2018 at 16:55 Hours UTC

Copy link
Owner

@fvisin fvisin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your PR!
There are a few things to fix, but LGTM overall.
Once you fix these it can be merged! Thanks

import errno
from PIL import Image
from dataset_loaders.parallel_loader import ThreadedDataset
from timeit import default_timer as timer
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reorder the imports so that they are grouped according to https://www.python.org/dev/peps/pep-0008/#imports and each chunk is alphabetically ordered?

class MnistDataset(ThreadedDataset):
"""The mnist handwritten digit dataset

The dataset should be downloaded from [1] into the `shared_path`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace [1] with [MNIST]_.


References
----------
[1] Mnist dataset pickle file:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace [1] with [MNIST]

References
----------
[1] Mnist dataset pickle file:
your_username_here@elisa1.iro.umontreal.ca:/data/lisa/data/mnist/mnist_seg/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't make this MILA specific. Replace with the MNIST website or the download link.

# optional arguments
data_shape = (28, 28, 3)
mean = [0, 0, 0]
std = [1, 1, 1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify mean and std unless you actually compute them. Please either compute them or remove it.

max_files = 50000

GTclasses = range(2)
mapping_type = 'mnist'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used. Please remove.

non_void_nclasses = 2
_void_labels = []
GTclasses = range(2)
# GTclasses = GTclasses + [-1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the commented line

your_username_here@elisa1.iro.umontreal.ca:/data/lisa/data/mnist/mnist_seg/

"""

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the empty line please

it also creates/copies the dataset in self.path if not already there
mnist data is in 3 directories train, test, valid)
"""

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the empty line please

from dataset_loaders.parallel_loader import ThreadedDataset


class MnistMDataset(ThreadedDataset):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please inherit from MnistDataset and modify only the relevant parts.

@arikanev
Copy link
Author

arikanev commented Jan 9, 2018

Implemented changes as requested, let me know if I should modify anything

Copy link
Owner

@fvisin fvisin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your new commit. There is still a lot of repeated code in the MnistmDataset class, please remove it. I also noticed a couple of new things that should be fixed.

When you are done, please also add the new datasets to the __init__.py in the root (follow what was done for the other datasets).

Thank you!


The dataset should be downloaded from [1] into the `shared_path`
The dataset should be downloaded from [MNIST] into the `shared_path`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[MNIST]" should be "[MNIST]_"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

class MnistMDataset(MnistDataset):
"""The mnist-m handwritten digit dataset.

The dataset should be downloaded from [1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the link as done for the other file. Also please add a small description of the dataset.

Copy link
Author

@arikanev arikanev Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I want to add the download to mnistm via my website akanev.com. Should I wait to push changes until I link it, or should I push a change to just the mnistm download link later on? Or neither and link via google drive.

Copy link
Author

@arikanev arikanev Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will push changes now and then push that later

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any advantage to download through your website rather than the original download link? I think it's best to link to an established website to avoid to potentially generate confusion in the users and to make sure the link is up to date in case the original authors notice some problem with the data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no really established website for mnistm. I believe Yaroslav, (a MILA phd student), created it for previous research. There is one link to the old mnistm data via google drive, which I think could look sketchy. The new mnistm that I generated using a script is not available via the web right now, which is why I wanted to upload it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, theres some errors in the code when inheriting mnistm from mnist that I need to work out. So I will just push mnist for now, and fix the mnistm issues in the meantime.

}

_cmap = {
0: (0), # background
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmap should return RGB triples. Please fix this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. To confirm: (0, 0, 0)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

name = 'mnist'

# optional arguments
data_shape = (28, 28, 3)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe MNIST data_shape should be (28, 28). Can you verify this?

Copy link
Author

@arikanev arikanev Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes sorry.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

# optional arguments
data_shape = (28, 28, 3)

GTclasses = range(2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary, as is already defined in MnistDataset. The same goes for all the other properties that are not different between the two.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.


_filenames = None

def __init__(self, which_set='train', *args, **kwargs):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is the same as the one in MnistDataset. There is no need to redefine it.
The same goes with all the other methods: remove the parts that do not change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@arikanev
Copy link
Author

Removed mnistm.py, mnist.py should have all of the necessary changes you requested.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants