Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ImageDataGenerator behaviour fully seedable/repeatable #3751

Merged
merged 2 commits into from
Sep 22, 2016

Conversation

wassname
Copy link
Contributor

@wassname wassname commented Sep 12, 2016

This makes ImageDataGenerator fully seedable with minimal changes.

  • the seed argument in fit is now used
  • the seed argument in flow and flow_from_directory now affects
    transforms
  • added example to docs of transforming images and masks together
  • added test of using two seeded streams to get the same result

Related issue: 3059

Use cases:

  • Needing repeatable behaviour from ImageDataGenerator
  • Wanting to transform images and masks at the same time, e.g. kaggle ultrasound nerve segmentation challenge
  • Tests pass in python2.7 with tensorflow and theano and python3.4 with theano
  • pep8 passes
  • docs updated

This makes ImageDataGenerator fully seedable.
- the seed argument in fit is now used
- the seed argument in flow and flow_from_directory now effects
transforms
- added example to docs of transforming images and masks together
- added test of using two seeded streams at once
height_shift_range=0.1,
zoom_range=0.2
)
generator_i = ImageDataGenerator(**data_gen_args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use explicit variable names in examples (e.g. image_generator)


# We need to provide the same seed to the fit and flow methods
seed = 1
fake_y = np.zeroes(images.shape)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't need labels, you should configure the generator not to use labels (class_mode=None).


# combine the two generators into one which gives the first from each
def dual_gen(flow_images,flow_masks):
for [X,_],[y,_] in zip(flow_images,flow_masks):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, would be much more elegant without the fake labels.

# combine the two generators into one which gives the first from each
def dual_gen(flow_images,flow_masks):
for [X,_],[y,_] in zip(flow_images,flow_masks):
yield X,y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8

yield X,y

model.fit_generator(
dual_gen(flow_images,flow_masks),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8

@@ -90,6 +90,55 @@ def test_img_flip(self):
assert ((potentially_flipped_x == x).all() or
(potentially_flipped_x == flip_axis(x, col_index)).all())

def test_dual_generators(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test is useful; please remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My logic is that this test will break if any future PR break's repeatability, if you have time could you expand on why you don't think that's worth a test? In the mean time I'll remove it.

@wassname
Copy link
Contributor Author

Thanks for the feedback, I'll make those changes shortly

- PEP8
- explicit names
- classes=None
- remove test
@fchollet
Copy link
Collaborator

LGTM, thanks

@fchollet fchollet merged commit 414d5f0 into keras-team:master Sep 22, 2016
@yesufeng
Copy link

does this requires the transformations passed to ImageDataGenerator to be exactly the same for image and mask?

@wassname
Copy link
Contributor Author

wassname commented Mar 26, 2017

If our goal is to transform the image and mask together, then yes, and that's usually what we want. Usually we want the mask to correspond to the same pixels in the image even after transformation. Unless I'm missing something.

See the last example in the docs where they are transformed together.

@yesufeng
Copy link

yes, I saw that example. Let me give a concrete example, if the images are rescaled but not the mask, e.g., imageGen = ImageDataGenerator(rescale = 1/1000., cval = -1000., fill_mode = 'constant', rotation_range=180.)
maskGen = ImageDataGenerator(cval=0., fill_mode='constant', rotation_range=180.),
these two generators, when we call flow with the same seed and other keyword arguments, then zip them to a final generator, is it guaranteed that these two are synchronized?

@wassname
Copy link
Contributor Author

wassname commented Mar 27, 2017

In your example they wont be.

But instead of taking my word, you can easily test this, and other combinations yourself. Put the same image through both generators and compare them visually or with assert (a==b).all(). Example

Let's say I channel shift the image and not the mask. This change will propagate to other transformations, in this case the vertical flip, and the image and mask will not be synchronized. To me that was unexpected, but it's shown in the example. I expected it to be mostly synchronized but it wasn't

So to be clear, I don't expect them to ever be synchronized, unless the seed and transformation keywords are exactly the same. And only then.

@yesufeng
Copy link

thank you for the explanation! Yeah, feed in the same images will nail it down. It might be desirable to allow things such as rescale to be different for mask vs image without affecting the synchronized generator, for obvious reason.

@wassname
Copy link
Contributor Author

No worries. Yeah agreed.

wassname added a commit to wassname/keras that referenced this pull request Aug 12, 2017
Following keras-team#3751. I ran more tests to see if ImageDataGenerator generator can transform images and masks in the same way. It can for NumpyArrayIterator but not for DirectoryIterator. 

So I added seeds to fix this and now it can transform images and masks together. I have unit tests to prove this, and I can add them to the PR if desired?
wassname added a commit to wassname/keras that referenced this pull request Aug 12, 2017
Following keras-team#3751. I ran more tests to see if ImageDataGenerator generator can transform images and masks in the same way. It can for NumpyArrayIterator but not for DirectoryIterator. 

So I added seeds to fix this and now it can transform images and masks together. I have unit tests to prove this, and I can add them to the PR if desired?
wassname added a commit to wassname/keras that referenced this pull request Aug 12, 2017
Following keras-team#3751. I ran more tests to see if ImageDataGenerator generator can transform images and masks in the same way. It can for NumpyArrayIterator but not for DirectoryIterator. 

So I added seeds to fix this and now it can transform images and masks together. I have unit tests to prove this, and I can add them to the PR if desired?
@mositemp
Copy link

mositemp commented Apr 28, 2018

@yesufeng , @wassname , Did you made any progress in making image and mask generator synchronized when two different pre-processing pipelines are selected for each of them?
I running some experiments and it seems (in my case) that "channel_shift_range" option is the culprit here! Every time I enable it (even for both mask and image generators), generator outputs would not be synchronous.
I think this option is need to be fully synchronized and separable. There are cases that more than two generators are needed (e.g. two or more output mask or using pixel-wise weighting) where equal pre-processings are not suitable for all of theme.

@wassname
Copy link
Contributor Author

@mositemp I made a quick fix here but we decided not to merge it.

Good info on "channel_shift_range" thanks for sharing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants