
Conversation

@Callidior (Contributor) commented Nov 6, 2018

Previously, if `BatchNormalization` was initialized with `BatchNormalization(freeze=False)`, its behaviour was not equivalent to the standard `BatchNormalization` layer, as one would expect. Instead, it was always forced to be in training mode, producing wrong validation results.

This PR does not change the behaviour for `freeze=True`, but makes the layer equivalent to the standard `BatchNormalization` layer from Keras for `freeze=False`.
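As a quick usage sketch of the intended equivalence (the `keras_resnet.layers` import path is an assumption here, not stated in this thread):

```python
import keras
import keras_resnet.layers

# Frozen: always uses the fixed (initial) statistics, in training and
# inference alike; this behaviour is unchanged by this PR.
frozen_bn = keras_resnet.layers.BatchNormalization(freeze=True)

# Unfrozen: after this PR, this should behave exactly like the stock
# Keras layer below, following the learning phase (batch statistics
# while fitting, moving averages during validation/inference).
unfrozen_bn = keras_resnet.layers.BatchNormalization(freeze=False)
standard_bn = keras.layers.BatchNormalization()
```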

@Callidior changed the title from "Fix behaviour of unfrozen BatchNormalization layer" to "Fix behaviour of unfrozen BatchNormalization layer (resolves #46)" on Nov 6, 2018
@hgaiser (Contributor) commented Nov 6, 2018

Doesn't this only change the behaviour if `freeze=True`?

Also, what accuracy are you getting now?

@Callidior (Contributor, Author)

No, the behaviour for `freeze=True` is not changed. Previously, we called the method of the superclass with `training=(not self.freeze)`, which would evaluate to `training=False`. Now, if `self.freeze` is `True`, we set `training=False`, as before.

If `self.freeze` is `False`, however, we now have `training=None` (the default) instead of `training=True`.
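A minimal sketch of the resulting `call` logic (an illustration, not the exact diff from this PR; the class name `FreezableBatchNormalization` is hypothetical):

```python
import keras

class FreezableBatchNormalization(keras.layers.BatchNormalization):
    """Hypothetical stand-in for the layer discussed in this PR."""

    def __init__(self, freeze=False, **kwargs):
        super(FreezableBatchNormalization, self).__init__(**kwargs)
        self.freeze = freeze

    def call(self, inputs, training=None):
        # Before: training=(not self.freeze) forced training=True whenever
        # freeze=False, so validation also used batch statistics.
        # After: only override `training` when the layer is frozen; otherwise
        # leave it as None so Keras resolves the learning phase exactly as it
        # does for the stock layer.
        if self.freeze:
            training = False
        return super(FreezableBatchNormalization, self).call(inputs, training=training)
```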

Training is still in progress, but I already have 12% validation accuracy after the first epoch and 40% after four epochs, which is higher than anything I got without the modifications made in this PR.

@Callidior (Contributor, Author) commented Nov 7, 2018

By the way, I would question the example in the README. The model is initialized there with `freeze_bn=True` (the default), which fixes the `BatchNormalization` layers in inference mode using their initialization parameters (moving mean 0, moving variance 1, gamma 1, beta 0). This should be equivalent to using no batch normalization at all.
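A quick way to sanity-check that claim (a hypothetical snippet using `tf.keras`; with the initialization values above, inference-mode batch normalization reduces to `x / sqrt(1 + epsilon)`, i.e. approximately the identity):

```python
import numpy as np
import tensorflow as tf

x = np.random.randn(4, 8).astype("float32")
bn = tf.keras.layers.BatchNormalization()

# Inference mode, as with a frozen layer at initialization:
# gamma * (x - moving_mean) / sqrt(moving_var + eps) + beta = x / sqrt(1 + eps)
y = bn(x, training=False)
print(np.allclose(y.numpy(), x, atol=1e-2))  # True: identity up to epsilon
```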

I also tried this first for my ImageNet training, since the README does so, but it didn't work.

@Callidior (Contributor, Author)

I now finally obtained 68% validation accuracy, which is much closer to what I got with the bundled ResNet-50 than before.

@0x00b1 (Contributor) commented Nov 28, 2018

Awesome. Thanks, @Callidior.

@0x00b1 merged commit 7e2e67b into broadinstitute:master on Nov 28, 2018