
Discriminative Layer Training #969
Merged — 119 commits, merged into tensorflow:master on Sep 14, 2020

Conversation

@hyang0129 (Contributor):

See #958

I will continue to add more tests and set up distributed tests as well.

If you have any feedback, please let me know.


@hyang0129 (Contributor Author):

I think I forgot to run flake8 and black

@hyang0129 (Contributor Author):

@googlebot I signed it!

@googlebot:

CLAs look good, thanks!


@hyang0129 (Contributor Author):

@WindQAQ please review

@WindQAQ (Member) left a review comment:

Thanks for the contribution! Generally LGTM. Some opinions here:

from tensorflow.keras.optimizers import Optimizer

# python -m flake8 tensorflow_addons/optimizers/discriminative_layer_training.py
# python -m black tensorflow_addons/optimizers/discriminative_layer_training.py
@WindQAQ (Member):

Remove these.

@hyang0129 (Contributor Author):

ok

class FakeVar:
def __init__(self, name):
# probably can be refactored out
self.name = name
@WindQAQ (Member):

any reason that we should use this?

@hyang0129 (Contributor Author):

I'm open to alternative ideas. It's just an object to match how we access the name of a weight object.

@WindQAQ (Member):

Shouldn't tf.Variable itself have a name attribute?

@hyang0129 (Contributor Author):

I wanted to avoid serializing the whole variable as part of the optimizer, so I used the name. This has been refactored to remove FakeVar. I tried using var.ref() but it fails on deserialization.

@WindQAQ (Member), Sep 13, 2020:

Got it
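For context, a minimal sketch of the name-matching idea settled on above (a hypothetical helper, not the merged code): each optimizer spec stores plain weight-name strings, and gradients are routed by comparing var.name against those strings, which keeps the spec serializable.

```python
def route_grads_by_name(grads_and_vars, optimizer_specs):
    """Hypothetical sketch: assign each (grad, var) pair to the spec whose
    stored weight names contain var.name. Plain name strings are used
    instead of var.ref() so the spec survives get_config/from_config."""
    for spec in optimizer_specs:
        spec["gv"] = []
    for grad, var in grads_and_vars:
        for spec in optimizer_specs:
            if var.name in spec["weights"]:  # spec["weights"] is a list of name strings
                spec["gv"].append((grad, var))
    return optimizer_specs
```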



@tf.keras.utils.register_keras_serializable(package="Addons")
class MultiOpt(Optimizer):
@WindQAQ (Member):

Can we have a better name? At least call it MultiOptimizer instead of an abbreviation.

@hyang0129 (Contributor Author):

sure

)

self.initialized_optimizer_specs = [
self.initialize_from_optimizer_spec(spec) for spec in self.optimizer_specs
@WindQAQ (Member):

Why do we need to do this get_config -> from_config?

@hyang0129 (Contributor Author):

To prevent serialization issues when running on TPU. I don't want to pass instantiated optimizers for graph construction on the TPU. This should protect against weird problems during serialization and deserialization. That is, when we need to initialize the MultiOpt from a serialized state, we cannot pass instantiated optimizers to it.

@WindQAQ (Member):

Thanks for the clarification 😃 It makes sense to me! Can we also have this in the docstring? Thank you!

@hyang0129 (Contributor Author):

So I rewrote some things; that line is no longer needed after fixing the serialization.

for spec in self.optimizer_specs:
    spec["gv"] = []

for grad, var in tuple(grads_and_vars):
@WindQAQ (Member):

tuple is unnecessary, right?

@hyang0129 (Contributor Author):

I believe it matches the parent class's usage of tuple

FakeVar(var.name) for sublayer in layer for var in sublayer.weights
]
else:
weights = [FakeVar(var.name) for var in layer.weights]
@WindQAQ (Member):

trainable_weights maybe?

@hyang0129 (Contributor Author):

I'm not sure if that would make a difference, because the wrapper calls the apply function of the wrapped optimizers. I could try it, but I believe the wrapped optimizers have their own way of handling trainable vs. non-trainable weights in their apply step. I'm guessing that a normal optimizer receives [weights] when running model.fit, so either model.fit only passes trainable weights or opt.apply filters them out somehow. If that's the case, then it would be unnecessary to filter the weights twice.

@WindQAQ (Member):

The optimizer will not distinguish whether the variable is trainable or not. It will create slots and compute gradients w.r.t. all variables passed to apply_gradients.

https://colab.research.google.com/drive/1NOigO7WuGdFKTKD4XovqD52SMrSYzI99?usp=sharing

@hyang0129 (Contributor Author):

I think maybe that's intended behavior when using the optimizer outside of the model.fit loop? When used in the model.fit loop, the multi-opt correctly ignores layers that are not trainable.

I would prefer if the multi optimizer acts solely as a wrapper and also passes forward all bugs/features of the wrapped optimizer.

@hyang0129 (Contributor Author):

Link demonstrating that layer trainable=False works for the multi-optimizer here.

@WindQAQ (Member):

Got it. It looks good to me if that won't cause any errors.
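For reference, a minimal sketch of the behavior discussed above (hypothetical model and data): with layer.trainable = False, model.fit only forwards gradients for trainable variables to the optimizer, so the frozen layer stays fixed even though the optimizer itself does no filtering; the same holds for the wrapper, since it delegates to the wrapped optimizers' apply_gradients.

```python
import numpy as np
import tensorflow as tf

# Hypothetical two-layer model; the first layer is frozen.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.layers[0].trainable = False
model.compile(optimizer=tf.keras.optimizers.SGD(0.1), loss="mse")

frozen_before = [w.numpy().copy() for w in model.layers[0].weights]

x = np.random.rand(16, 8).astype("float32")
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# The frozen layer is untouched: its gradients were never passed to the optimizer.
for before, after in zip(frozen_before, model.layers[0].weights):
    np.testing.assert_allclose(before, after.numpy())
```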


return {
    "optimizer_class": optimizer_instance.__class__,
    "optimizer_config": optimizer_instance.get_config(),
@WindQAQ (Member):

we can use tf.keras.optimizers.serialize and tf.keras.optimizers.deserialize.
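For illustration, the suggested round-trip with the standard Keras helpers looks roughly like this (a sketch, not the PR's final code):

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Serialize the optimizer to a plain dict (class name + config)...
config = tf.keras.optimizers.serialize(opt)

# ...and rebuild an equivalent optimizer from that dict later.
restored = tf.keras.optimizers.deserialize(config)
assert isinstance(restored, tf.keras.optimizers.Adam)
```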

@hyang0129 (Contributor Author):

Yeah, I found a less bad way to organize the wrapper. Will push when testing is done.

@@ -0,0 +1,101 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
@WindQAQ (Member):

2020

if system == "Windows":
    return [tf.float32, tf.float64]
else:
    return _dtypes_to_test(use_gpu)
@WindQAQ (Member):

do we have tf.linalg.svd in our test?

@hyang0129 (Contributor Author):

I don't think so? I'm not sure what that is TBH. I copied some code from another optimizer test so I can remove this.

@WindQAQ (Member):

Sure, just remove it.

@WindQAQ (Member) commented Sep 10, 2020:

Also, I would like to have some tests on saving this optimizer:

https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint

Probably we should use self._track_trackable(optimizer, name="xxx") to keep track of those optimizers.
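A minimal sketch of the kind of checkpoint test being suggested (hypothetical names): the wrapped optimizers are tracked explicitly via tf.train.Checkpoint here; inside the wrapper, self._track_trackable(optimizer, name=...) would register them automatically so model checkpoints capture their slot variables.

```python
import tensorflow as tf

# Two inner optimizers that a multi-optimizer wrapper would hold.
opt_a = tf.keras.optimizers.Adam(1e-3)
opt_b = tf.keras.optimizers.SGD(1e-2)

# Track both explicitly so their state is saved and restored.
ckpt = tf.train.Checkpoint(opt_a=opt_a, opt_b=opt_b)
save_path = ckpt.save("/tmp/multi_optimizer_ckpt")

# Restoring into fresh instances rebuilds the optimizer state.
restored = tf.train.Checkpoint(
    opt_a=tf.keras.optimizers.Adam(1e-3),
    opt_b=tf.keras.optimizers.SGD(1e-2),
)
restored.restore(save_path)
```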

@hyang0129 (Contributor Author):

@WindQAQ I'll look into that for optimizer saving.

@hyang0129 (Contributor Author):

@WindQAQ what kind of info would be helpful when calling print(multioptimizer)?

@WindQAQ (Member) commented Sep 13, 2020:

> @WindQAQ what kind of info would be helpful when calling print(multioptimizer)?

I guess users may want to see the optimizers and their corresponding variables. This is optional; it just depends on your choice if you also think it is useful.

@hyang0129 (Contributor Author):

@WindQAQ should be good for review

@WindQAQ (Member) left a review comment:

Generally LGTM! Some minor documentation and naming issues :-) Thank you again for bringing this up!

@typechecked
def __init__(
    self,
    optimizer_layer_pairs: Union[list, None] = None,
@WindQAQ (Member), Sep 14, 2020:

I think optimizers_and_layers better fits the TensorFlow naming philosophy, like apply_gradients(grads_and_vars).
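A sketch of how that naming reads in use (hypothetical model; the argument name follows the suggestion above, and the class spelling follows the fix noted at the end of this thread):

```python
import tensorflow as tf
from tensorflow_addons.optimizers.discriminative_layer_training import MultiOptimizer

# Hypothetical model: the base layer gets a small learning rate, the head a larger one.
base = tf.keras.layers.Dense(64, activation="relu", input_shape=(32,))
head = tf.keras.layers.Dense(1)
model = tf.keras.Sequential([base, head])

optimizers_and_layers = [
    (tf.keras.optimizers.Adam(1e-4), base),
    (tf.keras.optimizers.Adam(1e-2), head),
]
model.compile(optimizer=MultiOptimizer(optimizers_and_layers), loss="mse")
```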

**kwargs
):

"""
@WindQAQ (Member):

r"""Short descriptions in the first line, do not break line so rendering can work correctly.

Detailed descriptions start here.

Args:
  foo: foo
  bar: bar

Usage:

>>> code snippet here
>>> code snippet here

Note:

Note here if any.

Reference:
  - [Name of the paper](linktothepaper)
"""

"""
@WindQAQ (Member):

This should be put between class MultiOptimizer and def __init__. We're also missing an Args section. It's also encouraged to use doctest for the Python code snippet (optional, if you have time).

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/optimizer_v2/adam.py#L47-L69

import tensorflow as tf
from typeguard import typechecked
from typing import Union
from tensorflow.keras.optimizers import Optimizer
@WindQAQ (Member):

from typing import Union

import tensorflow as tf
from tensorflow.keras.optimizers import Optimizer
from typeguard import typechecked


from tensorflow_addons.optimizers.discriminative_layer_training import MultiOptimzer
import tensorflow as tf
from tensorflow_addons.utils import test_utils
@WindQAQ (Member):

import pytest
import numpy as np
import tensorflow as tf

from tensorflow_addons.optimizers.discriminative_layer_training import MultiOptimzer
from tensorflow_addons.utils import test_utils

# The function "_DtypesToTest" is from
# "https://github.com/tensorflow/tensorflow/blob/5d4a6cee737a1dc6c20172a1dc1
# 5df10def2df72/tensorflow/python/kernel_tests/conv_ops_3d_test.py#L53-L62"
# TODO(WindQAQ): xxx
@WindQAQ (Member):

Sorry for misleading. Can we change xxx to "Clean up this in TF2.4"? Thank you!



@tf.keras.utils.register_keras_serializable(package="Addons")
class MultiOptimzer(Optimizer):
@WindQAQ (Member):

Maybe change this to tf.keras.optimizers.Optimizer since it is only used once.


@classmethod
def create_optimizer_spec(cls, optimizer_instance, layer):

@WindQAQ (Member):

Should we check isinstance(optimizer, tf.keras.optimizers.Optimizer) here?
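A rough sketch of the suggested check (hypothetical spec layout, following the name-based approach discussed earlier in this thread):

```python
import tensorflow as tf

def create_optimizer_spec(optimizer_instance, layer):
    """Hypothetical sketch: validate the type, then build a serializable spec."""
    if not isinstance(optimizer_instance, tf.keras.optimizers.Optimizer):
        raise TypeError(
            "optimizer_instance must be a tf.keras.optimizers.Optimizer, got %r"
            % type(optimizer_instance)
        )
    return {
        "optimizer": tf.keras.optimizers.serialize(optimizer_instance),
        "weights": [var.name for var in layer.trainable_weights],
    }
```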

Wrapped Gradient Apply method. Returns a list of tf ops to be executed.

Name of variable is used rather than var.ref() to enable serialization and deserialization.
"""
@WindQAQ (Member):

"""Wrapped apply_gradient method.

Returns a list of tf ops to be executed.
Name of variable is used rather than var.ref() to enable serialization and deserialization.
"""

# expect weights to be same for layer 2
test_utils.assert_allclose_according_to_type(
    weights_before_train[1], weights_after_train[1]
)

@hyang0129 (Contributor Author):

I added serialization to the existing test as a parameter.

After compiling but before fitting, the test serializes the model with the optimizer, clears the session, then loads the model.

@hyang0129 (Contributor Author):

Never mind, I added a standalone serialization test as well.
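A rough sketch of that kind of test (hypothetical model and helper; a plain Adam stands in for the multi-optimizer to keep the sketch self-contained): compile, save, clear the session, reload, then fit.

```python
import numpy as np
import tensorflow as tf

def test_serialization_before_fit(tmp_path):
    # Hypothetical small model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4, input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

    # Serialize the compiled model (optimizer included), clear the session,
    # and load it back before any training happens.
    path = str(tmp_path / "saved_model")
    model.save(path)
    tf.keras.backend.clear_session()
    model = tf.keras.models.load_model(path)

    x = np.random.rand(16, 8).astype("float32")
    y = np.random.rand(16, 1).astype("float32")
    model.fit(x, y, epochs=1, verbose=0)
```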

"You must specify either an list of optimizer_layer_pairs or a list of optimizer_specs"
)

def apply_gradients(self, grads_and_vars, name=None):
@WindQAQ (Member):

This should be

    def apply_gradients(self, grads_and_vars, **kwargs):
        ...
        return tf.group(
            [
                spec["optimizer"].apply_gradients(spec["gv"], **kwargs)
                for spec in self.optimizer_specs
            ]
        )

to prevent upstream change.

@hyang0129 (Contributor Author):

@WindQAQ I've made the changes

@WindQAQ (Member) left a review comment:

LGTM! Sorry for the very long delay and thanks for the contribution to this good feature! Thank you!

@WindQAQ merged commit 3897e2f into tensorflow:master on Sep 14, 2020
@hyang0129 (Contributor Author):

> LGTM! Sorry for the very long delay and thanks for the contribution to this good feature! Thank you!

I'm glad I could help! Thanks for the feedback to make this possible.

@WindQAQ (Member) commented Sep 14, 2020:

@hyang0129 Oops, we forgot to update code owners. Would you mind filing another PR to update https://github.com/tensorflow/addons/blob/master/.github/CODEOWNERS?

@WindQAQ (Member) commented Sep 14, 2020:

@hyang0129 Another thing is that MultiOptimzer -> MultiOptimizer.

@hyang0129 (Contributor Author):

> @hyang0129 Another thing is that MultiOptimzer -> MultiOptimizer.

I'm not sure what you mean by that? Could you elaborate?

@WindQAQ (Member) commented Sep 14, 2020:

> @hyang0129 Another thing is that MultiOptimzer -> MultiOptimizer.
>
> I'm not sure what you mean by that? Could you elaborate?

We misspelled the word "Optimizer". We should correct it in the next PR.

@hyang0129 (Contributor Author):

I'll fix that

@seanpmorgan (Member):

@hyang0129 Thanks so much for your great contribution! Happy to see it merged, apologies for the delay.

@WindQAQ mentioned this pull request on Sep 25, 2020.
jrruijli pushed a commit to jrruijli/addons that referenced this pull request Dec 23, 2020
* initial setup. need to build tests

* build some tests. need to test them

* fixed typo

* created first test

* created first test

* accidentally messed up another file

* accidentally messed up another file

* accidentally messed up another file

* added run all distributed

* fixed formatting

* trying to fix tests not running on github CI.

* realized that I should probably add the new optimizer files to the build and init

* added typeguard and docstring

* removed run_all_distributed

* graph and eager testing for SGD

* reformatted

* added distributed tests

* removed distributed tests

* reverted discriminative layer grad adjust back to apply gradients

* added distributed tests with one time virtual device init

* increased tolerance for distributed
added comments explaining tests

* changed how distributed is recognized for increasing tolerance

* Redesigned Logic into Optimizer Wrapper (tensorflow#1)

* redesigned methodology to use multiple optimizers (one per unique LR) and pass grads to these multiple optimizers. Should allow for complex optimizers to behave properly

* adjusted behavior of resource apply to only return the op if the lr_mult matches the lr_mult of the optimizer
should only return 1 op for each var.

* updated init file
changed training config

* removed variable position and added some more comments

* removed grouped variables as unnecessary

* reformatted

* updated documentation
explicitly defined serialization as not supported

* added typecheck for name

* added typecheck for name

* fixed blank line at end of init file

* realized no new line meant to add new line
guessing that build file needs to be in alpha order?

* ran buildifier

* fixed accidentally affecting moving average

* changed print to logging.info

* changed print to logging.info

* Revert "changed print to logging.info"

This reverts commit 3fa5e19

* added tutorial.
tutorial doesn't import from tfa. May need to remove from PR.
Please let me know

* refactored to use static method
refactored to use getattr
updated warning on not using lr_mult
expanded on some docstrings

* updated the usage of lr_mult in variables

* renamed discriminative wrapper to disclayeropt

* added note to disuade directly calling apply_gradients

* updated toy_cnn to use tempdir and no longer call context.eager
implemented toy_rnn function with same flow as toycnn

* added toy_rnn and sgd to the test permutations

* refactored permutes and train results into private fns

* reformatted files and fixed flake 8 issues
fixed bad references when lr_mult was changed

* added missing functions in prep for tests

* updated assign lr mult and explained further why
refactored get lowest layers to assign sublayers
explained recursively assign sublayers better

* forgot to run black so ran it to reformat

* specified inputshape for rnn

* increased size of test
temporarily removed SGD opt. Double opts doubles the number of tests
to run so just need to see how long this one takes.

* remove toy rnn for now

* changed back to medium. maybe large was not actually increasing runtime

* fixed input layer

* fixed input layer being in wrong place

* virtual device modification issue

* fixed incorrect usage of lr_mult

* added comments for tests explaining them better
added toy rnn for testing

* added new test
fix toy rnn initialization

* fixed typo

* added inputshape so that pretrained rnn generates weights

* changed test to allow head to learn. it should move the loss better

* reformatted

* fixed test for variable assignment
added get config and from config

* reformatted

* fixed layer references from 1 to 0 because input layer isn't counted
as an actual layer in the layer list

* reformatted

* increased lr and epochs because learning was happening, but assertless
tolerance too low

* attempting to use run distributed from test utils

* removed tutorial

* switched to alternative distributed training method

* trying to use run distributed without graph and eager

* trying to use run_distributed

* seems that doing any tensorstuff before tf.test.main creates the issue. changed models to auto check if weights exist and create or load

* forgot to return a model on first run of model fn

* create model weights on init

* changed how args are passed for testcase

* changed how args are passed for testcase

* try fix init

* trying to init weights on model properly

* trying to init weights on model properly

* just trying all the possibilities

* trying to fix weights setup

* expanded some comments for some tests

* fixed some docstrings and expanded on some comments

* reformatted files

expanded on many comments and added full stops

fixed get/from_config based on optimzierv2

added model checkpoint test

* capitalized comments properly.

* removed sgd, reduced size of training inputs.

* simplified checkpoint name

* reformatted

* remove run tests in notebook

* updated README.md
fixed indent for __init__
added test for from config and to config

* fixed formatting

* removed distributed tests and added a warning if optimizer is initialized within a strategy scope

* renamed test_wrap to wrap_test bc pytest thought it was a test.

* converting tests into the pytest framework

* converted tests and parameterized

* cleaned up code

* added additional checks and doc string for changes in lr multiplier during training.

* changed comment

* Simplified discriminative layer training by using a multi optimizer wrapper class.

Removed old tests and added new tests conforming to pytest standard.

* Refactored code using black and flake8

* updated init file

* fixed typeguard error and usage of private/experimental api.

* restructured wrapper serialization and removed unnecessary components.

* expanded on docstr and added repr

* cleaned up docstrings, added assertion tests, and added explicit test for only the serialization

* ran black and flake8

* fixed doc string

Co-authored-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>