-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discriminative Layer Training #969
Conversation
This comment has been minimized.
This comment has been minimized.
I think I forgot to run flake8 and black |
@googlebot I signed it! |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
@WindQAQ please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! Generally LGTM. Some opinions here:
- We should elaborate doctoring that layers should be built before constructing this optimizer.
- Instead of holding
name
of variables, we can use.ref()
. https://www.tensorflow.org/api_docs/python/tf/Variable#ref - Can we override
__repr__
to give more information when printing it?
from tensorflow.keras.optimizers import Optimizer | ||
|
||
# python -m flake8 tensorflow_addons/optimizers/discriminative_layer_training.py | ||
# python -m black tensorflow_addons/optimizers/discriminative_layer_training.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
class FakeVar: | ||
def __init__(self, name): | ||
# probably can be refactored out | ||
self.name = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason that we should use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to alternative ideas. Its just an object to match how we access the name of weight object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't tf.Variable
itself have name
attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid serializing the whole variable as part of the optimizer, so I used the name. This has been refactored to remove fakevar. I tried using var.ref but it dies on deserialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
|
||
|
||
@tf.keras.utils.register_keras_serializable(package="Addons") | ||
class MultiOpt(Optimizer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a better name? At least call it MultiOptimizer
instead of abbreviation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
) | ||
|
||
self.initialized_optimizer_specs = [ | ||
self.initialize_from_optimizer_spec(spec) for spec in self.optimizer_specs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to do this get_config -> from_config
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to prevent serialization issues when running on TPU. I don't want to pass instantiated optimizers for graph construction on the TPU. This should protect from weird problems on serialization and deserialization. As in, when we need to initialize the multiopt from a serialized state, we cannot pass instantiated optimizers to the multiopt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification 😃 It makes sense to me! Can we also have this in docstring? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I rewrote some stuff that line is no longer needed after fixing the serialization
for spec in self.optimizer_specs: | ||
spec["gv"] = [] | ||
|
||
for grad, var in tuple(grads_and_vars): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuple
is unnecessary right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it matches the parent class's usage of tuple
FakeVar(var.name) for sublayer in layer for var in sublayer.weights | ||
] | ||
else: | ||
weights = [FakeVar(var.name) for var in layer.weights] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trainable_weights
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that would make a difference because the wrapper calls the apply function of the wrapped optimizers. I could try it, but I believe that the wrapped optimizers have their own way of handling trainable vs non trainable weights in their apply step. I'm guessing that a normal opt receives [weights] when running model.fit so either the model.fit only passes trainable weights or the opt.apply filters them out somehow. If that's the case, then it would be unnecessary to filter the weights twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimizer will not distinguish if the variable is trainable or not. It will create slots and compute gradient w.r.t. all variables passing to apply_gradients
.
https://colab.research.google.com/drive/1NOigO7WuGdFKTKD4XovqD52SMrSYzI99?usp=sharing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe that's intended behavior when using the optimizer outside of the model.fit loop? When used in the model.fit loop, the multi opt correctly ignores layers without the trainable attribute.
I would prefer if the multi optimizer acts solely as a wrapper and also passes forward all bugs/features of the wrapped optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link demonstrating layer trainable false works for multi optimizer here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. It looks good to me if that won't cause any errors.
|
||
return { | ||
"optimizer_class": optimizer_instance.__class__, | ||
"optimizer_config": optimizer_instance.get_config(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use tf.keras.optimizers.serialize
and tf.keras.optimizers.deserialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I found a less bad way to organize the wrapper. will push when testing is done
@@ -0,0 +1,101 @@ | |||
# Copyright 2019 The TensorFlow Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2020
tensorflow_addons/optimizers/tests/discriminative_layer_training_test.py
Show resolved
Hide resolved
if system == "Windows": | ||
return [tf.float32, tf.float64] | ||
else: | ||
return _dtypes_to_test(use_gpu) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tf.linalg.svd
in our test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so? I'm not sure what that is TBH. I copied some code from another optimizer test so I can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, just remove it.
Also, I would like to have some tests on saving this optimizer https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint Probably we should use |
@WindQAQ I'll look into that for optimizer saving. |
@WindQAQ what kind of info would be helpful when calling print(multioptimizer)? |
I guess users may want to see the optimizers and their corresponding variables. This is optional, just depend on your choice if you also think it is useful. |
@WindQAQ should be good for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM! Some minor documentation and naming issue :-) Thank you again for bringing this up!
@typechecked | ||
def __init__( | ||
self, | ||
optimizer_layer_pairs: Union[list, None] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think optimizers_and_layers
much fit the tensorflow naming philosophy like apply_gradients(grads_and_vars)
.
**kwargs | ||
): | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r"""Short descriptions in the first line, do not break line so rendering can work correctly.
Detailed descriptions start here.
Args:
foo: foo
bar: bar
Usage:
>>> code snippet here
>>> code snippet here
Note:
Note here if any.
Reference:
- [Name of the paper](linktothepaper)
"""
``` | ||
|
||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be put between class MultiOptimizer
and def __init__
. We also miss a section of Args
. It's also encouraged to use doctest for python code snippet (optional, if you have time)
import tensorflow as tf | ||
from typeguard import typechecked | ||
from typing import Union | ||
from tensorflow.keras.optimizers import Optimizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing import Union
import tensorflow as tf
from tensorflow.keras.optimizers import Optimizer
from typeguard import typechecked
|
||
from tensorflow_addons.optimizers.discriminative_layer_training import MultiOptimzer | ||
import tensorflow as tf | ||
from tensorflow_addons.utils import test_utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import pytest
import numpy as np
import tensorflow as tf
from tensorflow_addons.optimizers.discriminative_layer_training import MultiOptimzer
from tensorflow_addons.utils import test_utils
# The function "_DtypesToTest" is from | ||
# "https://github.com/tensorflow/tensorflow/blob/5d4a6cee737a1dc6c20172a1dc1 | ||
# 5df10def2df72/tensorflow/python/kernel_tests/conv_ops_3d_test.py#L53-L62" | ||
# TODO(WindQAQ): xxx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for misleading. can we change xxx
to Clean up this in TF2.4
? Thank you!
|
||
|
||
@tf.keras.utils.register_keras_serializable(package="Addons") | ||
class MultiOptimzer(Optimizer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change this to tf.keras.optimizers.Optimizer
since it only uses once.
|
||
@classmethod | ||
def create_optimizer_spec(cls, optimizer_instance, layer): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check isinstance(optimizer, tf.keras.optimziers.Optimizer)
here?
Wrapped Gradient Apply method. Returns a list of tf ops to be executed. | ||
|
||
Name of variable is used rather than var.ref() to enable serialization and deserialization. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Wrapped apply_gradient method.
Returns a list of tf ops to be executed.
Name of variable is used rather than var.ref() to enable serialization and deserialization.
"""
# expect weights to be same for layer 2 | ||
test_utils.assert_allclose_according_to_type( | ||
weights_before_train[1], weights_after_train[1] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have a test about serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a serialization to the existing test as a parameter.
After compiling but before fitting, the test serializes the model with the optimizer, clears the session, then loads the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm I added a serialization test by itself as well
"You must specify either an list of optimizer_layer_pairs or a list of optimizer_specs" | ||
) | ||
|
||
def apply_gradients(self, grads_and_vars, name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
def apply_gradients(self, grads_and_vars, **kwargs):
...
return tf.group(
[
spec["optimizer"].apply_gradients(spec["gv"], **kwargs)
for spec in self.optimizer_specs
]
)
to prevent upstream change.
… for only the serialization
@WindQAQ I've made the changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Sorry for the very long delay and thanks for the contribution to this good feature! Thank you!
I'm glad I could help! Thanks for the feedback to make this possible. |
@hyang0129 Oops, We forgot to update code owners, would you mind filing another PR to update https://github.com/tensorflow/addons/blob/master/.github/CODEOWNERS ? |
@hyang0129 Another thing is that |
I'm not sure what you mean by that? Could you elaborate? |
We misspell the word "Optimizer". We should correct it in next PR. |
I'll fix that |
@hyang0129 Thanks so much for your great contribution! Happy to see it merged, apologies for the delay. |
* initial setup. need to build tests * build some tests. need to test them * fixed typo * created first test * created first test * accidentally messed up another file * accidentally messed up another file * accidentally messed up another file * added run all distributed * fixed formatting * trying to fix tests not running on github CI. * realized that I should probably add the new optimizer files to the build and init * added typeguard and docstring * removed run_all_distributed * graph and eager testing for SGD * reformatted * added distributed tests * removed distributed tests * reverted discriminative layer grad adjust back to apply gradients * added distributed tests with one time virtual device init * increased tolerance for distributed added comments explaining tests * changed how distributed is recognized for increasing tolerance * Redesigned Logic into Optimizer Wrapper (tensorflow#1) * redesigned methodology to use multiple optimizers (one per unique LR) and pass grads to these multiple optimizers. Should allow for complex optimizers to behave properly * adjusted behavior of resource apply to only return the op if the lr_mult matches the lr_mult of the optimizer should only return 1 op for each var. * updated init file changed training config * removed variable position and added some more comments * removed grouped variables as unnecessary * reformatted * updated documentation explicitly defined serialization as not supported * added typecheck for name * added typecheck for name * fixed blank line at end of init file * realized no new line meant to add new line guessing that build file needs to be in alpha order? * ran buildifier * fixed accidentally affecting moving average * changed print to logging.info * changed print to logging.info * Revert "changed print to logging.info" This reverts commit 3fa5e19 * added tutorial. tutorial doesn't import from tfa. May need to remove from PR. Please let me know * refactored to use static method refactored to use getattr updated warning on not using lr_mult expanded on some docstrings * updated the usage of lr_mult in variables * renamed discriminative wrapper to disclayeropt * added note to disuade directly calling apply_gradients * updated toy_cnn to use tempdir and no longer call context.eager implemented toy_rnn function with same flow as toycnn * added toy_rnn and sgd to the test permutations * refactored permutes and train results into private fns * reformatted files and fixed flake 8 issues fixed bad references when lr_mult was changed * added missing functions in prep for tests * updated assign lr mult and explained further why refactored get lowest layers to assign sublayers explained recursively assign sublayers better * forgot to run black so ran it to reformat * specified inputshape for rnn * increased size of test temporarily removed SGD opt. Double opts doubles the number of tests to run so just need to see how long this one takes. * remove toy rnn for now * changed back to medium. maybe large was not actually increasing runtime * fixed input layer * fixed input layer being in wrong place * virtual device modification issue * fixed incorrect usage of lr_mult * added comments for tests explaining them better added toy rnn for testing * added new test fix toy rnn initialization * fixed typo * added inputshape so that pretrained rnn generates weights * changed test to allow head to learn. it should move the loss better * reformatted * fixed test for variable assignment added get config and from config * reformatted * fixed layer references from 1 to 0 because input layer isn't counted as an actual layer in the layer list * reformatted * increased lr and epochs because learning was happning, but assertless tolerance too low * attempting to use run distributed from test utils * removed tutorial * switched to alternative distributed training method * trying to use run distributed without graph and eager * trying to use run_distributed * seems that doing any tensorstuff before tf.test.main creates the issue. changed models to auto check if weights exist and create or load * forgot to return a model on first run of model fn * create model weights on init * changed how args are passed for testcase * changed how args are passed for testcase * try fix init * trying to init weights on model properly * trying to init weights on model properly * just trying all the possibilities * trying to fix weights setup * expanded some comments for some tests * fixed some docstrings and expanded on some comments * reformatted files expanded on many comments and added full stops fixed get/from_config based on optimzierv2 added model checkpoint test * capitalized comments properly. * removed sgd, reduced size of training inputs. * simplified checkpoint name * reformatted * remove run tests in notebook * updated README.md fixed indent for __init__ added test for from config and to config * fixed formatting * removed distributed tests and added a warning if optimizer is initialized within a strategy scope * renamed test_wrap to wrap_test bc pytest thought it was a test. * converting tests into the pytest framework * converted tests and parameterized * cleaned up code * added additional checks and doc string for changes in lr multiplier during training. * changed comment * Simplified discriminative layer training by using a multi optimizer wrapper class. Removed old tests and added new tests conforming to pytest standard. * Refactored code using black and flake8 * updated init file * fixed typeguard error and usage of private/experimental api. * restructured wrapper serialization and removed unnecessary components. * expanded on docstr and added repr * cleaned up docstrings, added assertion tests, and added explicit test for only the serialization * ran black and flake8 * fixed doc string Co-authored-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
See #958
I will continue to add more tests and setup distributed tests as well.
If you have any feedback, please let me know