This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Registry refactor #1410

Merged
merged 6 commits into from
Jan 28, 2019

@jackd jackd commented Jan 25, 2019

Re-write of utils.registry to a dict-like interface as per discussion here.

Common code pulled into dict-like Registry class. Old interface remains (registry.register_problem, registry.hparams etc), though this does mean there is a fair bit of aliasing (e.g. register_problem = problem_registry.register).
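
A minimal sketch of what a dict-like registry with an aliased legacy name could look like. This is an illustration under assumptions, not the code merged in this PR: the method bodies, error messages, and the example problem names are hypothetical.

```python
class Registry(object):
  """Hypothetical minimal dict-like registry (not the merged implementation)."""

  def __init__(self, name):
    self._name = name
    self._registry = {}

  def __setitem__(self, key, value):
    # Refuse silent overwrites so name clashes surface early.
    if key in self._registry:
      raise KeyError("%s already registered in %s registry" % (key, self._name))
    self._registry[key] = value

  def __getitem__(self, key):
    if key not in self._registry:
      raise KeyError("%s never registered with %s registry" % (key, self._name))
    return self._registry[key]

  def __contains__(self, key):
    return key in self._registry

  def __iter__(self):
    return iter(self._registry)

  def register(self, key_or_value=None):
    """Decorator usable as @register or @register("name")."""
    def decorator(value, key):
      self[key] = value
      return value
    if callable(key_or_value):
      # Bare @register: the key defaults to the function name.
      return decorator(key_or_value, key_or_value.__name__)
    return lambda value: decorator(value, key_or_value or value.__name__)


problem_registry = Registry("problems")
register_problem = problem_registry.register  # alias keeping the old name


@register_problem
def my_problem():
  return "problem instance"


@register_problem("renamed")
def other_problem():
  return "other"
```

With this shape, the old decorator name keeps working while lookups, membership tests, and iteration go through the usual dict protocol.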

Removed registry registry (i.e. the registry containing registries) - I don't see why it was necessary. The point of registries (as far as I can tell) is to allow tensor2tensor to play nicely with external code through t2t_usr_dir. If external code wants to add a registry that tensor2tensor knows nothing about, then tensor2tensor won't be calling it, so it doesn't need to be registered... unless I'm missing something.

Marked a couple of function names deprecated, since they are inconsistent with naming convention of other functions in the old registry version. All aliased functions could potentially be deprecated, though I don't see much point.

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@googlebot googlebot added the cla: no PR author has not signed CLA label Jan 25, 2019
@jackd jackd force-pushed the register_refactor branch 2 times, most recently from e798e7f to 24f32a5 Compare January 25, 2019 09:29
@googlebot

CLAs look good, thanks!

@googlebot googlebot added cla: yes PR author has signed CLA and removed cla: no PR author has not signed CLA labels Jan 25, 2019
@jackd jackd mentioned this pull request Jan 25, 2019
@jackd
Contributor Author

jackd commented Jan 25, 2019

Same tests failing on master (and as far as I can tell, unrelated to anything touched here).

@rsepassi
Contributor

Excellent, thank you! At first glance, looks good! I’ll review later today.

@jackd
Contributor Author

jackd commented Jan 25, 2019

One possible change would be to remove value_transformer from Registry so that all values are functions. The old API (which sometimes returns functions, and sometimes their values) could be maintained in the wrapper functions instead, i.e.

attack_registry = Registry("attack")

def attack(name):
  return attack_registry[name]()

This would make the new API more consistent (always return a function), though I'm not much of an API designer....

If we accept registries will only ever register callables, we could also put in some more specific error checks at the top level (or perhaps a subclass).
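
To make the two options concrete, here is a small sketch contrasting a registry that applies a value_transformer on lookup with a plain registry plus a wrapper function. The Registry body, the "basic" and "fgsm" keys, and the registered lambdas are assumptions for illustration only.

```python
class Registry(object):
  """Hypothetical registry with an optional value_transformer."""

  def __init__(self, name, value_transformer=None):
    self._name = name
    self._registry = {}
    # Default transformer returns the registered value unchanged.
    self._value_transformer = value_transformer or (lambda key, value: value)

  def __setitem__(self, key, value):
    self._registry[key] = value

  def __getitem__(self, key):
    return self._value_transformer(key, self._registry[key])


# Option A: the registry itself calls the registered function on lookup.
hparams_registry = Registry("hparams", value_transformer=lambda k, v: v())
hparams_registry["basic"] = lambda: {"lr": 0.1}

# Option B: the registry always returns the function; the wrapper calls it.
attack_registry = Registry("attack")
attack_registry["fgsm"] = lambda: "attack-object"


def attack(name):
  return attack_registry[name]()
```

Option B keeps `__getitem__` uniform (always a callable) and moves the legacy "return the value" behaviour into the wrapper.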

I'm also thinking that putting all registries in a single object for namespacing might be nice, similar to metrics, rather than having multiple top-level variables with a _registry suffix.
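
A rough sketch of the namespacing idea, assuming a container class named Registries (the name used later in this thread). The stand-in Registry here is just a named dict, and the registered entry is hypothetical.

```python
class Registry(dict):
  """Stand-in for the PR's Registry class: a plain dict with a name."""

  def __init__(self, name):
    super(Registry, self).__init__()
    self.name = name


class Registries(object):
  """Namespace holding every registry; not meant to be instantiated."""
  problems = Registry("problems")
  models = Registry("models")
  hparams = Registry("hparams")


# Call sites then read Registries.models rather than a model_registry global.
Registries.models["transformer"] = lambda: "model"
```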

I'll be at a proper desk in a few hours - happy to make changes myself then/take into account further comments.

@rsepassi rsepassi left a comment

Looks great! Thanks for doing this! Just a few small changes and I think we're good to go.


def create_registry(registry_name):
  """Create a generic object registry.
  This is the naming function by default for registers expecting classes or

s/registers/registries/g

and in other places where you refer to register (as a noun) it should be registry

class Registry(object):
  """Dict-like class for managing registrations."""
  def __init__(
      self, register_name, default_key_fn=default_name, validator=None,

s/register_name/name/g

value_transformer (optional): if run, `__getitem__` will return
value_transformer(key, registered_value).
"""
self._register = {}

self._registry

    if callback is not None:
      callback(key, value)

  def register(self, key=None):

add method docstrings here and in any other method that's > a few lines.
at least a 1-liner.
if > 1 line docstring, make sure it has an Args: section and a Returns: section
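
For reference, a sketch of the docstring layout being requested, applied to a trimmed-down register method. The method body and the default-key behaviour are assumptions for this example, not the PR's actual code.

```python
class Registry(object):
  """Dict-like class for managing registrations (trimmed for illustration)."""

  def __init__(self, name):
    self._name = name
    self._registry = {}

  def register(self, key=None):
    """Returns a decorator that registers a value under `key`.

    Args:
      key: optional string to register under. If None, the value's
        `__name__` is used (hypothetical behaviour for this sketch).

    Returns:
      A decorator that stores its argument in the registry and returns it
      unchanged.
    """
    def decorator(value):
      self._registry[key or value.__name__] = value
      return value
    return decorator
```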

"Available optimizers:\n %s"
% (name, "\n".join(list_optimizers())))
return _OPTIMIZERS[name]
def _get(self, key):

rm method. can use __getitem__ in the same way (e.g. model = model_registry.__getitem__)

if prefix:
return [name for name in _HPARAMS if name.startswith(prefix)]
return list(_HPARAMS)
def get(self, key, d=None):

s/d/default

was_reversed: A boolean.
was_copy: A boolean.
"""
# Recursively strip tags until we reach a base name.

can you update the var names in this method to be consistent? e.g. was_rev vs was_reversed

model_registry = Registry("models", on_set=_on_model_set)
optimizer_registry = Registry(
    "optimizers",
    default_key_fn=lambda fn: misc_utils.snakecase_to_camelcase(fn.__name__),

let's actually rm this. have snakecase as the standard, no special case for optimizers.


# consistent version of old API
model = model_registry._get
list_models = lambda: sorted(model_registry)

let's rm sorted here and instead have __iter__ return sorted(self._registry). these can be lambda: list(model_registry)

register_problem = register_base_problem


def problem(problem_name, base_registry=base_problem_registry):

rm base_registry argument. can just use base_problem_registry in the fn body

@rsepassi
Contributor

I think the value transformer is nice, so let's keep that for now.
Yeah, putting them all in one object REGISTRIES = {} sounds good to me.
Maybe even a registry_registry? :P

return decorator(rhp_fn, registration_name=default_name(rhp_fn))
def _nargs_validator(nargs, message):
  def f(key, value):
    args, varargs, keywords, _ = inspect.getargspec(value)

use inspect.getfullargspec instead. looks like getargspec is deprecated
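
A sketch of how the validator might look with inspect.getfullargspec, which is Python 3 only and exposes named fields (args, varargs, varkw) instead of a 4-tuple. The exact acceptance rule below (exactly nargs positional arguments, no *args or **kwargs) is an assumption, since the rest of the original function is not shown in this thread.

```python
import inspect


def nargs_validator(nargs, message):
  """Returns a validator requiring `value` to take exactly `nargs` args."""

  def validate(key, value):
    del key  # unused; kept so the signature matches validator(key, value)
    spec = inspect.getfullargspec(value)
    # Assumed rule: exact positional count, no *args and no **kwargs.
    if len(spec.args) != nargs or spec.varargs or spec.varkw:
      raise ValueError(message)

  return validate
```

For example, `nargs_validator(1, "expected one argument")` accepts `lambda x: x` and raises ValueError for `lambda x, y: x`.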

@jackd
Contributor Author

jackd commented Jan 26, 2019

All changes implemented except sorted __iter__ comment above. It would need to be something like

def __iter__(self):
  return iter(sorted(self._registry))

list_problems = lambda: list(Registries.problems)

Seems a bit weird to only expose an iterator to the sorted keys, rather than the sorted keys themselves. I suppose we could make keys return the sorted version, but I still feel it's a bit odd to force a less efficient implementation which is different to dicts on people just so list_problems etc. methods are slightly simpler. I don't feel list_problems = lambda: sorted(Registries.problems) is too bad. Membership test would be slower via keys, i.e. x in Registries.problems.keys(). If you want to make the change go ahead, I just need a bit more convincing...
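
A small sketch of the alternative argued for here: iteration and membership behave as they do for a dict, and sorting happens only in the list_* helper. The Registry body and the problem names are placeholders for illustration.

```python
class Registry(object):
  """Hypothetical registry whose iteration mirrors the underlying dict."""

  def __init__(self):
    self._registry = {}

  def __setitem__(self, key, value):
    self._registry[key] = value

  def __contains__(self, key):
    # Hash lookup on the underlying dict; no sorting involved.
    return key in self._registry

  def __iter__(self):
    # Same order as the underlying dict, as a plain dict would give.
    return iter(self._registry)


problems = Registry()
for name in ("translate_ende", "algorithmic_copy", "image_mnist"):
  problems[name] = object()

# Sorting is paid for only by callers that want a sorted listing.
list_problems = lambda: sorted(problems)
```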

I've updated optimizer naming convention to mostly snake case - the only exception being SGD and RMSProp, which have special rules (sgd and rms_prop). I've changed all instances in the code, and registry.optimizer still accepts legacy naming conventions (I'm thinking of all those logged hparams= flags).
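
One way legacy optimizer names could be mapped to the new convention is sketched below. The helper names, the regular expressions, and the lookup table are assumptions for this example; only the sgd and rms_prop special cases come from the comment above.

```python
import re

# Special cases named in the discussion; everything else is converted by rule.
_SPECIAL_NAMES = {"SGD": "sgd", "RMSProp": "rms_prop"}


def camelcase_to_snakecase(name):
  """Converts e.g. "TrueAdam" to "true_adam"; snake_case input is unchanged."""
  partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
  return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def normalize_optimizer_name(name):
  """Maps a legacy or current optimizer name to its snake_case key."""
  if name in _SPECIAL_NAMES:
    return _SPECIAL_NAMES[name]
  return camelcase_to_snakecase(name)
```

A lookup wrapper could call normalize_optimizer_name before indexing the registry, so that previously logged hparams flags keep resolving.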

Thanks for the feedback. For future reference: is this the standard work-flow for pull requests? Throw something up, get feedback, make tweaks? I'm very much a researcher used to working with very small groups... usually 1 (i.e. just me), or with someone I can shout to down the corridor... and while this hasn't been a painful experience by any stretch, if there's a nicer workflow then I'm only not using it out of ignorance rather than being "set in my ways".

@rsepassi
Contributor

Fine with me. Thanks for these changes!

Yes, this is the standard workflow. Make a pull request, go through 1+ rounds of code review, merge. For things that are larger and/or require more discussion, we tend to write it up and/or talk synchronously (in-person, chat, or video chat). It's definitely different than small groups working on small codebases where everybody might be sitting right next to each other and everybody might have direct push access to the repo, but we've found it to be a good set of tradeoffs across scalability/friction/safety (every commit at Google goes through the same code review process).

@rsepassi rsepassi left a comment

Actually, sorry, looks like there was some usage of create_registry in the meantime (see tests). I think the line in mtf_transformer2.py can be replaced with:

layers_registry = registry.Registry("layers")

@jackd
Contributor Author

jackd commented Jan 27, 2019

I've moved layer_registry creation to Registries.layers for consistency. I envisage all tensor2tensor registries to be defined in registry, and registrations themselves happening in separate files. layers_registry = registry.Registry("layers") would work fine, but would expect users to know where to find it. If they tried creating their own Registry with the same name, they'd get a different object without any registrations.
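
The pitfall described here can be shown in a few lines. The stand-in Registry (a named dict) and the "self_att" entry are hypothetical; the point is only that two registries sharing a name are still separate objects.

```python
class Registry(dict):
  """Stand-in registry: a plain dict that remembers its name."""

  def __init__(self, name):
    super(Registry, self).__init__()
    self.name = name


class Registries(object):
  """Central namespace, so there is exactly one "layers" registry."""
  layers = Registry("layers")


Registries.layers["self_att"] = lambda: "layer"

# A second registry with the same name shares nothing with the first.
shadow = Registry("layers")
```

Looking up "self_att" in shadow fails even though both registries are named "layers", which is the reason for pointing users at a single, well-known location.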

I can see the argument for defining registries closer to where they are predominantly used (as in the layers_registry case) and then adding them to the central Registries object, rather than defining them all in the same place, but I think consistency is more important.

@rsepassi rsepassi changed the title from "Register refactor" to "Registry refactor" Jan 28, 2019
@rsepassi rsepassi merged commit ac4cb05 into tensorflow:master Jan 28, 2019
@rsepassi
Contributor

Thanks @jackd!

@jackd jackd deleted the register_refactor branch January 29, 2019 02:36
@jackd jackd restored the register_refactor branch January 29, 2019 02:39
@jackd
Contributor Author

jackd commented Jan 29, 2019

... did something weird happen with the commits here? I see Merge branch 'master' into register_refactor followed by merged commit xxx into tensorflow:master... and then there's this which has me thoroughly confused (I mean, it looks great, but not sure it has anything to do with me or registry changes). registry.py on tensorflow/tensor2tensor fork doesn't have registry changes either...

Sorry for the confusion, still getting my head around git...

@rsepassi
Contributor

rsepassi commented Jan 29, 2019 via email

tensorflow-copybara pushed a commit that referenced this pull request Jan 29, 2019
PiperOrigin-RevId: 231486407
@rsepassi
Contributor

Now actually permanently merged with this commit.

@jackd jackd deleted the register_refactor branch January 31, 2019 00:01
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
* registry refactor and deprecated call-site updates

* added on_problem_set callback, simplified name

* changed optimizer registration names to snake_case, documentation

* removed create_registry
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
PiperOrigin-RevId: 231486407
Labels
cla: yes PR author has signed CLA