Description
Currently all parameters must have names, i.e. they must be accessible from the root module via an attribute chain (like `root_mod.sub_mod.param`). This has many advantages, such as:
- unique and straightforward name in the model checkpoint
- easy iteration over all params via `root_mod.parameters()` or `root_mod.named_parameters()`
- easy param sharing (within the same module, or also across modules)
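
To make the naming scheme concrete, here is a minimal sketch of attribute-chain naming. The `Parameter`, `Module`, `Linear` and `Net` classes are hypothetical stand-ins written for this illustration, not the actual `nn` API; the sketch only assumes that params and submodules are plain attributes.

```python
from typing import Iterator, Set, Tuple


class Parameter:
    """Hypothetical stand-in for a framework parameter (not the real nn API)."""

    def __init__(self, shape: Tuple[int, ...]):
        self.shape = shape


class Module:
    """Hypothetical minimal module: params and submodules are plain attributes."""

    def named_parameters(self, prefix: str = "") -> Iterator[Tuple[str, Parameter]]:
        # Track visited params so that a shared param is reported only once,
        # under the first attribute chain that reaches it.
        yield from self._named_parameters(prefix, set())

    def _named_parameters(self, prefix: str, seen: Set[int]):
        for name, value in vars(self).items():
            full_name = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Parameter):
                if id(value) not in seen:
                    seen.add(id(value))
                    yield full_name, value
            elif isinstance(value, Module):
                yield from value._named_parameters(full_name, seen)

    def parameters(self) -> Iterator[Parameter]:
        for _, param in self.named_parameters():
            yield param


class Linear(Module):
    def __init__(self, n_in: int, n_out: int):
        self.weight = Parameter((n_in, n_out))
        self.bias = Parameter((n_out,))


class Net(Module):
    def __init__(self):
        self.enc = Linear(8, 8)
        self.dec = Linear(8, 8)
        # Param sharing across modules: both attribute chains point to one object.
        self.dec.weight = self.enc.weight


root_mod = Net()
names = [name for name, _ in root_mod.named_parameters()]
```

In this sketch, `names` contains the attribute chains that would serve as checkpoint keys, and the shared weight shows up only once.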
In some cases, though, anonymous parameters (not accessible via an attribute chain from the root module) can be useful:
- It is required to allow a function like `nn.random_normal`, because this has internal state (i.e. a param holding the state). Otherwise you must always explicitly create random number generators as submodules via `nn.Random`.
- Some people prefer to write modules in a more functional style, where you define functions like `self_attention_block`, in which `nn.Linear` etc. are only created locally.
So, what should we do? Be strict about this and never allow this?
Or, if we allow it, some follow-up questions:
- Should these params be stored in the model checkpoint? How do we define the name?
- Should it be possible to iterate over the anonymous params? Via a separate function, `nn.global_params` or so?
Maybe anonymous parameters could automatically be added to the enclosing module (determined via scope) by some automatic naming scheme. Then `root_module.parameters()` would also cover them, and they would be stored in the checkpoint as well.
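
One way this automatic scheme could look is sketched below. Everything here is an assumption made for illustration: the `scope()` context manager, the `make_param` helper, the `anon_<index>` naming pattern and the toy `Module`/`Parameter` classes are hypothetical and not part of the existing `nn` API.

```python
import contextlib
from typing import Iterator, List, Tuple


class Parameter:
    """Hypothetical stand-in for a framework parameter."""

    def __init__(self, shape: Tuple[int, ...]):
        self.shape = shape


class Module:
    """Hypothetical module that can adopt params created inside its scope."""

    _scope_stack: List["Module"] = []  # enclosing modules, innermost last

    def __init__(self):
        self._anon_counter = 0  # next index for an auto-generated param name

    @contextlib.contextmanager
    def scope(self):
        Module._scope_stack.append(self)
        try:
            yield self
        finally:
            Module._scope_stack.pop()

    def named_parameters(self, prefix: str = "") -> Iterator[Tuple[str, Parameter]]:
        for name, value in vars(self).items():
            full_name = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Parameter):
                yield full_name, value
            elif isinstance(value, Module):
                yield from value.named_parameters(full_name)


def make_param(shape: Tuple[int, ...]) -> Parameter:
    """Create a param; inside a module scope, attach it under an automatic name."""
    param = Parameter(shape)
    if Module._scope_stack:
        owner = Module._scope_stack[-1]
        setattr(owner, f"anon_{owner._anon_counter}", param)
        owner._anon_counter += 1
    return param  # outside any scope, the param stays truly anonymous


def linear(n_in: int, n_out: int) -> Tuple[Parameter, Parameter]:
    """Functional-style layer: its params exist only as local variables."""
    return make_param((n_in, n_out)), make_param((n_out,))


class Block(Module):
    def __call__(self):
        with self.scope():
            linear(4, 8)
            linear(8, 4)


root_module = Module()
root_module.block = Block()
root_module.block()
auto_names = [name for name, _ in root_module.named_parameters()]
```

In this sketch the names depend on creation order, so reordering calls inside `Block.__call__` would change the checkpoint keys, and calling the block a second time would register new params. A real scheme would need to decide how to handle both cases.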