Description
I think the current naming logic is not straightforward or easy to understand. I don't mean the underlying implementation but the conceptual logic and the API as the user sees it.
The user currently must explicitly create a new root NameCtx instance.
The "root module" is defined by passing name=root_name_ctx on its first call.
Let's take a step back. What is the naming logic relevant for?
- Final absolute variable names, for our model checkpoint, and also for param sharing (important)
- Layer names (nice to have consistent but not really so important)
The root module defines the absolute parameter names: they are required to be accessible via some attrib chain from the root module. So it is also important that there is only one root module, otherwise there might be conflicting names. There might still be ambiguities, which are resolved by the order of assignment.
So far, this is straightforward. This is also similar to how you would handle it in PyTorch when saving and loading a checkpoint. See #93.
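For comparison, in PyTorch the checkpoint keys follow exactly this attribute-chain scheme (a standalone PyTorch illustration, not returnn-common code):

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Params of this submodule get names following the attrib chain.
        self.linear = torch.nn.Linear(4, 4)

model = Model()
print(list(model.state_dict().keys()))  # ['linear.weight', 'linear.bias']
```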
NameCtx corresponds (mostly) to RETURNN layer names. This is why you assign a module call to a NameCtx, and not the module itself. (See the docstring of the naming module for further documentation.)
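For example, calling the same module twice yields two module calls, i.e. two NameCtx instances and thus two layers, while the parameters are shared (a sketch; the concrete layer names are illustrative assumptions):

```python
# Sketch: one module, two calls -> two NameCtx / two layers, shared params.
# The concrete names "linear" and "linear_0" are illustrative assumptions.
linear = nn.Linear(out_dim)
a = linear(x)  # one module call, e.g. layer name "linear"
b = linear(y)  # second call, e.g. layer name "linear_0", same parameters
```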
While it makes sense to keep the layer name hierarchy information (NameCtx) internally, and while it might also make sense to expose (absolute or relative) layer names for custom layer creation code, this should probably not be the main API for the naming logic. Even exposing the layer names could be problematic: some optimizations might still move layers around, so the layer names are only really final at the end. Exposing the layer name as a str is also not really needed. It can just be an nn.Tensor instance, and the name will be derived lazily when constructing the final net dict.
Related older issue: Better way to define main network dict (make_root_net_dict) (#44)
Proposal
The public API in returnn-common should maybe just expose some function to define the root module. This is probably the only really needed API function, and it is only needed at the time when we create the net dict.
Another related aspect is to have an explicit root name ctx. In most cases, such as most configs or interactive debugging, this is not really necessary, and it would be fine to just have one default root ctx. We could adopt a scheme similar to TensorFlow's, which provides a default graph but also lets you easily create your own.
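For reference, the TensorFlow scheme looks like this (TF1-style graph API, shown purely as an analogy):

```python
import tensorflow as tf

# There is an implicit default graph; you only create your own when needed:
g = tf.Graph()
with g.as_default():
    ...  # ops created here are attached to g instead of the default graph
```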
The example from #44 could then look like:
```python
net = Model(...)
time_dim = nn.SpatialDim("time")
in_dim = nn.FeatureDim("input", ...)
data = nn.get_extern_data(nn.Data("data", dim_tags=[nn.batch_dim, time_dim, in_dim]))
out = net(data, axis=time_dim)
assert isinstance(out, nn.Tensor)
out.mark_as_default_output()
config_code = nn.get_returnn_config_serialized(net)
```

(For Sisyphus or other wrappers, it might make sense to split up get_returnn_config_serialized further into extern data and net dict.)
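Such a split could look roughly like this (both function names are purely hypothetical, not existing API):

```python
# Purely hypothetical function names, just to illustrate the split:
extern_data_code = nn.get_extern_data_serialized()  # extern data part
net_dict_code = nn.get_net_dict_serialized(net)     # net dict part
```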