Description
I think the current naming logic is not straightforward or easy to understand. I don't mean the underlying implementation but the conceptual logic and the API as the user sees it.
The user currently must explicitly create a new root NameCtx instance.
The "root module" is defined by passing name=root_name_ctx on its first call.
Let's take a step back. What is the naming logic relevant for?
- Final absolute variable names, for our model checkpoint, and also for param sharing (important)
- Layer names (nice to have consistent but not really so important)
The root module defines the absolute parameter names: they are required to be accessible via some attrib chain from the root module. So it is also important that there is only one root module, otherwise there might be conflicting names. There might still be ambiguities, which are resolved by the order of assignment.
So far, this is straightforward. This is also similar to how you would handle it in PyTorch when saving and loading a checkpoint. See #93.
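For comparison, in PyTorch the checkpoint keys follow exactly this attribute-chain scheme (a standalone PyTorch illustration, not returnn-common code):

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Params of this submodule get names following the attrib chain.
        self.linear = torch.nn.Linear(4, 4)

model = Model()
print(list(model.state_dict().keys()))  # ['linear.weight', 'linear.bias']
```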
NameCtx corresponds (mostly) to RETURNN layer names. This is why you assign a module call to a NameCtx, and not the module itself. (See the docstring of the naming module for further documentation.)
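For example, calling the same module twice yields two module calls, i.e. two NameCtx instances and thus two layers, while the parameters are shared (a sketch; the concrete layer names are illustrative assumptions):

```python
# Sketch: one module, two calls -> two NameCtx / two layers, shared params.
# The concrete names "linear" and "linear_0" are illustrative assumptions.
linear = nn.Linear(out_dim)
a = linear(x)  # one module call, e.g. layer name "linear"
b = linear(y)  # second call, e.g. layer name "linear_0", same parameters
```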
While it makes sense to keep the layer name hierarchy information (NameCtx) internally, and while it might also make sense to expose (absolute or relative) layer names for custom layer creation code, this should probably not be the main API for the naming logic. Even exposing the layer names could be problematic: some optimizations might still move layers around, so the layer names are only really final at the end. Exposing the layer name as a str is also not really needed. It can just be an nn.Tensor instance, and the name will be derived lazily when constructing the final net dict.
Related older issue: Better way to define main network dict (make_root_net_dict) (#44)
Proposal
The public API in returnn-common should maybe just expose some function to define the root module. This is probably the only really needed API function, and it is only needed at the time when we create the net dict.
Another related aspect is to have an explicit root name ctx. In most cases, such as most configs or interactive debugging, this is not really necessary, and it would be fine to just have one default root ctx. We could adopt a scheme similar to TensorFlow's, which provides a default graph but also lets you easily create your own.
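For reference, the TensorFlow scheme looks like this (TF1-style graph API, shown purely as an analogy):

```python
import tensorflow as tf

# There is an implicit default graph; you only create your own when needed:
g = tf.Graph()
with g.as_default():
    ...  # ops created here are attached to g instead of the default graph
```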
The example from #44 could then look like:
```python
net = Model(...)
time_dim = nn.SpatialDim("time")
in_dim = nn.FeatureDim("input", ...)
data = nn.get_extern_data(nn.Data("data", dim_tags=[nn.batch_dim, time_dim, in_dim]))
out = net(data, axis=time_dim)
assert isinstance(out, nn.Tensor)
out.mark_as_default_output()
config_code = nn.get_returnn_config_serialized(net)
```

(For Sisyphus or other wrappers, it might make sense to split up get_returnn_config_serialized further into extern data and net dict.)
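Such a split could look roughly like this (both function names are purely hypothetical, not existing API):

```python
# Purely hypothetical function names, just to illustrate the split:
extern_data_code = nn.get_extern_data_serialized()  # extern data part
net_dict_code = nn.get_net_dict_serialized(net)     # net dict part
```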