Dim tag description and identifier name inconsistent and not optimal  #119

@albertz

Some current examples:

    self.filter_size = [
      s if isinstance(s, nn.Dim) else nn.SpatialDim(f"filter-dim{i}", s)
      for i, s in enumerate(filter_size)]
    out_spatial_dims = _default_out_spatial_dims(
      description_prefix=nn.NameCtx.current_ctx().layer_abs_name_scope,
...
    out_spatial_dims = _default_out_spatial_dims(
      description_prefix=nn.NameCtx.current_ctx().get_abs_name(),
...
    out_spatial_dims = [
      nn.SpatialDim(f"{nn.NameCtx.current_ctx().layer_abs_name_scope}:out-spatial-dim{i}")
      for i, s in enumerate(self.filter_size)]
    if isinstance(num_heads, int):
      num_heads = nn.SpatialDim("num_heads", num_heads)
    expand_dim = nn.SpatialDim("self_att_expand_dim_init", 0)
    hist_dim = nn.SpatialDim(f"{nn.NameCtx.current_ctx().layer_abs_name_scope}:history")

The descriptions of these dim tags vary widely and follow no consistent scheme, which is not nice and also makes debugging difficult.

Further, and maybe even more importantly, the description is currently used to derive the Python identifier for the Python serialization of the config. This leads to something like:

time_dim = SpatialDim('time')
input_dim = FeatureDim('input', 10)
_3_input_dim = 3 * input_dim
num_heads_dim = SpatialDim('num_heads', 2)
truediv_left_input__num_heads__dim = input_dim.div_left(num_heads_dim)
_3__truediv_left_input__num_heads___dim = 3 * truediv_left_input__num_heads__dim
encoder_layers_0_self_attn_history_dim = SpatialDim('encoder/layers/0/self_attn:history')
input_4_dim = input_dim * 4
encoder_layers_1_self_attn_history_dim = SpatialDim('encoder/layers/1/self_attn:history')
target_dim = FeatureDim('target', 7)
decoder_layers_0_self_attn_history_dim = SpatialDim('decoder/layers/0/self_attn:history')
decoder_layers_1_self_attn_history_dim = SpatialDim('decoder/layers/1/self_attn:history')
loop_dim = SpatialDim('loop-dim')

Or:

time_dim = SpatialDim('time')
input_dim = FeatureDim('input', 10)
dummy_input_feature_dim = FeatureDim('dummy-input-feature-dim', 1)
filter_dim0_dim = SpatialDim('filter-dim0', 3)
filter_dim1_dim = SpatialDim('filter-dim1', 3)
intermediate_out_sub_sample_dim = FeatureDim('intermediate_out_sub_sample', 14)
conv_subsample_layer_out_spatial_dim0_dim = time_dim.ceildiv_right(2)
conv_subsample_layer_out_spatial_dim1_dim = input_dim // 2
filter_dim0_0_dim = SpatialDim('filter-dim0', 3)
filter_dim1_0_dim = SpatialDim('filter-dim1', 3)
out_dim = FeatureDim('out', 14)
conv_subsample_layer_out_spatial_dim0_0_dim = conv_subsample_layer_out_spatial_dim0_dim.ceildiv_right(2)
conv_subsample_layer_out_spatial_dim1_0_dim = conv_subsample_layer_out_spatial_dim1_dim.ceildiv_right(2)
conv_subsample_layer_out_dim = SpatialDim('conv_subsample_layer:out_dim')
ff_dim = FeatureDim('ff', 17)
_3_out_dim = 3 * out_dim
num_heads_dim = SpatialDim('num_heads', 2)
truediv_left_out__num_heads__dim = out_dim.div_left(num_heads_dim)
_3__truediv_left_out__num_heads___dim = 3 * truediv_left_out__num_heads__dim
layers_0_self_att_history_dim = SpatialDim('layers/0/self_att:history')
_2_out_dim = 2 * out_dim
filter_dim0_1_dim = SpatialDim('filter-dim0', 32)
out__14_dim = out_dim // 14
layers_1_self_att_history_dim = SpatialDim('layers/1/self_att:history')
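
To make the coupling between description and identifier concrete, here is a minimal sketch of how such a derivation could work. It is not the actual serialization code: the helper `make_identifier`, the `_dim` suffix rule and the numeric de-duplication are assumptions inferred from the output above. It does show why a repeated description such as 'filter-dim0' ends up with counter-suffixed identifiers.

```python
import re


def make_identifier(description: str, taken: set) -> str:
    """Hypothetical helper: derive a unique Python identifier from a dim description."""
    # Replace every character that is not valid in an identifier by "_".
    base = re.sub(r"[^A-Za-z0-9_]", "_", description)
    # An identifier must not be empty or start with a digit.
    if not base or base[0].isdigit():
        base = "_" + base
    name = f"{base}_dim"
    # On a clash, insert a running counter before the "_dim" suffix.
    i = 0
    while name in taken:
        name = f"{base}_{i}_dim"
        i += 1
    taken.add(name)
    return name


taken = set()
descriptions = [
    "filter-dim0",
    "encoder/layers/0/self_attn:history",
    "filter-dim0",
    "filter-dim0",
]
names = [make_identifier(d, taken) for d in descriptions]
for d, n in zip(descriptions, names):
    print(f"{n} = SpatialDim({d!r})")
```

With a scheme like this, any change to a description string, or to the order in which dims are created, changes the generated identifiers and thus the serialized config.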

As long as we have not dealt with explicit hashing (#51), this is a problem for Sisyphus hashing, because the logic of this code (names, descriptions) will probably still change.

Some other issues:

  • Some of the dim tag descriptions (names) lack the context of where they are created.
  • There is no good way to get a consistent context, because of the difference between a module __init__ (which is not a call), a module __call__, and a plain functional API (e.g. pool). This is why some dim tag description prefixes use nn.NameCtx.current_ctx().layer_abs_name_scope and others use nn.NameCtx.current_ctx().get_abs_name().
  • The default description of dim tag arithmetic (e.g. 3 * input_dim) is just the expression itself, which is reasonable. However, we should probably override it explicitly where it is used, to add the context and meaning again. E.g. here, it is qkv for self-attention.
  • We could derive a description from the attribute name of a module, if the dim tag was assigned to a module attribute. But this is not always the case.
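
One possible direction, sketched here only for illustration, is to route all description strings through a single helper so that the format is at least uniform. The function `dim_description` and its `scope:role` format are hypothetical and not part of the existing API. The caller passes the scope explicitly, so the sketch does not decide which `nn.NameCtx` accessor should provide it.

```python
from typing import Optional


def dim_description(scope: str, role: str, index: Optional[int] = None) -> str:
    """Hypothetical helper: build a dim tag description as "<scope>:<role>[<index>]"."""
    suffix = role if index is None else f"{role}{index}"
    # Without a scope (e.g. a user-level dim), fall back to the bare role.
    return f"{scope}:{suffix}" if scope else suffix


# Descriptions in the style of the examples above, now in one consistent format.
history = dim_description("encoder/layers/0/self_attn", "history")
out_spatial = dim_description("conv_subsample_layer", "out-spatial-dim", 1)
# An explicit, meaningful name for dim arithmetic such as 3 * input_dim.
qkv = dim_description("encoder/layers/0/self_attn", "qkv")
print(history, out_spatial, qkv, sep="\n")
```

This only unifies the format. It does not solve the question of where a reliable scope comes from in __init__, __call__ and functional code.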
