Some current examples:
self.filter_size = [
    s if isinstance(s, nn.Dim) else nn.SpatialDim(f"filter-dim{i}", s)
    for i, s in enumerate(filter_size)]

out_spatial_dims = _default_out_spatial_dims(
    description_prefix=nn.NameCtx.current_ctx().layer_abs_name_scope,
    ...

out_spatial_dims = _default_out_spatial_dims(
    description_prefix=nn.NameCtx.current_ctx().get_abs_name(),
    ...

out_spatial_dims = [
    nn.SpatialDim(f"{nn.NameCtx.current_ctx().layer_abs_name_scope}:out-spatial-dim{i}")
    for i, s in enumerate(self.filter_size)]

if isinstance(num_heads, int):
    num_heads = nn.SpatialDim("num_heads", num_heads)
expand_dim = nn.SpatialDim("self_att_expand_dim_init", 0)
hist_dim = nn.SpatialDim(f"{nn.NameCtx.current_ctx().layer_abs_name_scope}:history")
The descriptions of these dim tags vary widely. This is inconsistent and also makes debugging difficult.
Further, and maybe even more importantly, the description is currently used to derive the Python identifier for the Python serialization of the config. This leads to something like:
time_dim = SpatialDim('time')
input_dim = FeatureDim('input', 10)
_3_input_dim = 3 * input_dim
num_heads_dim = SpatialDim('num_heads', 2)
truediv_left_input__num_heads__dim = input_dim.div_left(num_heads_dim)
_3__truediv_left_input__num_heads___dim = 3 * truediv_left_input__num_heads__dim
encoder_layers_0_self_attn_history_dim = SpatialDim('encoder/layers/0/self_attn:history')
input_4_dim = input_dim * 4
encoder_layers_1_self_attn_history_dim = SpatialDim('encoder/layers/1/self_attn:history')
target_dim = FeatureDim('target', 7)
decoder_layers_0_self_attn_history_dim = SpatialDim('decoder/layers/0/self_attn:history')
decoder_layers_1_self_attn_history_dim = SpatialDim('decoder/layers/1/self_attn:history')
loop_dim = SpatialDim('loop-dim')
Or:
time_dim = SpatialDim('time')
input_dim = FeatureDim('input', 10)
dummy_input_feature_dim = FeatureDim('dummy-input-feature-dim', 1)
filter_dim0_dim = SpatialDim('filter-dim0', 3)
filter_dim1_dim = SpatialDim('filter-dim1', 3)
intermediate_out_sub_sample_dim = FeatureDim('intermediate_out_sub_sample', 14)
conv_subsample_layer_out_spatial_dim0_dim = time_dim.ceildiv_right(2)
conv_subsample_layer_out_spatial_dim1_dim = input_dim // 2
filter_dim0_0_dim = SpatialDim('filter-dim0', 3)
filter_dim1_0_dim = SpatialDim('filter-dim1', 3)
out_dim = FeatureDim('out', 14)
conv_subsample_layer_out_spatial_dim0_0_dim = conv_subsample_layer_out_spatial_dim0_dim.ceildiv_right(2)
conv_subsample_layer_out_spatial_dim1_0_dim = conv_subsample_layer_out_spatial_dim1_dim.ceildiv_right(2)
conv_subsample_layer_out_dim = SpatialDim('conv_subsample_layer:out_dim')
ff_dim = FeatureDim('ff', 17)
_3_out_dim = 3 * out_dim
num_heads_dim = SpatialDim('num_heads', 2)
truediv_left_out__num_heads__dim = out_dim.div_left(num_heads_dim)
_3__truediv_left_out__num_heads___dim = 3 * truediv_left_out__num_heads__dim
layers_0_self_att_history_dim = SpatialDim('layers/0/self_att:history')
_2_out_dim = 2 * out_dim
filter_dim0_1_dim = SpatialDim('filter-dim0', 32)
out__14_dim = out_dim // 14
layers_1_self_att_history_dim = SpatialDim('layers/1/self_att:history')
As long as we have not dealt with explicit hashing (#51), this is a problem for Sisyphus hashing, because the logic that produces these names and descriptions will probably still change.
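To make the coupling concrete, here is a minimal sketch of how identifiers like the ones above could be derived from descriptions, and why a hash over the serialized text moves whenever a description changes. The function names `identifier_from_description`, `serialize` and `config_hash` are hypothetical and not taken from the code base. The naming rules are only inferred from the output shown above, and the hash is a plain SHA-256 over the text, which is an assumption and not how Sisyphus actually computes its hashes.

```python
import hashlib
import re


def identifier_from_description(description: str, taken: set) -> str:
    """Hypothetical: turn a dim description into a unique Python identifier."""
    # Collapse every run of non-identifier characters ("/", ":", "-") into "_".
    base = re.sub(r"\W+", "_", description).strip("_")
    if not base or base[0].isdigit():
        base = "_" + base
    # Append "_dim" unless the name already ends with it (e.g. "loop-dim").
    name = base if base.endswith("_dim") else f"{base}_dim"
    # Resolve clashes with a counter, e.g. filter_dim0_dim, filter_dim0_0_dim.
    counter = 0
    while name in taken:
        name = f"{base}_{counter}_dim"
        counter += 1
    taken.add(name)
    return name


def serialize(descriptions) -> str:
    """Hypothetical: emit one assignment line per dim description."""
    taken = set()
    lines = []
    for desc in descriptions:
        ident = identifier_from_description(desc, taken)
        lines.append(f"{ident} = SpatialDim({desc!r})")
    return "\n".join(lines)


def config_hash(text: str) -> str:
    """Assumed stand-in for a hash that depends on the serialized config text."""
    return hashlib.sha256(text.encode("utf8")).hexdigest()[:12]


if __name__ == "__main__":
    before = serialize(["filter-dim0", "filter-dim0", "layers/0/self_att:history"])
    after = serialize(["filter-dim0", "filter-dim0", "layers/0/self_att:hist"])
    print(before)
    # The model is unchanged, only one description differs, yet the hash moves.
    print(config_hash(before) != config_hash(after))
```

Under these assumptions, any cleanup of the descriptions changes both the generated identifiers and the hash, even though the network itself is identical.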
Some other issues:
- Some of the dim tag descriptions (names) lack the context of where they are created.
- There is no good way to have a consistent context due to the difference between a module `__init__` (which is not a call), a module `__call__`, or just a functional API (e.g. `pool`). Thus we have `nn.NameCtx.current_ctx().layer_abs_name_scope` and `nn.NameCtx.current_ctx().get_abs_name()` for some dim tag description prefixes.
- The default description of dim tag arithmetic (e.g. `3 * input_dim`) is just the expression itself, which is reasonable. However, we should probably overwrite this explicitly where it is used, to again add its context and meaning. E.g. here it is `qkv` for self-attention.
- We could derive some description from the attribute name of a module, if the dim tag was assigned to a module. But this is not always the case.
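One possible direction for the points above, sketched with a stand-in class instead of the real dim tag type: a single helper builds every description as `<scope>:<role>`, and derived dims get an explicit description instead of the default arithmetic expression. `DimSketch`, `with_description` and `dim_description` are hypothetical names for illustration only and do not exist in the library. How the scope string would be obtained from the name context is left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimSketch:
    """Stand-in for a dim tag: only a description and an optional size."""
    description: str
    size: Optional[int] = None

    def __rmul__(self, factor: int) -> "DimSketch":
        # Default for arithmetic: the description is the expression itself.
        size = None if self.size is None else factor * self.size
        return DimSketch(f"{factor}*{self.description}", size)

    def with_description(self, description: str) -> "DimSketch":
        # Same size, but with an explicit, meaningful description.
        return DimSketch(description, self.size)


def dim_description(scope: str, role: str) -> str:
    """Hypothetical single convention: '<absolute module scope>:<role>'."""
    return f"{scope}:{role}" if scope else role


# The scope is hard-coded here; in practice it would come from the name context.
scope = "encoder/layers/0/self_attn"
input_dim = DimSketch("input", 10)
hist_dim = DimSketch(dim_description(scope, "history"))
# Overwrite the default "3*input" with the actual meaning of the dim.
qkv_dim = (3 * input_dim).with_description(dim_description(scope, "qkv"))

if __name__ == "__main__":
    print(hist_dim.description)
    print(qkv_dim.description, qkv_dim.size)
```

With one helper as the only place where descriptions are formatted, the prefix convention can be changed in a single spot, although this alone does not solve the `__init__` versus `__call__` scope question.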