`Dim` internals and API should be refactored #975
Should fix #1069. Follow-up to #1067. Related is #975.

* `Dim`: doc cross refs to `set_dyn_size_ext_for_batch_ctx`
* `Data.copy_template_unpack_batch`: fix batch dim tag
* `_create_size_placeholder`: fix tag batch info
* `_auto_create_size_placeholders`: small fix for batch dim validation
* `Dim._validate_in_current_graph`: `dyn_size_ext` reset batch info
* `ExternData.init_batch_info` cleanup
* `test_ExternData_ext_Data_batch_info`
Note, in case we try to address this, i.e. clean up or refactor some of this: we should also check and run the test cases of returnn-common, as there are some problems which might only occur via those tests.
@Zettelkasten any thoughts or feedback on this?
I wonder whether we should maybe avoid the whole `get_for_batch_ctx` logic.

When we have the beam dim always separate and explicit, the first point (with beam or without) would not be needed. At least with returnn-common, we probably want to have this anyway.

For the second point (inside a loop or outside), maybe we really want to treat them as separate dim tags: one variant inside the loop, and really a separate dim tag outside the loop. But this will need some logic to get from one variant of the dim tag to the other, so it again gets us to something similar as `get_for_batch_ctx`.

Maybe the problem is also more how we treat all those variants returned by `get_for_batch_ctx`. Or maybe the problem is that we treat them as equal. Then again, maybe treating them as equal is also not so much the problem. What exactly is actually the problem? Sometimes we don't really know which is the right variant to pick in `get_for_batch_ctx`.
I think we should go through all the attribs and see which of those are really needed. One specific function I wonder about is …
So, when looking for current use cases in RETURNN, what we have is: …

Related, the …
* This moves `Data` and `Dim` to a more central place, `returnn.tensor`.
* Rename `Data` to `Tensor`.
* Abstracts away from TF-specific code, and allows for any raw tensor type. This is via the new frontends.
* Separates the core of `Tensor` from additional attribs in `_TensorExtra`, which is mostly only needed for legacy code; most methods are in `_TensorMixin`. Similarly for `Dim`. To the user, these are implementation details and should not matter. For a new user interested in `Tensor`/`Dim`, it should be enough to just look at the main classes.
* Fix #1165.
* Fix #1260.
* Related: #975.
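For illustration, a minimal sketch of how the refactored classes might be used from user code. The import path `returnn.tensor` and the names `Tensor`/`Dim` come from the description above; the constructor arguments shown here (`Dim(None, name=...)`, `Tensor(name, dims=..., dtype=...)`) and the `batch_dim` import are assumptions and may differ in detail:

```python
from returnn.tensor import Tensor, Dim, batch_dim

# Assumed constructor forms; the point is that Tensor/Dim now live in
# returnn.tensor and are independent of TF-specific code.
time_dim = Dim(None, name="time")      # dynamic (per-sequence) dim
feature_dim = Dim(40, name="feature")  # static dim

# A template tensor: only metadata (dims, dtype), no raw TF/NumPy/Torch data attached yet.
data = Tensor("data", dims=[batch_dim, time_dim, feature_dim], dtype="float32")
```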
Note: Another somewhat common case of … In the config, you define …
The logic involving `get_for_batch_ctx` and `get_same_base` is ugly, unintuitive, and a potential source of problems.

One big problem is that we never really defined the logic around `get_for_batch_ctx` well, and it was added only at a later point. Similarly for some of the other concepts; e.g. the equality of dim tags (#634) is also not well defined.

`CumConcatLayer` (#589) and generalized self-attention (#391) were among the main reasons that introduced this, but then also the beam search logic. `CumConcatLayer` also introduced the concept of implicit dims.

Defining a dim tag (sequence lengths) inside a loop, and then having it accumulated outside, is still somewhat straightforward and non-ambiguous, so this is not really a problem. Maybe it was a problem to treat them as the same, though? But I think this is important so that the rec loop optimization (moving layers out of the loop) works correctly.
Note also that the whole logic around `get_for_batch_ctx` is basically just for having a different `dyn_size_ext` (`dyn_size`) for the same dim tag under different conditions, such as inside a loop or outside, and with beam or without.
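To make the "different `dyn_size_ext` per condition" point concrete, here is a small self-contained model of the idea. This is not the actual RETURNN implementation; the class names are invented, and only the method names `get_for_batch_ctx`/`get_same_base` and the attribute `dyn_size_ext` mirror the real API:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DimVariant:
    """One variant of a dim tag, specific to one (batch, ctx) combination."""
    base: "BaseDim"
    batch: Optional[str]                    # e.g. "global_batch" or "global_batch_beam12"
    ctx: Optional[str]                      # e.g. None (root) or "rec_loop"
    dyn_size_ext: Optional[object] = None   # per-variant dynamic sizes (placeholder here)


@dataclass
class BaseDim:
    """Simplified model: one base dim tag which owns many (batch, ctx) variants."""
    description: str
    _variants: Dict[Tuple[Optional[str], Optional[str]], DimVariant] = field(default_factory=dict)

    def get_same_base(self) -> "BaseDim":
        return self

    def get_for_batch_ctx(self, batch: Optional[str], ctx: Optional[str]) -> DimVariant:
        key = (batch, ctx)
        if key not in self._variants:
            self._variants[key] = DimVariant(base=self, batch=batch, ctx=ctx)
        return self._variants[key]


time_dim = BaseDim("time")
inside = time_dim.get_for_batch_ctx("global_batch_beam12", "rec_loop")
outside = time_dim.get_for_batch_ctx("global_batch", None)
# Both variants belong to the same base dim tag (and are treated as equal),
# but each carries its own dyn_size_ext for its condition.
assert inside.base is outside.base is time_dim.get_same_base()
```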
Direct assignments to or reads from `dyn_size` or `dyn_size_ext`, but also all the other flags, even `description`, depend on `get_for_batch_ctx` or `get_same_base`.

Some older code ignores `get_for_batch_ctx` or `get_same_base` and directly accesses (reads or writes) any of the other flags like `dyn_size`. This works when it is the correct instance, but otherwise it can lead to unexpected behavior.

Related is also `declare_same_as`, although this might not be much of a problem, even less after such a refactoring. However, its logic is currently quite complicated and should be simplified.

I'm also not sure about a good way to solve this. Maybe `dyn_size` and `dyn_size_ext` should be hidden away, and only be accessible through functions `get_...` and `set_...`, which would also require the `batch` and `ctx`.
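A rough sketch of what such an accessor-only design could look like. Everything here is hypothetical (invented names, not existing RETURNN functions); the point is just that reading or writing the sizes always forces you to say for which `batch`/`ctx` they are meant:

```python
from typing import Dict, Optional, Tuple


class HiddenSizesDim:
    """Hypothetical Dim where dyn_size_ext is only reachable via explicit accessors."""

    def __init__(self, description: str):
        self.description = description
        # Internal storage keyed by (batch, ctx); not meant to be touched directly.
        self._dyn_size_ext: Dict[Tuple[Optional[str], Optional[str]], object] = {}

    def set_dyn_size_ext(self, *, batch: Optional[str], ctx: Optional[str], sizes: object) -> None:
        """Writing sizes requires stating the batch/ctx they are valid for."""
        self._dyn_size_ext[(batch, ctx)] = sizes

    def get_dyn_size_ext(self, *, batch: Optional[str], ctx: Optional[str]) -> object:
        """Reading sizes likewise requires batch/ctx; there is no implicit 'current' variant."""
        return self._dyn_size_ext[(batch, ctx)]


time_dim = HiddenSizesDim("time")
time_dim.set_dyn_size_ext(batch="global_batch", ctx=None, sizes=[7, 3, 5])
assert time_dim.get_dyn_size_ext(batch="global_batch", ctx=None) == [7, 3, 5]
```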
Another big problem is the special role of the batch dim (#920), and all the extra logic around `BatchInfo`: when `Data.batch` or `Dim.batch` should be set, and what it actually should be set to. In principle, we should not need any special logic for the batch dim, and it should be treated just like other dims.

One feature which is available for the batch dim (`BatchInfo`) but not for other dims is the flattening logic, and also the special beam search logic.

I think the flattening should be done in a way that you could combine multiple dims (any dims, not just ones involving the batch dim) and get a new flattened dim. Nothing would be specific to the batch dim. There would be meta info attached to the combined dim to be able to recover the individual dims.
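As a sketch of this idea (simplified stand-in types, not the actual RETURNN classes): flattening would work on arbitrary dims and return a combined dim that keeps enough meta info to recover its parts:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SimpleDim:
    description: str
    dimension: Optional[int]  # None means dynamic


@dataclass(frozen=True)
class FlattenedDim(SimpleDim):
    """A flattened dim that remembers which dims it was built from."""
    parts: Tuple[SimpleDim, ...] = ()


def flatten_dims(*dims: SimpleDim) -> FlattenedDim:
    """Combine any dims into one flattened dim; nothing is special about the batch dim."""
    dimension: Optional[int] = 1
    for d in dims:
        if d.dimension is None:
            dimension = None  # any dynamic part makes the flattened dim dynamic
            break
        dimension *= d.dimension
    return FlattenedDim(
        description="*".join(d.description for d in dims),
        dimension=dimension,
        parts=dims,
    )


batch = SimpleDim("batch", None)
time = SimpleDim("time", None)
flat = flatten_dims(batch, time)
# The individual dims stay recoverable from the meta info on the combined dim.
assert flat.parts == (batch, time)
```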
The beam should just be another separate dim, not merged into the batch dim. And then there could also be meta information attached to it, which is basically what we have in `SearchBeam`.

Related is also the definition of dim tag equality (#634). This is still not well defined in all cases.

A bit related is also dim tag math, and especially questions of equality in those cases. However, I think this was not too much of a problem so far, except that the equality logic was also broken in its own ways in those cases.
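Returning to the beam point above: a hypothetical sketch of the beam as an ordinary separate dim, with `SearchBeam`-like meta information simply attached to it (all names invented for illustration):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BeamInfo:
    """Roughly the kind of meta info SearchBeam carries: beam size and its origin."""
    beam_size: int
    name: str


@dataclass(frozen=True)
class SimpleDim:
    description: str
    dimension: Optional[int]
    beam: Optional[BeamInfo] = None  # only set for a beam dim; never merged into batch


batch_dim = SimpleDim("batch", None)
beam_dim = SimpleDim("beam", 12, beam=BeamInfo(beam_size=12, name="decoder"))
# During beam search, a tensor would then simply have shape [batch, beam, ...],
# instead of a batch dim that implicitly contains the beam.
```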
Further, there is also the `derived_from_tag` and `derived_from_op` logic, which is yet another heuristic for certain equality matching. Maybe this is not needed when dim tags are used consistently everywhere.

And there are also `is_dim_known` and `undefined`, which are also not so well defined.

Such changes might break some older code. But on the RETURNN side, this can all be fixed. And there should not be much (or any) external code using this yet.
Some examples of issues and resulting PRs caused by the API of `Dim`, where things were not really well defined:

* `CombineLayer` and others (`Data.get_common_data`) should use `broadcast_matches=False` #666
* `allow_same_spatial_dim=False` #865

Related issues: