RelPosSelfAttention _rel_shift error, learned embedding #238

Closed

albertz opened this issue Nov 3, 2022 · 6 comments

albertz commented Nov 3, 2022

...
layer /encoder/layers/0/self_att/learned_pos_emb/'pos_emb': ['model/conformer_encoder/conformer_encoder_layer/rel_pos_self_attention/learned_relative_positional_encoding:learned-rel-pos'(33),F|F'truediv_left(enc, num_heads)'(64)] float32 
...
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'pad': [T|'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8),'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B]] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape': [T|'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice': [T|'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape_0': [T|'model/encoder/input_layer:spatial'[B],'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice_nd': [T|'model/encoder/input_layer:spatial'[B],'model/encoder/input_layer/pool1d/_pool_nd:out-spatial-dim0:kv'[B],B,F|'num_heads'(8)] float32 
...
epoch 300 search, step 9, max_size:data 95, mem_usage:GPU:0 1.2GB, num_seqs 200, 0.676 sec/step, elapsed 0:00:35, exp. remaining 0:00:00, complete 100.00%
<ExternSprintDataset 'dataset_id22882323300752' epoch=300> add_new_data: seq=3945, len=162. Cache filled, waiting to get loaded...
TensorFlow exception: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py", line 1176, in _create_layer 
    layer = layer_class(**layer_desc)
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py", line 4146, in __init__
    self.output.placeholder = tf.reshape(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper 
    return target(*args, **kwargs)
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape 
    result = gen_array_ops.reshape(tensor, shape, name) 
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8233, in reshape
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_
op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3477, in _create_op_intern
al
    ret = Operation(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack()

Exception InvalidArgumentError() in step 10. (pid 7125)
Failing op: <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape> 
We tried to fetch the op inputs ([<tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose:0' shape=(?, ?, ?, 8) dtype=float32>, <tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape/shape:0' shape=(4,) dtype=int32>]) but got another exception:
target_op <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape>, 
ops
[<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose' type=Transpose>, 
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad' type=Pad>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose/perm' type=Const value=[0 3 1 2]>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad/paddings' type=Const value=[[0 0]
 [0 0]
 [0 0]
 [1 0]]>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2' type=Reshape>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2/shape' type=Pack>,
...
 <tf.Operation 'encoder/input_layer/layers/0/bw/rec/rec/output_output_transpose/perm' type=Const value=[1 0 2]>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const_1' type=Const value=0>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const' type=Const value=[0]>]
EXCEPTION
Traceback (most recent call last):
  File "/u/zeyer/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in BaseSession._do_call
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]] 
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred: 

EXCEPTION
Traceback (most recent call last): 
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/engine.py", line 689, in Runner.run
    line: fetches_results = sess.run(
            fetches_dict, feed_dict=feed_dict)  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation. 
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape: 
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
 

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
Step meta information:
{'seq_idx': [2000,
             2001,
             2002,
             2003,
             2004,
             2005,
             2006,
             2007,
             2008,
             2009,
             2010,
...
             2193,
             2194,
             2195],
 'seq_tag': ['rt03s/fsh_60549b/55',
             'rt03s/sw_45713a/26',
             'rt03s/fsh_60817a/15',
             'rt03s/fsh_61228b/22',
...
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(196)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 40) dtype=float32>: shape (196, 102, 40), dtype float32, min/max -4.856743/5.378608, mean/stddev 9.743296e-10/0.96528375, Data{'data', [B,T|'time'[B],F|F'audio'(40)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (196,), dtype int32, min/max 41/102, ([ 95  95  95  95  61  95  95  95  77  96  96  96  96  96  96  96  96  96
  96  96  96  96  96  96  96  96  85  96  96  89  96  96  86  96  96  96
  96  97  97  97  97  97  97  97  97  97  97  97  97  67  97  86  76  97
  94  97  97  97  41  97  97  66  97  97  97  97  97  97  97  97  97  97
  97  97  98  98  98  87  98  98  98  98  98  98  98  98  98  98  98  87
  59  98  97  98  98  94  98  98  98  88  87  92  98  98  98  98  98  91
  98  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99
  99  99  99  87  90  68  89  99  99  99  99  99  99 100  74 100 100 100
 100 100 100 100 100 100  92  65 100 100  92 100 100 100 100  75 100  94
 100 100 100 100 100 100 100 101  93 101 101 101 100  76  91 101 101 101
  98 101 101 101  76 101  99 101  73 101 101 101 101 102  96  78])
  <tf.Tensor 'extern_data/placeholders/seq_idx/seq_idx:0' shape=(?,) dtype=int32>: type <class 'list'>, Data{'seq_idx', [B?], dtype='int32'}
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Data{'seq_tag', [B?], dtype='string'}
...

Corresponding code in _rel_shift:

    batch_dims = x.batch_dims_ordered((axis, pos_emb_spatial_dim))
    x_padded = nn.pad(x, axes=pos_emb_spatial_dim, padding=(1, 0), value=0.)  # [B,H,T,T*2]
    pos_emb_spatial_dim_ = 1 + pos_emb_spatial_dim

    x_padded = nn.reshape(x_padded, (axis, pos_emb_spatial_dim_), (pos_emb_spatial_dim_, axis))  # [B,H,T*2,T]
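
For context, this pad/reshape/slice sequence is the usual relative-shift trick. A minimal NumPy sketch of that trick on a plain array (illustrative only; it assumes a single length T for both query and key axes, i.e. without the separate pooled key/value spatial dim that shows up in the log above):

import numpy

def rel_shift_sketch(x):
  """x: [B, H, T, 2*T-1], where the last axis enumerates relative positions -(T-1)..(T-1).
  Returns [B, H, T, T] with out[b, h, i, j] == x[b, h, i, (j - i) + (T - 1)]."""
  b, h, t, _ = x.shape
  x_padded = numpy.pad(x, ((0, 0), (0, 0), (0, 0), (1, 0)))  # pad one zero at the left, [B,H,T,2*T]
  x_padded = x_padded.reshape(b, h, 2 * t, t)  # reinterpret the last two axes, [B,H,2*T,T]
  x_shifted = x_padded[:, :, 1:, :].reshape(b, h, t, 2 * t - 1)  # drop the padded row, shift back
  return x_shifted[:, :, :, :t]  # keep key positions 0..T-1 per query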

This happens during search.

It happens only later in the run, after a lot of sequences have already been recognized.

The sequences are ordered by length, from short to long, so this now seems to happen with quite long sequences.


albertz commented Nov 3, 2022

My first thought: this is probably related to some dim value and a slice/gather on it. Maybe it lacks the beam dim or something like that. However, this is the encoder, so there is no beam.



albertz commented Nov 4, 2022

A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid any reshape problems or other shaping problems, why do we still frequently run into such things? The answer is probably too much unnecessary complexity and thus bugs in some parts. But which parts really? What can we remove from it? How can we improve this situation?

Related: rwth-i6/returnn#975


albertz commented Nov 4, 2022

This test case triggers the bug:

def test_rel_pos_self_attention_learnable():
  class _Net(nn.Module):
    # noinspection PyShadowingNames
    def __init__(self, in_dim: nn.FeatureDim):
      super().__init__()
      self.self_att = nn.RelPosSelfAttention(
        in_dim=in_dim, proj_dim=nn.FeatureDim("out", 5),
        key_dim_total=nn.FeatureDim("key-dim-total", 21),
        value_dim_total=nn.FeatureDim("value-dim-total", 33),
        num_heads=3,
        # Shawn et al 2018 style, old RETURNN way.
        with_bias=False,
        with_linear_pos=False,
        with_pos_bias=False,
        learnable_pos_emb=True,
        learnable_pos_emb_clipping=3,
        separate_pos_emb_per_head=False,
      )

    def __call__(self, x: nn.Tensor, *, axis: nn.Dim) -> nn.Tensor:
      """forward"""
      return self.self_att(x, axis=axis)

  in_dim = nn.FeatureDim("in", 12)
  config, net_dict, net = dummy_config_net_dict(lambda: _Net(in_dim), with_axis=True, in_dim=in_dim)
  pprint(net_dict)
  dummy_run_net(config, net=net, seq_len=3)  # ok
  dummy_run_net(config, net=net, seq_len=3)  # try again, to see that running again is ok.
  dummy_run_net(config, net=net, seq_len=1)  # ok
  dummy_run_net(config, net=net, seq_len=4)  # problem currently...

Note that this test case also triggers some other unrelated bugs first, which are going to be fixed in rwth-i6/returnn#1199.


albertz commented Nov 4, 2022

Fixed via 5e223b2.

albertz closed this as completed Nov 4, 2022

albertz commented Nov 4, 2022

> A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid any reshape problems or other shaping problems, why do we still frequently run into such things? The answer is probably too much unnecessary complexity and thus bugs in some parts. But which parts really? What can we remove from it? How can we improve this situation?
>
> Related: rwth-i6/returnn#975

To answer this question: The problem here was that we actually did manual dim math. In LearnedRelativePositionalEncoding, we had:

    out_spatial_dim = spatial_dim - 1 + spatial_dim

...
      remaining_dim = spatial_dim - self.clipping
...
      cond.true, out_spatial_dim_ = nn.concat(
        (left, remaining_dim),
        (self.pos_emb, self.clipped_spatial_dim),
        (right, remaining_dim))
      out_spatial_dim_.declare_same_as(out_spatial_dim)

And:

    self.clipped_spatial_dim = nn.SpatialDim(
      f"{nn.NameCtx.current_ctx().get_abs_name()}:learned-rel-pos",
      dimension=2 * clipping + 1)

I.e.:

out_spatial_dim_
  == (spatial_dim - clipping) + (2 * clipping + 1) + (spatial_dim - clipping)
  == 2 * spatial_dim + 1
  != 2 * spatial_dim - 1
  == out_spatial_dim
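
For concreteness, plugging in the clipping of 3 and the failing seq_len of 4 from the test case above (a small check added here for illustration; it assumes the concat branch is the one taken for this length):

clipping = 3
spatial_dim = 4  # the failing seq_len from the test case above
remaining_dim = spatial_dim - clipping                                   # 1
clipped_spatial_dim = 2 * clipping + 1                                   # 7
out_spatial_dim_ = remaining_dim + clipped_spatial_dim + remaining_dim   # 9 == 2 * spatial_dim + 1
out_spatial_dim = spatial_dim - 1 + spatial_dim                          # 7 == 2 * spatial_dim - 1
assert out_spatial_dim_ != out_spatial_dim  # the mismatch that declare_same_as silently papered over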

But the declare_same_as just overwrote this anyway.

Maybe in this case, we could have detected this statically. But in general, there will always be cases that we cannot detect at compile time, only at runtime.

At some point, we planned to actually add such runtime checks for declare_same_as; maybe this issue serves as a reminder. Such checks would have made it much easier to detect this problem.
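
For illustration, such a runtime check could look roughly like the following plain-TensorFlow sketch. The function and argument names here are made up for this example and are not the returnn_common API; the idea is simply to assert at run time that the two dynamic sizes actually match before identifying the dims, instead of silently overwriting one with the other:

import tensorflow as tf

def tie_dim_sizes_with_runtime_check(declared_size, target_size):
  """declared_size / target_size: dynamic size tensors of the two dims that
  declare_same_as would identify (illustrative names, not real API)."""
  with tf.control_dependencies([
      tf.debugging.assert_equal(
        declared_size, target_size,
        message="declare_same_as: dim sizes differ at runtime")]):
    return tf.identity(target_size)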

Edit: I posted this here: rwth-i6/returnn#1200
