RelPosSelfAttention _rel_shift error, learned embedding #238

Closed

albertz opened this issue Nov 3, 2022 · 6 comments

albertz commented Nov 3, 2022

...
layer /encoder/layers/0/self_att/learned_pos_emb/'pos_emb': ['model/conformer_encoder/conformer_encoder_layer/rel_pos_self_attention/learned_relative_positional_encoding:learned-rel-pos'(33),F|F'truediv_left(enc, num_heads)'(64)] float32 
...
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'pad': [T|'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8),'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B]] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape': [T|'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice': [T|'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape_0': [T|'model/encoder/input_layer:spatial'[B],'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice_nd': [T|'model/encoder/input_layer:spatial'[B],'model/encoder/input_layer/pool1d/_pool_nd:out-spatial-dim0:kv'[B],B,F|'num_heads'(8)] float32 
...
epoch 300 search, step 9, max_size:data 95, mem_usage:GPU:0 1.2GB, num_seqs 200, 0.676 sec/step, elapsed 0:00:35, exp. remaining 0:00:00, complete 100.00%
<ExternSprintDataset 'dataset_id22882323300752' epoch=300> add_new_data: seq=3945, len=162. Cache filled, waiting to get loaded...
TensorFlow exception: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py", line 1176, in _create_layer 
    layer = layer_class(**layer_desc)
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py", line 4146, in __init__
    self.output.placeholder = tf.reshape(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper 
    return target(*args, **kwargs)
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape 
    result = gen_array_ops.reshape(tensor, shape, name) 
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8233, in reshape
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_
op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3477, in _create_op_intern
al
    ret = Operation(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack()

Exception InvalidArgumentError() in step 10. (pid 7125)
Failing op: <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape> 
We tried to fetch the op inputs ([<tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose:0' shape=(?, ?, ?, 8) dtype=float32>, <tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape/shape:0' shape=(4,) dtype=int32>]) but got another exception:
target_op <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape>, 
ops
[<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose' type=Transpose>, 
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad' type=Pad>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose/perm' type=Const value=[0 3 1 2]>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad/paddings' type=Const value=[[0 0]
 [0 0]
 [0 0]
 [1 0]]>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2' type=Reshape>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2/shape' type=Pack>,
...
 <tf.Operation 'encoder/input_layer/layers/0/bw/rec/rec/output_output_transpose/perm' type=Const value=[1 0 2]>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const_1' type=Const value=0>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const' type=Const value=[0]>]
EXCEPTION
Traceback (most recent call last):
  File "/u/zeyer/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in BaseSession._do_call
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]] 
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred: 

EXCEPTION
Traceback (most recent call last): 
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/engine.py", line 689, in Runner.run
    line: fetches_results = sess.run(
            fetches_dict, feed_dict=feed_dict)  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation. 
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape: 
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
 

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
Step meta information:
{'seq_idx': [2000,
             2001,
             2002,
             2003,
             2004,
             2005,
             2006,
             2007,
             2008,
             2009,
             2010,
...
             2193,
             2194,
             2195],
 'seq_tag': ['rt03s/fsh_60549b/55',
             'rt03s/sw_45713a/26',
             'rt03s/fsh_60817a/15',
             'rt03s/fsh_61228b/22',
...
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(196)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 40) dtype=float32>: shape (196, 102, 40), dtype float32, min/max -4.856743/5.378608, mean/stddev 9.743296e-10/0.96528375, Data{'data', [B,T|'time'[B],F|F'audio'(40)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (196,), dtype int32, min/max 41/102, ([ 95  95  95  95  61  95  95  95  77  96  96  96  96  96  96  96  96  96
  96  96  96  96  96  96  96  96  85  96  96  89  96  96  86  96  96  96
  96  97  97  97  97  97  97  97  97  97  97  97  97  67  97  86  76  97
  94  97  97  97  41  97  97  66  97  97  97  97  97  97  97  97  97  97
  97  97  98  98  98  87  98  98  98  98  98  98  98  98  98  98  98  87
  59  98  97  98  98  94  98  98  98  88  87  92  98  98  98  98  98  91
  98  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99
  99  99  99  87  90  68  89  99  99  99  99  99  99 100  74 100 100 100
 100 100 100 100 100 100  92  65 100 100  92 100 100 100 100  75 100  94
 100 100 100 100 100 100 100 101  93 101 101 101 100  76  91 101 101 101
  98 101 101 101  76 101  99 101  73 101 101 101 101 102  96  78])
  <tf.Tensor 'extern_data/placeholders/seq_idx/seq_idx:0' shape=(?,) dtype=int32>: type <class 'list'>, Data{'seq_idx', [B?], dtype='int32'}
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Data{'seq_tag', [B?], dtype='string'}
...

Corresponding code in _rel_shift:

    batch_dims = x.batch_dims_ordered((axis, pos_emb_spatial_dim))
    x_padded = nn.pad(x, axes=pos_emb_spatial_dim, padding=(1, 0), value=0.)  # [B,H,T,T*2]
    pos_emb_spatial_dim_ = 1 + pos_emb_spatial_dim

    x_padded = nn.reshape(x_padded, (axis, pos_emb_spatial_dim_), (pos_emb_spatial_dim_, axis))  # [B,H,T*2,T]
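
For context, this pad/reshape/slice sequence is the usual relative-shift trick. A minimal NumPy sketch of that trick on a plain array (illustrative only; it assumes a single length T for both query and key axes, i.e. without the separate pooled key/value spatial dim that shows up in the log above):

import numpy

def rel_shift_sketch(x):
  """x: [B, H, T, 2*T-1], where the last axis enumerates relative positions -(T-1)..(T-1).
  Returns [B, H, T, T] with out[b, h, i, j] == x[b, h, i, (j - i) + (T - 1)]."""
  b, h, t, _ = x.shape
  x_padded = numpy.pad(x, ((0, 0), (0, 0), (0, 0), (1, 0)))  # pad one zero at the left, [B,H,T,2*T]
  x_padded = x_padded.reshape(b, h, 2 * t, t)  # reinterpret the last two axes, [B,H,2*T,T]
  x_shifted = x_padded[:, :, 1:, :].reshape(b, h, t, 2 * t - 1)  # drop the padded row, shift back
  return x_shifted[:, :, :, :t]  # keep key positions 0..T-1 per query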

This happens during search.

It happens only later in the run, after a lot of sequences have already been recognized.

The sequences are ordered by length, from short to long, so this now seems to happen with quite long sequences.


albertz commented Nov 3, 2022

My first thought: this is probably related to some dim value and a slice/gather on it. Maybe it lacks the beam dim or something like that. However, this is the encoder, so there is no beam.



albertz commented Nov 4, 2022

A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid any reshape problems or other shaping problems, why do we still frequently run into such things? The answer is probably too much unnecessary complexity and thus bugs in some parts. But which parts really? What can we remove from it? How can we improve this situation?

Related: rwth-i6/returnn#975


albertz commented Nov 4, 2022

This test case triggers the bug:

def test_rel_pos_self_attention_learnable():
  class _Net(nn.Module):
    # noinspection PyShadowingNames
    def __init__(self, in_dim: nn.FeatureDim):
      super().__init__()
      self.self_att = nn.RelPosSelfAttention(
        in_dim=in_dim, proj_dim=nn.FeatureDim("out", 5),
        key_dim_total=nn.FeatureDim("key-dim-total", 21),
        value_dim_total=nn.FeatureDim("value-dim-total", 33),
        num_heads=3,
        # Shawn et al 2018 style, old RETURNN way.
        with_bias=False,
        with_linear_pos=False,
        with_pos_bias=False,
        learnable_pos_emb=True,
        learnable_pos_emb_clipping=3,
        separate_pos_emb_per_head=False,
      )

    def __call__(self, x: nn.Tensor, *, axis: nn.Dim) -> nn.Tensor:
      """forward"""
      return self.self_att(x, axis=axis)

  in_dim = nn.FeatureDim("in", 12)
  config, net_dict, net = dummy_config_net_dict(lambda: _Net(in_dim), with_axis=True, in_dim=in_dim)
  pprint(net_dict)
  dummy_run_net(config, net=net, seq_len=3)  # ok
  dummy_run_net(config, net=net, seq_len=3)  # try again, to see that running again is ok.
  dummy_run_net(config, net=net, seq_len=1)  # ok
  dummy_run_net(config, net=net, seq_len=4)  # problem currently...

Note that this test case also triggers some other unrelated bugs first, which are going to be fixed in rwth-i6/returnn#1199.


albertz commented Nov 4, 2022

Fixed via 5e223b2.

albertz closed this as completed Nov 4, 2022

albertz commented Nov 4, 2022

> A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid any reshape problems or other shaping problems, why do we still frequently run into such things? The answer is probably too much unnecessary complexity and thus bugs in some parts. But which parts really? What can we remove from it? How can we improve this situation?
>
> Related: rwth-i6/returnn#975

To answer this question: The problem here was that we actually did manual dim math. In LearnedRelativePositionalEncoding, we had:

    out_spatial_dim = spatial_dim - 1 + spatial_dim

...
      remaining_dim = spatial_dim - self.clipping
...
      cond.true, out_spatial_dim_ = nn.concat(
        (left, remaining_dim),
        (self.pos_emb, self.clipped_spatial_dim),
        (right, remaining_dim))
      out_spatial_dim_.declare_same_as(out_spatial_dim)

And:

    self.clipped_spatial_dim = nn.SpatialDim(
      f"{nn.NameCtx.current_ctx().get_abs_name()}:learned-rel-pos",
      dimension=2 * clipping + 1)

I.e.:

out_spatial_dim_
  == (spatial_dim - clipping) + (2 * clipping + 1) + (spatial_dim - clipping)
  == 2 * spatial_dim + 1
  != 2 * spatial_dim - 1
  == out_spatial_dim
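
For concreteness, plugging in the clipping of 3 and the failing seq_len of 4 from the test case above (a small check added here for illustration; it assumes the concat branch is the one taken for this length):

clipping = 3
spatial_dim = 4  # the failing seq_len from the test case above
remaining_dim = spatial_dim - clipping                                   # 1
clipped_spatial_dim = 2 * clipping + 1                                   # 7
out_spatial_dim_ = remaining_dim + clipped_spatial_dim + remaining_dim   # 9 == 2 * spatial_dim + 1
out_spatial_dim = spatial_dim - 1 + spatial_dim                          # 7 == 2 * spatial_dim - 1
assert out_spatial_dim_ != out_spatial_dim  # the mismatch that declare_same_as silently papered over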

But the declare_same_as just overwrote this anyway.

Maybe in this case, we could have detected this statically. But in general, there will always be cases that we cannot detect at compile time, only at runtime.

At some point, we planned to actually add such runtime checks for declare_same_as; maybe this issue serves as a reminder. Such checks would have made it much easier to detect this problem.
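
For illustration, such a runtime check could look roughly like the following plain-TensorFlow sketch. The function and argument names here are made up for this example and are not the returnn_common API; the idea is simply to assert at run time that the two dynamic sizes actually match before identifying the dims, instead of silently overwriting one with the other:

import tensorflow as tf

def tie_dim_sizes_with_runtime_check(declared_size, target_size):
  """declared_size / target_size: dynamic size tensors of the two dims that
  declare_same_as would identify (illustrative names, not real API)."""
  with tf.control_dependencies([
      tf.debugging.assert_equal(
        declared_size, target_size,
        message="declare_same_as: dim sizes differ at runtime")]):
    return tf.identity(target_size)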

Edit: I posted this here: rwth-i6/returnn#1200
