RelPosSelfAttention _rel_shift error, learned embedding #238

Closed
@albertz

...
layer /encoder/layers/0/self_att/learned_pos_emb/'pos_emb': ['model/conformer_encoder/conformer_encoder_layer/rel_pos_self_attention/learned_relative_positional_encoding:learned-rel-pos'(33),F|F'truediv_left(enc, num_heads)'(64)] float32 
...
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'pad': [T|'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8),'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B]] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape': [T|'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice': [T|'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape_0': [T|'model/encoder/input_layer:spatial'[B],'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],B,F|'num_heads'(8)] float32 
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice_nd': [T|'model/encoder/input_layer:spatial'[B],'model/encoder/input_layer/pool1d/_pool_nd:out-spatial-dim0:kv'[B],B,F|'num_heads'(8)] float32 
...
epoch 300 search, step 9, max_size:data 95, mem_usage:GPU:0 1.2GB, num_seqs 200, 0.676 sec/step, elapsed 0:00:35, exp. remaining 0:00:00, complete 100.00%
<ExternSprintDataset 'dataset_id22882323300752' epoch=300> add_new_data: seq=3945, len=162. Cache filled, waiting to get loaded...
TensorFlow exception: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py", line 1176, in _create_layer 
    layer = layer_class(**layer_desc)
  File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py", line 4146, in __init__
    self.output.placeholder = tf.reshape(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper 
    return target(*args, **kwargs)
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape 
    result = gen_array_ops.reshape(tensor, shape, name) 
  File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8233, in reshape
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3477, in _create_op_internal
    ret = Operation(
  File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack()

Exception InvalidArgumentError() in step 10. (pid 7125)
Failing op: <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape> 
We tried to fetch the op inputs ([<tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose:0' shape=(?, ?, ?, 8) dtype=float32>, <tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape/shape:0' shape=(4,) dtype=int32>]) but got another exception:
target_op <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape>, 
ops
[<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose' type=Transpose>, 
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad' type=Pad>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose/perm' type=Const value=[0 3 1 2]>,
 <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad/paddings' type=Const value=[[0 0]
 [0 0]
 [0 0]
 [1 0]]>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2' type=Reshape>,
 <tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2/shape' type=Pack>,
...
 <tf.Operation 'encoder/input_layer/layers/0/bw/rec/rec/output_output_transpose/perm' type=Const value=[1 0 2]>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const_1' type=Const value=0>,
 <tf.Operation 'extern_data/placeholders/data/SequenceMask/Const' type=Const value=[0]>]
EXCEPTION
Traceback (most recent call last):
  File "/u/zeyer/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in BaseSession._do_call
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
         [[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]] 
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred: 

EXCEPTION
Traceback (most recent call last): 
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/engine.py", line 689, in Runner.run
    line: fetches_results = sess.run(
            fetches_dict, feed_dict=feed_dict)  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
  (1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304 
         [[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
         [[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation. 
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape: 
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
 
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape: 
 encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)

Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
  File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
    main()
...
Step meta information:
{'seq_idx': [2000,
             2001,
             2002,
             2003,
             2004,
             2005,
             2006,
             2007,
             2008,
             2009,
             2010,
...
             2193,
             2194,
             2195],
 'seq_tag': ['rt03s/fsh_60549b/55',
             'rt03s/sw_45713a/26',
             'rt03s/fsh_60817a/15',
             'rt03s/fsh_61228b/22',
...
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(196)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 40) dtype=float32>: shape (196, 102, 40), dtype float32, min/max -4.856743/5.378608, mean/stddev 9.743296e-10/0.96528375, Data{'data', [B,T|'time'[B],F|F'audio'(40)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (196,), dtype int32, min/max 41/102, ([ 95  95  95  95  61  95  95  95  77  96  96  96  96  96  96  96  96  96
  96  96  96  96  96  96  96  96  85  96  96  89  96  96  86  96  96  96
  96  97  97  97  97  97  97  97  97  97  97  97  97  67  97  86  76  97
  94  97  97  97  41  97  97  66  97  97  97  97  97  97  97  97  97  97
  97  97  98  98  98  87  98  98  98  98  98  98  98  98  98  98  98  87
  59  98  97  98  98  94  98  98  98  88  87  92  98  98  98  98  98  91
  98  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99  99
  99  99  99  87  90  68  89  99  99  99  99  99  99 100  74 100 100 100
 100 100 100 100 100 100  92  65 100 100  92 100 100 100 100  75 100  94
 100 100 100 100 100 100 100 101  93 101 101 101 100  76  91 101 101 101
  98 101 101 101  76 101  99 101  73 101 101 101 101 102  96  78])
  <tf.Tensor 'extern_data/placeholders/seq_idx/seq_idx:0' shape=(?,) dtype=int32>: type <class 'list'>, Data{'seq_idx', [B?], dtype='int32'}
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Data{'seq_tag', [B?], dtype='string'}
...

Corresponding code in _rel_shift:

    batch_dims = x.batch_dims_ordered((axis, pos_emb_spatial_dim))
    x_padded = nn.pad(x, axes=pos_emb_spatial_dim, padding=(1, 0), value=0.)  # [B,H,T,T*2]
    pos_emb_spatial_dim_ = 1 + pos_emb_spatial_dim

    x_padded = nn.reshape(x_padded, (axis, pos_emb_spatial_dim_), (pos_emb_spatial_dim_, axis))  # [B,H,T*2,T]
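
For reference, the pad / reshape / slice / reshape / slice sequence visible in the layer names of the log can be sketched in plain NumPy. This is only an illustration, not the actual `_rel_shift` implementation: the function name `rel_shift_np` is hypothetical, and the fixed axis order `[B,H,T,2T-1]` is an assumption. The point is that the first reshape only works when the relative-position axis has exactly `2*T-1` entries.

```python
import numpy as np


def rel_shift_np(x: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy analogue of the relative-shift trick.

    x has shape [B, H, T, 2T-1]; the last axis indexes relative positions
    -(T-1) .. (T-1). Returns [B, H, T, T] with out[..., i, k] = x[..., i, k - i + T - 1].
    """
    b, h, t, p = x.shape
    if p != 2 * t - 1:
        # This is the condition that the reshape below silently relies on.
        raise ValueError(f"expected last dim {2 * t - 1}, got {p}")
    x_padded = np.pad(x, [(0, 0), (0, 0), (0, 0), (1, 0)])  # [B,H,T,2T]
    x_padded = x_padded.reshape(b, h, p + 1, t)  # [B,H,2T,T]
    x_shifted = x_padded[:, :, 1:].reshape(b, h, t, p)  # [B,H,T,2T-1]
    return x_shifted[:, :, :, :t]  # [B,H,T,T]


# Usage: fill x so that every query row holds the relative-position index j.
t = 3
x = np.broadcast_to(np.arange(2 * t - 1, dtype="float32"), (1, 1, t, 2 * t - 1))
out = rel_shift_np(x)
print(out[0, 0])
```

With this input, entry `[i, k]` of the result is `k - i + T - 1`, i.e. the index of the relative position "key minus query".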

This happens during search, and only late in the run, after many sequences have already been recognized.

The sequences are ordered by length, from short to long, so the error seems to be triggered by the longer sequences.
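
A quick sanity check of the two element counts from the error message, as a rough sketch. It assumes that both counts contain the batch size 196 (from the feed dict) and the 8 heads as factors; the divisions below confirm that this is at least arithmetically possible. How the remaining factors map onto the time and relative-position axes is a guess, not something the log states.

```python
batch, heads = 196, 8  # batch_dim from the feed dict, 'num_heads'(8) from the layer output
actual, requested = 959616, 906304  # from the InvalidArgumentError message

assert actual % (batch * heads) == 0 and requested % (batch * heads) == 0
per_head_actual = actual // (batch * heads)
per_head_requested = requested // (batch * heads)


def factor_pairs(n: int) -> list:
    """All (a, b) with a <= b and a * b == n."""
    return [(a, n // a) for a in range(1, int(n ** 0.5) + 1) if n % a == 0]


print(per_head_actual, factor_pairs(per_head_actual))
print(per_head_requested, factor_pairs(per_head_requested))
```

The requested shape leaves 578 = 17 * 34 per batch entry and head, which would be consistent with T = 17 and a padded axis of 2 * T = 34. The actual tensor leaves 612, which factors as 17 * 36 or 18 * 34 among others, so one of the two axes appears to be larger than the reshape expects.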
