...
layer /encoder/layers/0/self_att/learned_pos_emb/'pos_emb': ['model/conformer_encoder/conformer_encoder_layer/rel_pos_self_attention/learned_relative_positional_encoding:learned-rel-pos'(33),F|F'truediv_left(enc, num_heads)'(64)] float32
...
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'pad': [T|'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8),'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B]] float32
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape': [T|'1+(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice': [T|'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],'model/encoder/input_layer:spatial'[B],B,F|'num_heads'(8)] float32
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'reshape_0': [T|'model/encoder/input_layer:spatial'[B],'(model/encoder/input_layer:spatial)+-1+(model/encoder/input_layer:spatial)'[B],B,F|'num_heads'(8)] float32
layer /encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/'slice_nd': [T|'model/encoder/input_layer:spatial'[B],'model/encoder/input_layer/pool1d/_pool_nd:out-spatial-dim0:kv'[B],B,F|'num_heads'(8)] float32
...
epoch 300 search, step 9, max_size:data 95, mem_usage:GPU:0 1.2GB, num_seqs 200, 0.676 sec/step, elapsed 0:00:35, exp. remaining 0:00:00, complete 100.00%
<ExternSprintDataset 'dataset_id22882323300752' epoch=300> add_new_data: seq=3945, len=162. Cache filled, waiting to get loaded...
TensorFlow exception: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
(1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
[[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
main()
...
File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py", line 1176, in _create_layer
layer = layer_class(**layer_desc)
File "/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py", line 4146, in __init__
self.output.placeholder = tf.reshape(
File "/.local/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape
result = gen_array_ops.reshape(tensor, shape, name)
File "/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8233, in reshape
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3477, in _create_op_internal
ret = Operation(
File "/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
Exception InvalidArgumentError() in step 10. (pid 7125)
Failing op: <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape>
We tried to fetch the op inputs ([<tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose:0' shape=(?, ?, ?, 8) dtype=float32>, <tf.Tensor 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape/shape:0' shape=(4,) dtype=int32>]) but got another exception:
target_op <tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape' type=Reshape>,
ops
[<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose' type=Transpose>,
<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad' type=Pad>,
<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose/perm' type=Const value=[0 3 1 2]>,
<tf.Operation 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/pad/Pad/paddings' type=Const value=[[0 0]
[0 0]
[0 0]
[1 0]]>,
<tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2' type=Reshape>,
<tf.Operation 'encoder/layers/0/self_att/dot_0/Reshape_2/shape' type=Pack>,
...
<tf.Operation 'encoder/input_layer/layers/0/bw/rec/rec/output_output_transpose/perm' type=Const value=[1 0 2]>,
<tf.Operation 'extern_data/placeholders/data/SequenceMask/Const_1' type=Const value=0>,
<tf.Operation 'extern_data/placeholders/data/SequenceMask/Const' type=Const value=[0]>]
EXCEPTION
Traceback (most recent call last):
File "/u/zeyer/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in BaseSession._do_call
...
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]]
(1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[{{node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape}}]]
[[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/engine.py", line 689, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
...
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
(1) Invalid argument: Input to reshape is a tensor with 959616 values, but the requested shape has 906304
[[node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:4146) ]]
[[choice/search_resolve_loop/Exit_2/_2621]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
Input Source operations connected to node encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape:
encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/pad_output_transpose (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py:3360)
Original stack trace for 'encoder/layers/0/self_att/RelPosSelfAttention._rel_shift/reshape/Reshape':
File "/setups/combined/2021-05-31/tools/returnn/rnn.py", line 11, in <module>
main()
...
Step meta information:
{'seq_idx': [2000,
2001,
2002,
2003,
2004,
2005,
2006,
2007,
2008,
2009,
2010,
...
2193,
2194,
2195],
'seq_tag': ['rt03s/fsh_60549b/55',
'rt03s/sw_45713a/26',
'rt03s/fsh_60817a/15',
'rt03s/fsh_61228b/22',
...
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(196)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 40) dtype=float32>: shape (196, 102, 40), dtype float32, min/max -4.856743/5.378608, mean/stddev 9.743296e-10/0.96528375, Data{'data', [B,T|'time'[B],F|F'audio'(40)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (196,), dtype int32, min/max 41/102, ([ 95 95 95 95 61 95 95 95 77 96 96 96 96 96 96 96 96 96
96 96 96 96 96 96 96 96 85 96 96 89 96 96 86 96 96 96
96 97 97 97 97 97 97 97 97 97 97 97 97 67 97 86 76 97
94 97 97 97 41 97 97 66 97 97 97 97 97 97 97 97 97 97
97 97 98 98 98 87 98 98 98 98 98 98 98 98 98 98 98 87
59 98 97 98 98 94 98 98 98 88 87 92 98 98 98 98 98 91
98 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
99 99 99 87 90 68 89 99 99 99 99 99 99 100 74 100 100 100
100 100 100 100 100 100 92 65 100 100 92 100 100 100 100 75 100 94
100 100 100 100 100 100 100 101 93 101 101 101 100 76 91 101 101 101
98 101 101 101 76 101 99 101 73 101 101 101 101 102 96 78])
<tf.Tensor 'extern_data/placeholders/seq_idx/seq_idx:0' shape=(?,) dtype=int32>: type <class 'list'>, Data{'seq_idx', [B?], dtype='int32'}
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Data{'seq_tag', [B?], dtype='string'}
...
Corresponding code in _rel_shift:
batch_dims = x.batch_dims_ordered((axis, pos_emb_spatial_dim))
x_padded = nn.pad(x, axes=pos_emb_spatial_dim, padding=(1, 0), value=0.) # [B,H,T,T*2]
pos_emb_spatial_dim_ = 1 + pos_emb_spatial_dim
x_padded = nn.reshape(x_padded, (axis, pos_emb_spatial_dim_), (pos_emb_spatial_dim_, axis)) # [B,H,T*2,T]
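For reference, the layer names in the log above (pad, reshape, slice, reshape_0, slice_nd) look like the usual relative-shift trick. The following is only a minimal pure-Python sketch of that trick for a single (batch, head) slice, not the actual `_rel_shift` code: the function name `rel_shift_2d` is hypothetical, and the steps after the first reshape are inferred from the layer names rather than from the excerpt. It assumes the input has T rows of width 2T-1, which is exactly the assumption the failing reshape depends on.

```python
def rel_shift_2d(x):
    """Relative shift for one (batch, head) slice.

    x: list of T rows, each of width 2*T - 1 (scores per relative position).
    Returns a T x T list where out[i][j] == x[i][T - 1 - i + j].
    """
    t = len(x)
    width = 2 * t - 1
    if any(len(row) != width for row in x):
        # The reshape below is only valid when both dims agree on T.
        raise ValueError("expected %d rows of width %d" % (t, width))
    # pad: one zero on the left of every row -> T x 2T, kept flat
    flat = [v for row in x for v in [0.0] + list(row)]
    # reshape to 2T x T and drop the first row -> (2T-1) x T, kept flat
    flat = flat[t:]
    # reshape to T x (2T-1), then keep only the first T columns
    return [flat[i * width:i * width + t] for i in range(t)]


# Small usage example with T = 3: x[i][k] = 10 * i + k
example = [[10 * i + k for k in range(5)] for i in range(3)]
shifted = rel_shift_2d(example)
# shifted[0] == [2, 3, 4], shifted[1] == [11, 12, 13], shifted[2] == [20, 21, 22]
```

If the number of rows and the row width do not agree on the same T, the element count no longer matches the target shape, which is the kind of mismatch the Reshape op reports.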
This happens during search, and only late in the run, after many sequences have already been recognized.
The sequences are ordered by length, from short to long, so the failure seems to occur only with fairly long sequences.
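The two element counts in the error message can be broken down to see what the reshape expected. The sketch below assumes that the batch size 196 from the feed dict and the 8 heads from 'num_heads'(8) both apply to the reshape input; the helper names are hypothetical. This is a reading of the log, not a confirmed diagnosis.

```python
import math


def per_head_size(total, batch, heads):
    """Number of elements per (batch, head) slice."""
    size, rest = divmod(total, batch * heads)
    if rest:
        raise ValueError("total is not divisible by batch * heads")
    return size


def consistent_t(n):
    """Return T if n == T * 2T (padded rel-pos matrix), else None."""
    t = math.isqrt(n // 2)
    return t if 2 * t * t == n else None


def factor_pairs(n):
    """All (a, b) with a <= b and a * b == n."""
    return [(a, n // a) for a in range(1, math.isqrt(n) + 1) if n % a == 0]


actual = per_head_size(959616, batch=196, heads=8)     # 612
requested = per_head_size(906304, batch=196, heads=8)  # 578

print(requested, consistent_t(requested))  # 578 = 17 * 34, so T = 17
print(actual, consistent_t(actual))        # 612 is not of the form T * 2T
print(factor_pairs(actual))                # includes (17, 36) and (18, 34)
```

Under these assumptions the requested shape corresponds to T = 17 in both dims, while the actual tensor only factors as 17 x 36 or 18 x 34. That suggests two dims that should describe the same length differ by one frame for this batch, but the log alone does not show which one.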