Closed
Description
Every single call to a sequence-like op produces a bunch of kernel calls. This should be improved. Profiler output:
thread0::array_to_lod_tensor 2 62.4829 30.9197 31.5633 31.2415 3262.35 0 327.645
thread0::mul 763 61.21 0.038688 0.71024 0.0802228 642.303 0.000488281 25.7815
thread0::sequence_softmax_grad 69 57.409 0.04448 2.6696 0.832014 4320.91 0.000488281 0.0126953
thread0::lod_tensor_to_array 2 56.6311 27.8057 28.8253 28.3155 810.032 5.60864 327.656
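To make the concern concrete, here is a minimal pure-Python sketch of the difference between launching one kernel per sequence and launching once for the whole batch. It is only an illustration, not the framework's actual implementation: the function names, the `launches` counter, and the flat `lod` offset list are all hypothetical, and each call to a `sequence_softmax_*` helper simply stands in for what would be one or more GPU kernel launches.

```python
import math


def softmax(values):
    """Numerically stable softmax over one non-empty list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_softmax_per_sequence(data, lod):
    """Hypothetical 'one launch per sequence' variant.

    `lod` is a list of offsets, e.g. [0, 3, 5] means two sequences:
    data[0:3] and data[3:5]. The launch count grows with the number
    of sequences in the batch.
    """
    out, launches = [], 0
    for start, end in zip(lod[:-1], lod[1:]):
        out.extend(softmax(data[start:end]))  # stands in for one kernel call
        launches += 1
    return out, launches


def sequence_softmax_fused(data, lod):
    """Hypothetical 'single launch' variant.

    The offsets are handed to one call, and the loop over segments
    happens inside it, so the launch count stays at 1 regardless of
    how many sequences the batch contains.
    """
    out = [0.0] * len(data)
    for start, end in zip(lod[:-1], lod[1:]):
        out[start:end] = softmax(data[start:end])
    return out, 1


data = [1.0, 2.0, 3.0, 0.5, 0.5]
lod = [0, 3, 5]
per_seq_out, per_seq_launches = sequence_softmax_per_sequence(data, lod)
fused_out, fused_launches = sequence_softmax_fused(data, lod)
print(per_seq_launches, fused_launches)
```

Both variants compute the same result; only the number of simulated launches differs, which is the overhead this issue is about.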