elementwise_add_grad should be optimized  #7862

Closed
@wanghaoshuang

Description

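elementwise_add_grad should be optimized to avoid the overhead of Eigen. In the profiling report below it is the most expensive op: 176 calls take 7243.75 ms in total (41.16 ms per call on average), whereas the forward elementwise_add takes 89.54 ms for 576 calls (0.155 ms per call on average).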

------------------------->     Profiling Report     <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::elementwise_add_grad    176         7243.75     0.04736     168.678     41.1577
thread0::warpctc                 16          7202.06     1.28614     7180.67     450.129
thread0::conv2d_grad             128         752.047     2.50992     17.4375     5.87536
thread0::conv2d                  256         491.638     1.08138     4.74362     1.92046
thread0::batch_norm              256         301.695     0.076608    5.48499     1.17849
thread0::gru                     64          299.851     4.54563     5.25917     4.68517
thread0::batch_norm_grad         128         287.493     0.343488    6.05075     2.24604
thread0::gru_grad                32          192.225     5.83638     6.24419     6.00702
thread0::elementwise_add         576         89.5377     0.009984    0.579264    0.155447
thread0::relu                    256         57.8738     0.062848    0.482432    0.22607
thread0::mul                     128         52.3439     0.191296    0.6896      0.408937
thread0::relu_grad               128         40.9168     0.086784    0.685856    0.319662
thread0::pool2d_grad             64          30.8992     0.135456    0.989888    0.4828
thread0::pool2d                  128         24.5201     0.057664    0.394848    0.191563
thread0::mul_grad                64          21.6162     0.043584    0.628768    0.337753
thread0::momentum                688         17.9174     0.008928    0.317664    0.0260427
thread0::im2sequence             32          16.2841     0.504608    0.515488    0.508877
thread0::ctc_align               32          16.0577     0.469184    0.665344    0.501804
thread0::warpctc_grad            16          15.68       0.943936    1.09715     0.98
thread0::top_k                   32          9.99485     0.218304    1.05782     0.312339
thread0::im2sequence_grad        16          8.83674     0.54448     0.565408    0.552296
thread0::edit_distance           32          8.6968      0.244896    0.601024    0.271775
thread0::sum                     112         8.36666     0.019328    0.235168    0.0747023
thread0::scale                   256         7.41229     0.00928     0.1416      0.0289542
thread0::clip                    224         6.79677     0.009984    0.141408    0.0303427
thread0::cast                    36          6.31331     0.048736    0.229376    0.17537
thread0::feed                    64          4.81792     0.048608    0.194784    0.07528
thread0::fill_zeros_like         512         4.63562     0.00736     0.010624    0.00905394
thread0::fetch                   36          1.31088     0.024704    0.075424    0.0364133
thread0::reduce_sum              32          1.00794     0.026752    0.078976    0.031498
thread0::fill_constant           24          0.594816    0.017376    0.054784    0.024784
thread0::mean                    16          0.541664    0.027392    0.0888      0.033854
thread0::elementwise_div         4           0.285696    0.042336    0.099936    0.071424
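
For reference, here is a minimal sketch of what the gradient of an elementwise add computes when written as a plain loop rather than through Eigen expressions. It is not Paddle's kernel. The function signature, the flat row-major layout, and the shapes (x of shape (rows, cols), y of shape (cols,) broadcast across rows, as in a bias add) are assumptions made for illustration; the issue does not say which shapes are involved.

```python
def elementwise_add_grad(dout, rows, cols):
    """Gradient of out = x + y for x of shape (rows, cols), y of shape (cols,).

    Hypothetical helper for illustration only.
    dout is a flat row-major list of length rows * cols.
    Returns (dx, dy).
    """
    if len(dout) != rows * cols:
        raise ValueError("dout must have rows * cols elements")

    # d(out)/d(x) is the identity, so dx is just a copy of dout.
    dx = list(dout)

    # y was broadcast across rows, so dy is dout summed over the row axis.
    dy = [0.0] * cols
    for r in range(rows):
        base = r * cols
        for c in range(cols):
            dy[c] += dout[base + c]
    return dx, dy


dx, dy = elementwise_add_grad([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rows=2, cols=3)
print(dx)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dy)  # [5.0, 7.0, 9.0]
```

The work is one copy plus one column-wise reduction, so a dedicated kernel for this op would plausibly be expected to cost about as much as the forward op, not the roughly 41 ms per call shown in the report.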
