I did some experimenting and have a preliminary diagnosis of why Paraformer onnx-gpu is so slow:
1. The CIF part of the predictor.
It can be replaced with https://github.com/George0828Zhang/torch_cif (a fast, parallel implementation of CIF, though I haven't verified that it is equivalent to Paraformer's internal implementation).
2. The CUDA settings in onnxruntime.
The default value of cudnn_conv_algo_search is EXHAUSTIVE, which is quite slow and especially affects convolution ops (the profiling log shows the time is concentrated entirely in the decoder's Conv_kernel_time):
"dur": 52419, "ts": 4481356, "ph": "X", "name": "/decoder/decoders.X/self_attn/fsmn_block/Conv_kernel_time"
Setting it to DEFAULT avoids this:
providers = [("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}), "CPUExecutionProvider"]
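For context, a minimal sketch of passing this provider option when building an ONNX Runtime session. The model path is a placeholder and the onnxruntime-gpu install is an assumption, not something stated in the thread:

```python
# Sketch: pass cudnn_conv_algo_search="DEFAULT" as a CUDAExecutionProvider
# option, falling back to CPU. "model.onnx" is a placeholder path.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
    "CPUExecutionProvider",
]

# Requires the onnxruntime-gpu package:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

The provider options dict is the documented way to tune per-provider settings without touching the model itself.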
I implemented a version just a couple of days ago and it looks fine on my end. If it's convenient, please help test it as well: #1791
"cudnn_conv_algo_search": "DEFAULT" 非常重要,onnx gpu速度快了3、40倍
"cudnn_conv_algo_search": "DEFAULT"
Quick question: does the cpp version have the same issue?
GPU is affected, CPU is not. cpp and python behave the same.
Strange: on my side the impact is small (cpp version), and per the docs onnxruntime has used this value by default since 14.0. Still, GPU speed clearly falls short of expectations; the speedup on an A10 is about the same as on a 32-core CPU...