I did some experimenting and have a preliminary diagnosis of why Paraformer onnx-gpu is so slow:
1. The CIF part of the predictor.
It can be replaced with https://github.com/George0828Zhang/torch_cif (a fast, parallel implementation of CIF, though I haven't verified that it is equivalent to Paraformer's internal implementation).
2. The CUDA settings in onnxruntime.
The default value of cudnn_conv_algo_search is EXHAUSTIVE, which is quite slow and especially affects convolution ops (the profiling log shows the time is concentrated entirely in the decoder's Conv_kernel_time):
"dur": 52419, "ts": 4481356, "ph": "X", "name": "/decoder/decoders.X/self_attn/fsmn_block/Conv_kernel_time"
Setting it to DEFAULT avoids this:
providers = [("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}), "CPUExecutionProvider"]
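For context, a minimal sketch of passing this provider option when building an ONNX Runtime session. The model path is a placeholder and the onnxruntime-gpu install is an assumption, not something stated in the thread:

```python
# Sketch: pass cudnn_conv_algo_search="DEFAULT" as a CUDAExecutionProvider
# option, falling back to CPU. "model.onnx" is a placeholder path.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
    "CPUExecutionProvider",
]

# Requires the onnxruntime-gpu package:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

The provider options dict is the documented way to tune per-provider settings without touching the model itself.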
I implemented a version just a couple of days ago and it looks fine on my end. If it's convenient, please help test it as well: #1791
"cudnn_conv_algo_search": "DEFAULT" 非常重要,onnx gpu速度快了3、40倍
"cudnn_conv_algo_search": "DEFAULT"
Quick question: does the cpp version have the same issue?
GPU is affected, CPU is not. cpp and python behave the same.
Strange: on my side the impact is small (cpp version), and per the docs onnxruntime has used this value by default since 14.0. Still, GPU speed clearly falls short of expectations; the speedup on an A10 is about the same as on a 32-core CPU...