paddle2.3: error after adding the profiler performance-analysis component #42828

Open
developWmark opened this issue May 17, 2022 · 12 comments

@developWmark

Describe the Bug

required: gpu

import paddle.profiler as profiler
import paddle
paddle.version.show()
prof = profiler.Profiler(
    targets=[profiler.ProfilerTarget.CPU, profiler.ProfilerTarget.GPU],
    scheduler=(1, 9),
    on_trace_ready=profiler.export_chrome_tracing('./log'))
prof.start()
for iter in range(10):
    # train()
    prof.step()
prof.stop()

##################################################################################################

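Error message on AI Studio:
image
Error message in my local conda environment, even though I have set the environment variable pointing to the cuDNN location locally: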
daee136a8a52ea6e0000bb791f17a25

Additional Supplementary Information

No response

@developWmark developWmark changed the title from "paddle2.3 performance-analysis component" to "paddle2.3: error after adding the profiler performance-analysis component" May 17, 2022
@paddle-bot-old

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technical staff to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also check the official API documentation, the FAQ, past GitHub issues, and the AI community for an answer. Have a nice day!

@smallv0221 smallv0221 removed their assignment May 17, 2022
@weisy11
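Contributor

weisy11 commented May 18, 2022

Hello, please check the following:
1. Whether profiling works normally with an older paddle version in the current environment.
2. Whether the installed paddle version matches your local CUDA version. For example, if your local CUDA version is 10.1, you should install paddlepaddle-gpu==2.3.0.post101.
3. Run paddle.utils.run_check() to confirm that the installed paddle works properly.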
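
As a rough sketch of checks 2 and 3, the snippet below gathers the installed paddle version and the CUDA version the wheel was built against, so they can be compared with the local CUDA install. The helper name `paddle_environment_report` is hypothetical and not part of paddle; the sketch assumes `paddle.version.cuda()` and `paddle.is_compiled_with_cuda()` are available in the installed release, and it simply reports that paddle is missing when the import fails.

```python
def paddle_environment_report():
    """Return version details for the installed paddle, if any.

    Hypothetical helper for illustration; not part of paddle itself.
    """
    try:
        import paddle
    except ImportError:
        # paddle is not installed in this interpreter.
        return {"installed": False}
    return {
        "installed": True,
        "paddle_version": paddle.__version__,
        # CUDA version the wheel was built against; compare with local CUDA.
        "cuda_version": paddle.version.cuda(),
        "compiled_with_cuda": paddle.is_compiled_with_cuda(),
    }


report = paddle_environment_report()
print(report)
```

If the report looks consistent, calling `paddle.utils.run_check()` afterwards performs the installation check from step 3.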

@weisy11 weisy11 self-assigned this May 18, 2022
@developWmark
Author

Hello, the profiler is a new feature introduced in paddle2.3; paddle2.2 does not have it. Also:
image
The error only occurs when running the profiler. Training with paddle2.3 without the profiler works normally.

@developWmark
Author

You can also reproduce this error on AI Studio.

@rainyfly
Contributor

Hello, when running locally, add this library path: export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH. The profiler depends on NVIDIA's CUPTI library.
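
A minimal sketch for verifying that the export above took effect: it checks whether the CUPTI directory quoted in this thread appears in `LD_LIBRARY_PATH`. The function name `cupti_on_library_path` is hypothetical, and the default directory is only the path mentioned above; a different CUDA install location would need a different value. Note that the variable typically has to be exported in the shell before Python starts, since the dynamic loader reads it at process start.

```python
import os

# Path quoted in this thread; adjust for other CUDA install locations.
CUPTI_DIR = "/usr/local/cuda/extras/CUPTI/lib64"


def cupti_on_library_path(ld_library_path, cupti_dir=CUPTI_DIR):
    """Return True if cupti_dir is one of the entries in ld_library_path."""
    wanted = os.path.normpath(cupti_dir)
    # LD_LIBRARY_PATH is a colon-separated list; skip empty entries.
    entries = [p for p in ld_library_path.split(":") if p]
    return any(os.path.normpath(p) == wanted for p in entries)


current = os.environ.get("LD_LIBRARY_PATH", "")
print("CUPTI directory on LD_LIBRARY_PATH:", cupti_on_library_path(current))
```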

@rainyfly
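Contributor

It does not run on AI Studio because of a permissions issue. The AI Studio environment currently uses CUDA 10.1, and in restricted mode non-root users cannot use CUPTI; see https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti . We will contact the AI Studio team to see whether the environment can be switched to unrestricted mode.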
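
One way to inspect this restriction on a Linux machine is sketched below. It rests on assumptions not stated in this thread: that the NVIDIA driver exposes its parameters at `/proc/driver/nvidia/params` as `Key: value` lines, and that the key `RmProfilingAdminOnly` set to `1` means profiling is limited to admin users. Both function names are hypothetical.

```python
def profiling_admin_only(params_text):
    """Parse driver parameter text.

    Returns True or False for the RmProfilingAdminOnly setting,
    or None if the key is absent. The key name and the "Key: value"
    format are assumptions, not taken from this thread.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def read_driver_params(path="/proc/driver/nvidia/params"):
    """Read the driver parameter file; return '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


print("Profiling restricted to admin:", profiling_admin_only(read_driver_params()))
```

A result of `None` means the file or key was not found, for example on a machine without the NVIDIA driver.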

@developWmark
Author

This problem is still not resolved.

@developWmark
Author

It has already been a month.

@rainyfly
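Contributor

We did contact the AI Studio team earlier, but the AI Studio platform affects a wide range of users, so this environment setting is not easy to change. If you have a GPU locally, you can try running it locally first, or enable only CPU profiling and see whether that works.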
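
A sketch of the CPU-only variant suggested above, adapted from the reproduction code at the top of this issue with the GPU target removed. The wrapper function `profile_cpu_only` and its parameters are hypothetical; it returns False when `paddle.profiler` cannot be imported, so the same script can run on machines without paddle 2.3.

```python
def profile_cpu_only(num_steps=10, log_dir="./log"):
    """Run the profiler with the CPU target only.

    Hypothetical wrapper for illustration. Returns True if profiling ran,
    False if paddle.profiler is not available.
    """
    try:
        import paddle.profiler as profiler
    except ImportError:
        return False
    prof = profiler.Profiler(
        # CPU only, so CUPTI access should not be needed.
        targets=[profiler.ProfilerTarget.CPU],
        scheduler=(1, 9),
        on_trace_ready=profiler.export_chrome_tracing(log_dir))
    prof.start()
    for _ in range(num_steps):
        # train() would go here
        prof.step()
    prof.stop()
    return True
```

Calling `profile_cpu_only()` and getting True without an error would suggest that the failure is specific to the GPU target.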

@rainyfly
Contributor

Have you installed CUPTI? You can link CUPTI as described above.

@developWmark
Author

My local environment is also managed by conda. The cudatoolkit installed through conda does not work; CUDA has to be installed from the cuda.run installer.
