Hi! We've received your issue. Please be patient while you wait for a response; we will arrange for technicians to answer your questions as soon as possible. Please make sure you have provided enough information to demonstrate your request. You may also check the API, FAQ, GitHub Issues, and AI community for answers. Have a nice day!
Please ask your question
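Several Paddle training jobs run on a single machine. After training for a while, the jobs start to stall: only one job keeps training at normal speed, while the others drop to roughly 1-2 images per second. The jobs on the machine are:
1. Several single-GPU training jobs (all reading the same copy of local data)
2. Several single-machine multi-GPU training jobs (reading local data); some of these multi-GPU jobs may have shared memory enabled
Checking with command-line tools shows normal CPU usage and normal I/O reads.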
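As a rough sketch of this kind of check (not the commands actually used above, which are not stated), the Python snippet below reports the 1-minute load per core and how full the shared-memory mount is. It assumes a Linux host where shared memory is mounted at `/dev/shm`; the helper names `shm_usage` and `load_per_core` are hypothetical and only for illustration.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (used_fraction, free_bytes) for the filesystem at `path`.

    Assumption: shared memory is a tmpfs mounted at /dev/shm (typical on Linux).
    """
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return used_fraction, usage.free


def load_per_core():
    """Return the 1-minute load average divided by the logical CPU count."""
    one_min, _, _ = os.getloadavg()  # Unix only
    return one_min / (os.cpu_count() or 1)


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        frac, free = shm_usage()
        print(f"/dev/shm used: {frac:.1%}, free: {free / 2**30:.2f} GiB")
    print(f"load per core (1 min): {load_per_core():.2f}")
```

Running it once while all jobs are healthy and again after the slowdown would show whether `/dev/shm` fills up as the jobs stall, which is one way to test the shared-memory guess.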
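Suspected causes:
1. Do single-machine multi-GPU jobs have higher priority and preempt the resources of the single-GPU jobs?
2. Do the jobs with shared memory enabled take resources away from the other jobs, causing them to stall?
3. Do multiple jobs reading the same copy of the data compete for resources, causing the stall?
........
What is the actual cause here? I would appreciate a reply. Thanks!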