
torch: 1.12.0+cu113, driver: 530.41.03: using CUDA raises an error after gpu manager schedules the pod successfully #34

Open
Justin-ZL opened this issue Apr 24, 2023 · 3 comments

Comments

@Justin-ZL

The first problem is that the data shown by nvidia-smi is wrong:
image
The second problem is that using CUDA raises an error: RuntimeError: CUDA error: invalid device context
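
For anyone trying to reproduce the second problem, a minimal sketch like the one below could collect the relevant version facts in one place. It is an assumption that the error appears on the first CUDA allocation; the report does not say which call fails. The function name `cuda_report` is hypothetical and only used for illustration.

```python
import importlib


def cuda_report():
    """Collect torch/CUDA facts relevant to this issue; returns a dict."""
    report = {
        "torch": None,           # e.g. "1.12.0+cu113"
        "built_for_cuda": None,  # CUDA version the torch wheel was built against
        "cuda_available": None,
        "error": None,           # message of any RuntimeError raised by CUDA
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this environment; nothing more to check
        return report

    report["torch"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda
    try:
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # Assumption: the first allocation forces CUDA context creation,
            # so a context-related failure would likely show up here.
            torch.zeros(1, device="cuda")
            torch.cuda.synchronize()
    except RuntimeError as exc:
        report["error"] = str(exc)
    return report
```

Running `print(cuda_report())` inside the pod and attaching the output next to the driver version from nvidia-smi would make it easier to compare environments.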

@seanchen022

It probably needs to be adapted for CUDA 12.

@panpan0000

Is this project still under maintenance?

@hiahia121

hiahia121 commented Dec 20, 2023

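For problem 1, you can also try downgrading the NVIDIA driver on the node, along with the CUDA version bundled with it, for example:
image

Then enter the workload pod and run nvidia-smi to check:
image

The "function not found" error goes away, but the total GPU memory shown is not the amount allocated to the pod. That problem still needs to be solved.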
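
The remaining mismatch can be checked from inside the pod without reading the table by eye. The sketch below compares the total memory that nvidia-smi reports against the amount the pod was expected to receive. The helper names and the idea of passing the expected value in MiB are assumptions for illustration; the thread does not say how much memory the pod requested.

```python
import subprocess

# Query only the total memory of each visible GPU, in MiB, one value per line.
QUERY = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]


def parse_total_mib(output):
    """Parse nvidia-smi CSV output into a list of integers (MiB per GPU)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def reported_total_mib():
    """Run nvidia-smi and return the total memory it reports for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_total_mib(result.stdout)


def matches_allocation(reported, expected_mib):
    """True only if at least one GPU is visible and every one reports expected_mib."""
    return bool(reported) and all(value == expected_mib for value in reported)
```

For example, `matches_allocation(reported_total_mib(), 4096)` would return False in the situation described above if the pod had been allocated 4096 MiB (a hypothetical figure) but nvidia-smi still showed the full card.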
