
[Bug] 5090 GPU is not supported #3283

Open
@1273603741

Description


Prerequisite

Environment

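I cannot get the environment set up on a 5090 GPU at all. Whatever I try, it reports that the mmcv library is missing. Please add support for the 5090 with CUDA 12.8 as soon as possible.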
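A "mmcv is missing" message can mean two different things: either the package is not installed at all, or it is installed but its compiled ops fail to load. The sketch below is one way to tell these cases apart. The helper name `module_status` is hypothetical, and `mmcv._ext` is assumed to be the compiled extension module, as in mmcv 1.x; adjust the names if your version differs.

```python
import importlib


def module_status(name):
    """Try to import `name` and describe what happened."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # The module (or one of its parent packages) is not installed.
        return f"{name}: not found ({exc.name} is missing)"
    except Exception as exc:
        # Installed, but loading failed (e.g. a binary built for another torch/CUDA).
        return f"{name}: found but failed to load ({type(exc).__name__}: {exc})"
    return f"{name}: OK"


if __name__ == "__main__":
    # Assumed module names; mmcv._ext is the compiled-ops extension in mmcv 1.x.
    for name in ("torch", "mmcv", "mmcv._ext"):
        print(module_status(name))
```

If `mmcv` imports but `mmcv._ext` does not, the prebuilt mmcv binary probably does not match the installed PyTorch/CUDA combination, and building mmcv from source against that PyTorch may be needed.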

Reproduces the problem - code sample


Reproduces the problem - command or script

Add support for the 5090 GPU

Reproduces the problem - error message

Traceback (most recent call last):
  File "./tools/train.py", line 287, in <module>
    main()
  File "./tools/train.py", line 276, in main
    train_model(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 351, in train_model
    train_detector(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 227, in train_detector
    model = MMDistributedDataParallel(
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 899262) of binary: /home/super/anaconda3/envs/rcbevdet/bin/python
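An "invalid device function" error raised inside PyTorch itself often means the installed PyTorch binary contains no kernels for the GPU's architecture. The 5090 should report compute capability 12.0 (`sm_120`), and the Python 3.8 environment in the traceback likely pins a PyTorch wheel that predates it. The following is a minimal sketch for checking this, not a definitive diagnosis. The helper names `has_native_kernels` and `describe_build` are hypothetical; `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls.

```python
try:
    import torch  # optional: the pure helper below works without it
except ImportError:
    torch = None


def has_native_kernels(capability, arch_list):
    """True if arch_list names binary kernels for exactly this (major, minor).

    Conservative: it ignores same-major binary compatibility and PTX JIT,
    so a False result means "not natively targeted", not "cannot run".
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def describe_build():
    """Summarize whether the installed PyTorch build targets GPU 0."""
    if torch is None or not torch.cuda.is_available():
        return "PyTorch with a usable CUDA device was not found."
    capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0)
    arch_list = torch.cuda.get_arch_list()            # e.g. ['sm_80', 'sm_90']
    verdict = "includes" if has_native_kernels(capability, arch_list) else "lacks"
    return (
        f"torch {torch.__version__} (CUDA {torch.version.cuda}) {verdict} "
        f"sm_{capability[0]}{capability[1]} kernels; built for: {arch_list}"
    )


if __name__ == "__main__":
    print(describe_build())
```

If the report says the build lacks kernels for the device, installing a PyTorch build compiled against CUDA 12.8 or later, and then rebuilding mmcv and the project's custom ops against it, is the usual direction to try.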

Additional information

No response
