Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmcv).
Environment
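With an RTX 5090 GPU I cannot get the environment set up at all; whatever I try, it reports that the mmcv library is missing. Please add CUDA 12.8 support for the 5090 soon so that it works.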
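A minimal sketch for gathering the details this Environment field usually asks for, assuming PyTorch is installed in the failing conda environment. The helper name `collect_gpu_env` is hypothetical, and treating `mmcv._ext` as the module that holds mmcv's compiled CUDA ops is an assumption:

```python
import importlib
import importlib.util
import platform


def collect_gpu_env():
    """Hypothetical helper: gather Python/PyTorch/mmcv versions and GPU details.

    Missing packages are reported in the result instead of raising.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version this PyTorch build targets
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        info["capability"] = f"sm_{major}{minor}"
        # Architectures the installed PyTorch binaries were compiled for.
        info["torch_arch_list"] = torch.cuda.get_arch_list()
    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv"] = mmcv.__version__
        # Assumption: compiled CUDA ops are shipped as the `mmcv._ext` extension.
        info["mmcv_ext_found"] = importlib.util.find_spec("mmcv._ext") is not None
    except ImportError:
        info["mmcv"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_env().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue would show at a glance whether the installed PyTorch and mmcv builds list the GPU's architecture.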
Reproduces the problem - code sample
Traceback (most recent call last):
  File "./tools/train.py", line 287, in <module>
    main()
  File "./tools/train.py", line 276, in main
    train_model(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 351, in train_model
    train_detector(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 227, in train_detector
    model = MMDistributedDataParallel(
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 899262) of binary: /home/super/anaconda3/envs/rcbevdet/bin/python
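One common cause of `CUDA error: invalid device function` is that the loaded binaries contain no kernel that can run on the GPU's architecture (the RTX 5090 reports compute capability 12.0, i.e. `sm_120`). A minimal sketch of that check, assuming the standard CUDA compatibility rules noted in the comments; `parse_arch` and `can_run_on` are hypothetical helpers, and only `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are real PyTorch calls:

```python
import re

# Matches entries such as "sm_86", "sm_120", "sm_90a" or "compute_90".
_ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)([a-z]?)")


def parse_arch(entry):
    """Return (kind, major, minor, suffix), or None for unrecognized entries."""
    match = _ARCH_RE.fullmatch(entry)
    if match is None:
        return None
    kind, major, minor, suffix = match.groups()
    return kind, int(major), int(minor), suffix


def can_run_on(arch_list, capability):
    """Return True if any compiled target can run on a GPU with this (major, minor).

    Assumed rules:
    - SASS for sm_XY runs on devices with the same major X and minor >= Y.
    - PTX for compute_XY can be JIT-compiled on any device with capability >= (X, Y).
    - Arch-specific targets (e.g. "sm_90a") are treated conservatively: exact match only.
    """
    dev = tuple(capability)
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue  # skip entries this sketch does not understand
        kind, major, minor, suffix = parsed
        if suffix:
            ok = (major, minor) == dev
        elif kind == "sm":
            ok = major == dev[0] and minor <= dev[1]
        else:
            ok = (major, minor) <= dev
        if ok:
            return True
    return False


if __name__ == "__main__":
    print(can_run_on(["sm_80", "sm_86", "sm_90"], (12, 0)))   # False
    print(can_run_on(["sm_80", "sm_90", "sm_120"], (12, 0)))  # True
    try:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("PyTorch build can run on this GPU:",
                  can_run_on(torch.cuda.get_arch_list(), cap))
    except ImportError:
        pass
```

If the PyTorch check prints `False`, the installed build likely has no usable kernels for this GPU, which would be consistent with the error above. The same reasoning would apply to mmcv's compiled ops, which are built separately from PyTorch.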
Reproduces the problem - command or script
Add support for the RTX 5090 GPU.
Reproduces the problem - error message
Additional information
No response