RuntimeError: CUDA error: an illegal memory access was encountered #41
Comments
Hi, I have been trying to get to the bottom of this, but have been unable to reproduce it so far. Would you by any chance be available for a Skype (or similar) session to run through it?
Also one question: I see the message Are you in fact running it on the Blender data set?
I'm running the code with the nerf_synthetic dataset. The colleague I mentioned succeeded in running your code on the exact same dataset. As for the Skype session, could we do it over Zoom instead?
Thanks for the suggestion, but I have now done a debug session with another user who has the same problem. It looks like I will need to add more diagnostics before I can find out what's going on. I'll let you know when I find out more :)
Hi @seohoiki3215
and then run what failed before with Best,
Thank you for the update on this issue. I've re-run the code following the procedure, and here is the result! Optimizing
Hi, so I tried it; unfortunately, it just works for me, and the state you submitted is valid. I have to say I'm running out of ideas as to what this could be. Best, Bernhard
I am sorry to hear that the error is not reproducible. ;(
I'm also encountering this error. Any help or update? Here is the debug message I got:
@seohoiki3215 did you manage to resolve this?
@Snosixtyboo This is my dump file. This is my error: I'm running Ubuntu 20.04, CUDA 11.8, an RTX 3090, and driver 520. I was wondering if you have any advice on how to resolve this?
@fatbao55 Please check this PR, graphdeco-inria/diff-gaussian-rasterization#10
After changing the code, reinstall the module by
@jsl013 This worked for me, thanks so much!
This is a lifesaver for me. After two days of debugging and trying four different clusters, this finally helped me solve the problem on Ubuntu.
Had the same issue with
ORZ, I have installed the Input:
Output:
This works for me! I appreciate your kind tip!
Hello, I was surprised by your work and tried to reproduce it with the code you've provided.
However, every time I run the code, it fails with the runtime error mentioned in the title.
Traceback (most recent call last):
  File "train.py", line 213, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
  File "train.py", line 87, in training
    loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
  File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
    window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
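The hint in the error message about `CUDA_LAUNCH_BLOCKING=1` can be applied without editing the training script. The following is only a minimal sketch: the helper names `blocking_env` and `rerun_with_blocking` are hypothetical and not part of the repository, and the arguments passed to `train.py` are placeholders. With synchronous launches enabled, the Python traceback is more likely to point at the call that actually triggered the illegal access rather than at a later, unrelated line such as `window.cuda(...)`.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error
    # is reported at the launching call instead of at a later API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_with_blocking(argv):
    """Run a command (for example, the failing training invocation) with that environment."""
    completed = subprocess.run(argv, env=blocking_env(), check=False)
    return completed.returncode


# Hypothetical usage; the script arguments are placeholders:
#   rerun_with_blocking([sys.executable, "train.py", "-s", "<path to dataset>"])
```

The variable has to be set before the CUDA context is created, which is why the sketch sets it in the environment of a fresh process rather than inside an already running training loop.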
I tried all the methods you suggested in other issues, but none of them worked.
My system & settings:
RTX4090
Ubuntu 22.04 LTS
Exact environment with given .yml file
Strangely, my colleague, whose system has an RTX 3090 and Ubuntu 20.04, runs the code without any problem. (Apart from those two differences, all the settings are exactly the same, including the CUDA SDK version.)
I hope there is a solution for this problem!
Thank you.
=====================================
Results with cuda-memcheck
========= CUDA-MEMCHECK
========= This tool is deprecated and will be removed in a future release of the CUDA toolkit
========= Please use the compute-sanitizer tool as a drop-in replacement
Optimizing
Output folder: ./output/54877260-0 [17/07 19:21:51]
Tensorboard not available: not logging progress [17/07 19:21:51]
Found transforms_train.json file, assuming Blender data set! [17/07 19:21:51]
Reading Training Transforms [17/07 19:21:51]
Reading Test Transforms [17/07 19:21:53]
Loading Training Cameras [17/07 19:21:56]
Loading Test Cameras [17/07 19:21:57]
Number of points at initialisation : 100000 [17/07 19:21:57]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 213, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
  File "train.py", line 87, in training
    loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
  File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
    window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
========= ERROR SUMMARY: 0 errors
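The deprecation notice in the output above recommends `compute-sanitizer` as a drop-in replacement for `cuda-memcheck`. A minimal sketch of wrapping the same run with it follows. It assumes `compute-sanitizer` from the CUDA toolkit is on `PATH`; the helper names `sanitizer_command` and `run_under_sanitizer` are hypothetical, and the `train.py` arguments are placeholders.

```python
import shutil
import subprocess
import sys


def sanitizer_command(script_args, tool="memcheck"):
    """Build a compute-sanitizer invocation that wraps a Python script."""
    # --tool memcheck selects the memory access checker.
    return ["compute-sanitizer", "--tool", tool, sys.executable, *script_args]


def run_under_sanitizer(script_args):
    """Run the script under compute-sanitizer; return None if the tool is missing."""
    if shutil.which("compute-sanitizer") is None:
        # Assumption: the tool is not installed or not on PATH.
        return None
    completed = subprocess.run(sanitizer_command(script_args), check=False)
    return completed.returncode


# Hypothetical usage; the script arguments are placeholders:
#   run_under_sanitizer(["train.py", "-s", "<path to dataset>"])
```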