RuntimeError: CUDA error: an illegal memory access was encountered #41

seohoiki3215 · 2023-07-17T10:18:41Z

Hello, I was surprised by your work and tried to reproduce it with the code you've provided.
However, every time I ran the code, it failed with the runtime error mentioned in the title.

Traceback (most recent call last):
File "train.py", line 213, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
File "train.py", line 87, in training
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]

I tried all the methods you suggested in other issues, but none of them worked.
My system & settings:
RTX4090
Ubuntu 22.04 LTS
Exact environment with given .yml file

Strangely, my colleague, whose system has an RTX 3090 and Ubuntu 20.04, runs the code without any problem. (Apart from the GPU and OS, all settings are exactly the same, including the CUDA SDK version.)

I hope I can get some solution for this problem!

Thank you.

=====================================
Results with cuda-memcheck

========= CUDA-MEMCHECK
========= This tool is deprecated and will be removed in a future release of the CUDA toolkit
========= Please use the compute-sanitizer tool as a drop-in replacement
Optimizing
Output folder: ./output/54877260-0 [17/07 19:21:51]
Tensorboard not available: not logging progress [17/07 19:21:51]
Found transforms_train.json file, assuming Blender data set! [17/07 19:21:51]
Reading Training Transforms [17/07 19:21:51]
Reading Test Transforms [17/07 19:21:53]
Loading Training Cameras [17/07 19:21:56]
Loading Test Cameras [17/07 19:21:57]
Number of points at initialisation : 100000 [17/07 19:21:57]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 213, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
File "train.py", line 87, in training
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
========= ERROR SUMMARY: 0 errors
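
The error text above notes that CUDA kernel errors may be reported asynchronously, so the Python line in the traceback (the `window.cuda(...)` call) is not necessarily where the fault occurred. Below is a minimal sketch of building a launch with `CUDA_LAUNCH_BLOCKING=1` under `compute-sanitizer`, the tool the cuda-memcheck banner names as its replacement. The helper names and the dataset path are hypothetical, and passing `--source_path` to `train.py` is an assumption.

```python
import os
import sys


def blocking_env(base=None):
    """Copy an environment and request synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def sanitizer_command(script_args):
    """Prefix the Python invocation with compute-sanitizer."""
    return ["compute-sanitizer", sys.executable, *script_args]


# Hypothetical invocation: the dataset path is a placeholder (assumption).
train_args = ["train.py", "--source_path", "data/nerf_synthetic/lego"]
env = blocking_env()
cmd = sanitizer_command(train_args)
print(" ".join(cmd))
# To actually launch: subprocess.run(cmd, env=env, check=False)
```

With synchronous launches the reported Python line should sit closer to the failing kernel, at the cost of slower execution.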

Snosixtyboo · 2023-07-17T10:23:59Z

Hi,

I have been trying to get to the bottom of this, but was unable to reproduce it so far. Would you by any chance be available for a Skype (or similar) session to run through it?

Snosixtyboo · 2023-07-17T10:26:06Z

Also one question: I see the message
" Found transforms_train.json file, assuming Blender data set! [17/07 19:21:51] "

Are you in fact running it on the Blender data set?

seohoiki3215 · 2023-07-17T10:47:20Z

I'm running the code on the nerf_synthetic dataset. The colleague I mentioned succeeded in running your code on the exact same dataset.
Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1

As for the Skype session, could we use Zoom instead?

Snosixtyboo · 2023-07-17T13:22:09Z

Thanks for suggesting, but I did a debug session now with another user for the same problem. It looks like I will need to add more diagnostics before I can find out what's going on. I'll let you know when I find out more :)

Snosixtyboo · 2023-07-23T20:46:25Z

Hi @seohoiki3215
I finally managed to do the debug version of the rasterizer, I hope this will help. To use it, please do

git pull
git submodule update
pip uninstall diff-gaussian-rasterization (yes)
pip install submodules/diff-gaussian-rasterization

and then run what failed before with --debug. This is slow: so if it takes a while for the error to appear, you can also use --debug_from <iteration> to start debugging only at a certain point. If everything goes well, you should get an error message and a snapshot_fw or snapshot_bw file in the gaussian_splatting directory. If you could forward this file to us, we could take a look to see if we find something wrong!

Best,
Bernhard
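
The steps above can be sketched as a small script that assembles the commands in order. This is only an illustration: the helper names and dataset path are hypothetical, `--source_path` for `train.py` is an assumption, and the sketch assumes `--debug_from` is passed in place of `--debug` when debugging should start at a later iteration.

```python
import sys


def reinstall_steps(submodule="submodules/diff-gaussian-rasterization"):
    """Commands that update the repository and rebuild the rasterizer."""
    pip = [sys.executable, "-m", "pip"]
    return [
        ["git", "pull"],
        ["git", "submodule", "update"],
        pip + ["uninstall", "-y", "diff-gaussian-rasterization"],
        pip + ["install", submodule],
    ]


def debug_command(train_args, debug_from=None):
    """Training command with debugging enabled from the start or from an iteration."""
    cmd = [sys.executable, "train.py", *train_args]
    if debug_from is None:
        cmd.append("--debug")
    else:
        # Assumption: --debug_from replaces --debug rather than accompanying it.
        cmd += ["--debug_from", str(debug_from)]
    return cmd


steps = reinstall_steps()
# Placeholder dataset path (assumption).
cmd = debug_command(["--source_path", "data/nerf_synthetic/lego"], debug_from=500)
for step in steps + [cmd]:
    print(" ".join(step))
# To execute from the repository root: subprocess.run(step, check=True) per step.
```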

seohoiki3215 · 2023-07-24T02:07:46Z

Thank you for giving me some updates for the issue. I've re-run the code with the procedure, and here is the result!

snapshot_fw.zip

Optimizing
Output folder: ./output/9feda2d2-9 [24/07 11:03:34]
Tensorboard not available: not logging progress [24/07 11:03:34]
Found transforms_train.json file, assuming Blender data set! [24/07 11:03:34]
Reading Training Transforms [24/07 11:03:34]
Reading Test Transforms [24/07 11:03:36]
Loading Training Cameras [24/07 11:03:40]
Loading Test Cameras [24/07 11:03:42]
Number of points at initialisation : 100000 [24/07 11:03:42]
Training progress: 0%|
| 0/30000 [00:00<?, ?it/s]
[CUDA ERROR] in cuda_rasterizer/rasterizer_impl.cu
Line 298: an illegal memory access was encountered
An error occured in forward. Please forward snapshot_fw.dump for debugging. [24/07 11:03:42]
Traceback (most recent call last):
File "train.py", line 216, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 83, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/home/seohoiki/Research/NeRF/gaussian-splatting/gaussian_renderer/__init__.py", line 93, in render
cov3D_precomp = cov3D_precomp)
File "/home/seohoiki/anaconda3/envs/gaussian_splatting/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/seohoiki/anaconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 219, in forward
raster_settings,
File "/home/seohoiki/anaconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 41, in rasterize_gaussians
raster_settings,
File "/home/seohoiki/anaconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 90, in forward
raise ex
File "/home/seohoiki/anaconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 86, in forward
num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: an illegal memory access was encountered
Training progress: 0%|

Snosixtyboo · 2023-07-24T06:05:01Z

Hi,

I tried it, but unfortunately it just works for me; the state you submitted is valid. I have to say I'm running out of ideas about what this could be ☹️. I have only seen the issue happen on Linux so far. Are there other GPUs in your machine? Are your GPU drivers up to date?

Best, Bernhard

seohoiki3215 · 2023-07-24T06:14:02Z

I am sorry to hear that the error is not reproducible. ;(
I have a single RTX 4090 in my system, and the driver is up to date (535).
The CUDA toolkit version is 11.7.
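
Since the failure shows up on some machines and not others, a structured environment report may make it easier to compare a failing setup with a working one. Below is a minimal sketch using standard PyTorch queries; the function name and report layout are hypothetical. It does not read the driver version, which `nvidia-smi` reports.

```python
import platform


def environment_report():
    """Collect Python, OS, PyTorch, and GPU details for a bug report."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    report["gpus"] = []
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(index)
            report["gpus"].append({
                "index": index,
                "name": torch.cuda.get_device_name(index),
                "compute_capability": f"{major}.{minor}",
            })
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```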

stevenygd · 2023-09-05T08:10:03Z

I also encounter this error. Any help/update? Here is the debug message I got:

[CUDA ERROR] in /home/gaussian-splatting/submodules/diff-gaussian-rasterization/cuda_rasterizer/rasterizer_impl.cu
Line 298: an illegal memory access was encountered
An error occured in forward. Please forward snapshot_fw.dump for debugging. [05/09 01:11:48]
Traceback (most recent call last):
  File "train.py", line 216, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 83, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background)
  File "/home/gaussian-splatting/gaussian_renderer/__init__.py", line 93, in render
    cov3D_precomp = cov3D_precomp)
  File "/home//miniconda3/envs/gaussian_splatting/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/miniconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 219, in forward
    raster_settings, 
  File "/home//miniconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 41, in rasterize_gaussians
    raster_settings,
  File "/home//miniconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 90, in forward
    raise ex
  File "/home//miniconda3/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 86, in forward
    num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: an illegal memory access was encountered

fatbao55 · 2023-10-02T12:30:40Z

@seohoiki3215 did you manage to resolve this?

fatbao55 · 2023-10-02T12:32:03Z

@Snosixtyboo This is my dump file
snapshot_fw.zip obtained with the debug version of the rasterizer.

This is my error:
num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: an illegal memory access was encountered
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]

I'm running Ubuntu 20.04 with CUDA 11.8, an RTX 3090, and driver 520. Do you have any advice on how to resolve this?

junseo013 · 2023-10-09T02:54:11Z

@fatbao55 Please check this PR, graphdeco-inria/diff-gaussian-rasterization#10
In my case, adding the "-Xcompiler -fno-gnu-unique" option at line 29 of submodules/diff-gaussian-rasterization/setup.py resolved the illegal memory access error in training.

...
29 extra_compile_args={"nvcc": ["-Xcompiler", "-fno-gnu-unique","-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})
...

After changing the code, reinstall the module by
pip uninstall diff-gaussian-rasterization -y && pip install submodules/diff-gaussian-rasterization
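
To make the change easier to read, the nvcc argument list from the snippet above can be isolated in a small helper. This is a sketch, not the project's actual setup.py: the function name `nvcc_args` and its `disable_gnu_unique` switch are hypothetical. `-Xcompiler` tells nvcc to forward the following option to the host compiler, and `-fno-gnu-unique` is a GCC option.

```python
import os


def nvcc_args(package_root, disable_gnu_unique=True):
    """Build the nvcc argument list for the rasterizer extension."""
    args = []
    if disable_gnu_unique:
        # -Xcompiler forwards the next option to the host compiler (GCC here).
        args += ["-Xcompiler", "-fno-gnu-unique"]
    # Include path for the bundled GLM headers, as in the original line.
    args.append("-I" + os.path.join(package_root, "third_party/glm/"))
    return args


args = nvcc_args("submodules/diff-gaussian-rasterization")
print(args)
```

Inside setup.py this would be used as `extra_compile_args={"nvcc": nvcc_args(os.path.dirname(os.path.abspath(__file__)))}`, followed by the reinstall command above so that the extension is rebuilt.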

fatbao55 · 2023-10-09T03:37:15Z

@jsl013 This worked for me, thanks so much!

FantasticOven2 · 2023-10-29T22:24:40Z

This is a life saver for me. After two days of debugging across 4 different clusters, this finally helped me solve the problem on Ubuntu.

mushroonhead · 2023-11-10T06:32:51Z

I had the same issue with diff-gaussian-rasterization, and this solved it for me. I am running a WSL2 Ubuntu 20.04 setup with the CUDA 11.8 toolkit.

ShuzhaoXie · 2023-11-15T06:41:22Z

ORZ, I have installed the debug version. Could anyone tell me how to use the '--debug' argument? I added it to render.py but got the following error...

Input:

python render.py --debug ...

Output:

usage: render.py [-h] [--sh_degree SH_DEGREE] [--source_path SOURCE_PATH]
                    [--model_path MODEL_PATH] [--images IMAGES]
                    [--resolution RESOLUTION] [--white_background] [--eval]
                    [--convert_SHs_python] [--compute_cov3D_python]
                    [--iteration ITERATION]
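
The usage text above lists no `--debug` option for render.py, which is presumably why the argument was rejected. Below is a minimal sketch of how argparse treats a flag the parser does not define. The parser is rebuilt from a subset of the flags in the usage text and is not render.py's actual parser; the model path is a placeholder.

```python
import argparse

# Illustrative parser with a few of the flags shown in the usage text.
parser = argparse.ArgumentParser(prog="render.py")
parser.add_argument("--source_path")
parser.add_argument("--model_path")
parser.add_argument("--iteration", type=int)
parser.add_argument("--white_background", action="store_true")

# parse_known_args returns undefined flags instead of exiting with a usage error.
known, unknown = parser.parse_known_args(["--model_path", "output/run", "--debug"])
print(known.model_path)
print(unknown)
```

With `parse_args` instead of `parse_known_args`, the same input makes argparse print the usage text and exit.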

jhq1234 · 2024-01-10T11:59:01Z

This works for me! I appreciate your kind tip!

seohoiki3215 closed this as completed Jul 17, 2023

seohoiki3215 reopened this Jul 17, 2023

ShunChengWu mentioned this issue Sep 28, 2023

fix the error of illegal memory access caused by cub graphdeco-inria/diff-gaussian-rasterization#10

Open

YixunLiang mentioned this issue Dec 6, 2023

Error when installing requirements EnVision-Research/LucidDreamer#17

Closed

ymq2017 mentioned this issue Jan 8, 2024

How to view the output ply file in vanilla gaussian-splatting's SIBR_gaussianViewer_app.exe? lkeab/gaussian-grouping#8

Open

This was referenced Feb 16, 2024

radii > 0 - CUDA error illegal access memory autonomousvision/mip-splatting#20

Open

CUDA error: an illegal memory access was encountered #660

Open

azzarelli mentioned this issue Mar 17, 2024

A few packages missed in requirements.txt, and share the experience in the installation yihua7/SC-GS#15

Closed

Mysterious-handsome-man mentioned this issue Apr 1, 2024

RuntimeError: numel: integer multiplication overflow VITA-Group/FSGS#38

Open

This was referenced Apr 3, 2024

Question for render feature? ShijieZhou-UCLA/feature-3dgs#7

Closed

RuntimeError for Speedup mode ShijieZhou-UCLA/feature-3dgs#2

Closed

wrencanfly mentioned this issue Apr 14, 2024

change language feature encoder to dim=4 - CUDA error: an illegal memory access was encountered minghanqin/LangSplat#35

Open

mayu-snba19 mentioned this issue Apr 17, 2024

Can not be training. CUDA error: an illegal memory access was encountered dreamgaussian/dreamgaussian#124

Open

zhongyingji mentioned this issue May 6, 2024

RuntimeError: numel: integer multiplication overflow muskie82/MonoGS#71

Closed

niujinshuchong mentioned this issue May 9, 2024

RuntimeError: CUDA error: an illegal memory access was encountered autonomousvision/gaussian-opacity-fields#31

Closed

zhanghaoyu816 mentioned this issue Jun 20, 2024

CUDA error when I apply my own dataset. BaowenZ/RaDe-GS#4

Open

ForeverAurorak mentioned this issue Jul 29, 2024

RuntimeError: CUDA error: an illegal memory access was encountered graphdeco-inria/hierarchical-3d-gaussians#1

Open
