Description
Main Challenge
I'm currently trying to set up an environment with a large number of cameras, all taking a single picture at a single point in time. When I initially tried that, my simulation crashed with one of the following error messages:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.68 GiB total capacity; 8.44 MiB already allocated; 57.12 MiB free; 22.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

```
RuntimeError: Array allocation failed on device: cuda:0 for 16777216 bytes
```
This happens as soon as the number of cloned environments (and therefore cameras) is 10 or larger.
If I read the above error message correctly, I should still have plenty of capacity on my GPU, but it's not marked as "free" - how does this happen?
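The first message itself points at fragmentation and suggests `max_split_size_mb`. In case it's relevant, my understanding (not verified) is that this option is set through PyTorch's allocator environment variable before anything CUDA-related is initialized; the value below is just an example:

```python
import os

# Assumption: this has to be set before the first CUDA allocation, so ideally
# at the very top of the script; 128 MiB is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```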
Also, if my math is correct, a single image of an environment should not take up more than 16 MiB (2048 × 2048 pixels × 4 channels × 8 bits), which is exactly the 16777216 bytes from the second error message.
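For what it's worth, the numbers in both error messages line up with that estimate once all ten cameras are counted:

```python
# Sanity check on the buffer sizes reported in the errors
# (assuming RGBA, i.e. 4 channels of 1 byte each).
height, width, channels = 2048, 2048, 4

bytes_per_image = height * width * channels
print(bytes_per_image)               # 16777216 -> the failed array allocation
print(10 * bytes_per_image / 2**20)  # 160.0 MiB -> the failed torch allocation
```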
Workaround
I then thought a possible workaround could be to iteratively position the camera over each environment, take a picture, and append it to a list. This works, but in practice I need to step the simulation a few times (3) every time I reposition the camera for it to "see" anything. That time adds up very quickly: say I have 1000 envs and I'm running at 3 fps; looping over all of them then takes more than 15 minutes.
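Spelled out, under the assumption of 3 settle steps per reposition and 3 simulation steps per second of wall time:

```python
# Back-of-the-envelope cost of the sequential workaround.
num_envs = 1000
steps_per_capture = 3   # settle steps before each picture (assumption)
steps_per_second = 3.0  # observed simulation rate

total_minutes = num_envs * steps_per_capture / steps_per_second / 60
print(total_minutes)  # ~16.7 minutes for a single sweep over all envs
```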
Questions
- Is there any way to make this more efficient?
- Am I doing something wrong with the way I implement the parallel cameras?
- Can I allocate more space on the GPU for the camera data?
- Is my GPU just not powerful enough?
As always, if I can provide more code/data/anything else to help solve this, I'm more than happy to.
Example Code
My camera setup for the parallel run (simplified) is as follows:
```python
# Simplified; imports shown for context (Orbit's sim utilities and camera sensor).
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.sensors.camera import Camera, CameraCfg

sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
# One camera prim per cloned environment, matched by regex.
camera_cfg = CameraCfg(
    prim_path="/World/envs/env_.*/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)
...
while simulation_app.is_running():
    ...
    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)
        sim.play()
        # let the renderer warm up before reading the buffers
        for _ in range(2):
            sim.step()
        camera.update(sim_dt)
        camera_captures = camera.data.info[0]
        camera.__del__()
```
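As a side note, when the batched version does fit into memory, I would expect to read all images out of the sensor in one go, roughly like this (my assumption from the Camera docs, not something I could verify at this resolution):

```python
# Assumed access pattern for the batched capture:
# output["rgb"] should be a (num_cameras, 2048, 2048, 4) tensor on cuda:0.
rgb = camera.data.output["rgb"]
rgb_cpu = rgb.cpu()  # move it off the GPU right away to free VRAM
```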
And my (simplified) setup for iterating through the environments:
```python
sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
# A single camera outside the cloned environments, moved from env to env.
camera_cfg = CameraCfg(
    prim_path="/Cameras/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)
...
while simulation_app.is_running():
    ...
    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)
        sim.play()
        camera_captures = []
        for env_index in envs_to_capture:
            camera_pos = envs_positions[env_index, :] + camera_offset_pos
            camera_rot = camera_offset_rot
            camera.set_world_poses(
                positions=camera_pos.unsqueeze(0),
                orientations=camera_rot.unsqueeze(0),
                convention="opengl",
            )
            # the camera needs a few steps before it actually "sees" the new env
            for _ in range(2):
                sim.step()
            camera.update(dt=0.01)
            camera_captures.append(camera.data.info[0])
        camera.__del__()
```
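One idea I haven't tested yet would be a middle ground between the two variants: spawn a small bank of cameras (few enough that their render buffers fit into VRAM) and sweep that bank across the environments in chunks, so each settle/step cycle is amortized over several captures. A rough sketch, assuming `set_world_poses` accepts batched poses the way I expect and that `num_envs` is divisible by the bank size; `NUM_PARALLEL_CAMERAS` is a placeholder and the pose tensors come from my setup above:

```python
import torch

NUM_PARALLEL_CAMERAS = 8  # small enough for the render buffers to fit in VRAM

# camera_cfg would use a prim path matching exactly NUM_PARALLEL_CAMERAS prims
camera = Camera(cfg=camera_cfg)

camera_captures = []
for start in range(0, num_envs, NUM_PARALLEL_CAMERAS):
    env_ids = torch.arange(start, start + NUM_PARALLEL_CAMERAS)
    positions = envs_positions[env_ids] + camera_offset_pos
    orientations = camera_offset_rot.unsqueeze(0).repeat(NUM_PARALLEL_CAMERAS, 1)
    camera.set_world_poses(positions=positions, orientations=orientations,
                           convention="opengl")
    # the settle steps are now shared by the whole bank instead of one camera
    for _ in range(3):
        sim.step()
    camera.update(dt=0.01)
    # copy to CPU immediately so the GPU buffers can be reused
    camera_captures.append(camera.data.output["rgb"].cpu())
```

With a bank of 8, that would be 1000 / 8 = 125 reposition cycles instead of 1000, so roughly two minutes instead of a quarter of an hour at the same step rate.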
System specs
- Using the devel branch of Orbit
- Commit: aaab27b
- Isaac Sim Version: 2023.1.0-hotfix.1
- OS: Ubuntu 22.04
- GPU: RTX 3090
- CUDA: 12.0
- GPU Driver: 525.147.05