Description
Describe the project you are working on
The Godot editor 🙂
Describe the problem or limitation you are having in your project
Godot currently supports denoising lightmaps using OpenImageDenoise (OIDN), but this is slow for 3 reasons:
- We use an old version of OIDN because recent versions are difficult to build from source. Recent versions feature many optimizations not found in older versions.
- We don't have access to GPU-based denoising, which is only part of recent OIDN versions. On modern GPUs, this can provide a speedup of over 50× compared to multithreaded CPU-based denoising with identical output quality.
- Godot's implementation only uses a single CPU thread, as we don't link against Intel's Threaded Building Blocks (TBB) library. That library is also known to be difficult to integrate in an existing project, especially if it doesn't use CMake.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
There are many advantages to using OIDN via the command line:
- We don't have to bother with building it from source, which is a notoriously difficult task. Recent versions of OIDN require a specific compiler called ISPC and other large libraries. These aren't readily available in up-to-date versions in Linux distributions, and are even more difficult to build from source on Windows and macOS (not to mention build times can be long). We strive to keep Godot easy to build from source, so using a recent OIDN as a library doesn't seem to be possible (unless we use an approach similar to [macOS/Windows] Add optional ANGLE backed OpenGL renderer support (runtime backend selection). godot#72831).
- We no longer have to integrate the OIDN library into the Godot editor binary. This reduces binary size by roughly 7 MB, which is significant.
Performance results on Linux:
# Intel Core i9-13900K
$ oidnBenchmark --device cpu
RT.hdr_alb_nrm.1920x1080 ... 353.716 msec/image
RT.ldr_alb_nrm.1920x1080 ... 357.93 msec/image
RT.hdr_alb_nrm.3840x2160 ... 1457.19 msec/image
RT.ldr_alb_nrm.3840x2160 ... 1452.21 msec/image
RT.hdr_alb_nrm.1280x720 ... 155.315 msec/image
RT.ldr_alb_nrm.1280x720 ... 153.774 msec/image
RTLightmap.hdr.2048x2048 ... 670.833 msec/image
RTLightmap.hdr.4096x4096 ... 2950.94 msec/image
RTLightmap.hdr.1024x1024 ... 167.51 msec/image
# GeForce RTX 4090
$ oidnBenchmark --device cuda
RT.hdr_alb_nrm.1920x1080 ... 6.7645 msec/image # 52× faster than Intel Core i9-13900K
RT.ldr_alb_nrm.1920x1080 ... 6.59508 msec/image # 54× faster
RT.hdr_alb_nrm.3840x2160 ... 27.8542 msec/image # 52× faster
RT.ldr_alb_nrm.3840x2160 ... 27.8098 msec/image # 52× faster
RT.hdr_alb_nrm.1280x720 ... 2.98997 msec/image # 52× faster
RT.ldr_alb_nrm.1280x720 ... 2.96565 msec/image # 52× faster
RTLightmap.hdr.2048x2048 ... 12.5533 msec/image # 53× faster
RTLightmap.hdr.4096x4096 ... 55.1833 msec/image # 53× faster
RTLightmap.hdr.1024x1024 ... 3.0971 msec/image # 54× faster
Denoising a 4K lightmap goes from an operation that takes several seconds to a near-instant one. Even if your GPU is 10 times slower in compute than an RTX 4090, it'll still handily beat the i9-13900K in this test.
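To make the comparison concrete, here is a small Python sketch that recomputes a few of the speedup factors from the timings above. The "10 times slower GPU" case assumes denoising time scales linearly with GPU compute, which is a simplification for illustration rather than a measured result.

```python
# Timings in msec/image, copied from the oidnBenchmark output above.
cpu_ms = {
    "RT.hdr_alb_nrm.1920x1080": 353.716,
    "RTLightmap.hdr.2048x2048": 670.833,
    "RTLightmap.hdr.4096x4096": 2950.94,
}
gpu_ms = {
    "RT.hdr_alb_nrm.1920x1080": 6.7645,
    "RTLightmap.hdr.2048x2048": 12.5533,
    "RTLightmap.hdr.4096x4096": 55.1833,
}


def speedup(name: str, gpu_slowdown: float = 1.0) -> float:
    """CPU time divided by GPU time, with the GPU time scaled by gpu_slowdown.

    gpu_slowdown is a hypothetical factor (assumption: linear scaling).
    """
    return cpu_ms[name] / (gpu_ms[name] * gpu_slowdown)


for name in cpu_ms:
    measured = speedup(name)
    slower_gpu = speedup(name, gpu_slowdown=10.0)
    print(f"{name}: {measured:.1f}x measured, {slower_gpu:.1f}x on a GPU 10 times slower")
```

Under that assumption, the slower GPU still finishes each test several times faster than the CPU.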
System VRAM utilization doesn't exceed 5.3 GB with the 4K lightmap denoise, so it looks like 8 GB GPUs should handle this fine (perhaps 6 GB for smaller lightmaps – remember that the editor will be running at the same time).
There are some caveats though:
- For GPU acceleration to work, the user must have a functional CUDA setup (on NVIDIA), HIP setup (on AMD) or SYCL setup (on Intel). If none of these is available, multithreaded CPU-based denoising still works and remains a net performance win over the current implementation.
- Official OIDN binaries don't include support for saving and loading OpenEXR images, as they don't link against OpenImageIO. This can be done with binaries that are built manually. Intel does not officially support using the CLI (it's only meant for evaluation and benchmarking purposes).
- This means we'd have to provide our own OIDN binaries compiled from source, but not having to deal with integrating it in SCons should make the process much easier already. We already do something similar for FBX2glTF.
- In the meantime, you can use this command line to handle conversion from and to OpenEXR (requires ImageMagick to be installed):
# On Windows, use `%TEMP%` (cmd) or `$env:TEMP` (PowerShell) instead of `/tmp`.
convert lightmap.exr -endian LSB /tmp/lightmap.pfm \
&& oidnDenoise --filter RTLightmap --hdr /tmp/lightmap.pfm --output /tmp/lightmap_denoised.pfm \
&& convert /tmp/lightmap_denoised.pfm lightmap.exr
- Calling CLI programs is not possible in the Android and Web editors, but this isn't much of an issue as our version of OIDN doesn't support arm64, which means it's already not effective in the Android editor. Also, you probably won't be baking lightmaps on those platforms given the performance constraints. (The Web editor doesn't support baking lightmaps currently, as it only runs OpenGL.)
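The ImageMagick round trip from the caveats above can also be scripted in a platform-neutral way. The following is a minimal Python sketch, not part of the proposal itself: the function names are hypothetical, the flags are the ones from the shell one-liner, and it assumes `convert` and `oidnDenoise` are on the `PATH`. A temporary directory from the standard library stands in for `/tmp` or `%TEMP%`.

```python
import subprocess
import tempfile
from pathlib import Path


def denoise_commands(exr_path: str, tmp_dir: str) -> list[list[str]]:
    """Build the three commands from the shell one-liner, in order."""
    noisy = str(Path(tmp_dir) / "lightmap.pfm")
    clean = str(Path(tmp_dir) / "lightmap_denoised.pfm")
    return [
        # EXR -> little-endian PFM, which oidnDenoise can read.
        ["convert", exr_path, "-endian", "LSB", noisy],
        # Denoise with the lightmap filter in HDR mode.
        ["oidnDenoise", "--filter", "RTLightmap", "--hdr", noisy, "--output", clean],
        # PFM -> EXR, overwriting the original lightmap.
        ["convert", clean, exr_path],
    ]


def denoise_in_place(exr_path: str) -> None:
    """Run the commands; check=True stops at the first failure, like `&&`."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        for command in denoise_commands(exr_path, tmp_dir):
            subprocess.run(command, check=True)
```

Calling `denoise_in_place("lightmap.exr")` would then mirror the one-liner, with the intermediate PFM files removed automatically afterwards.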
This proposal effectively supersedes godotengine/godot#47344, as OIDN would become an external program called by Godot, similar to FBX2glTF (for .fbx import) and Blender (for .blend import).
It's also been mentioned that we could use an algorithm such as this one or this one as a fallback to OIDN when the CLI binary isn't installed. This can be implemented as a Vulkan compute shader to provide universal GPU acceleration.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
- When the path to the OIDN CLI binary (oidnDenoise) is configured in the Editor Settings and the binary is indeed present, write lightmaps to a temporary location instead of writing them in the project folder.
- Call the OIDN CLI binary after baking lightmaps, with the input path being the temporary file and the output path being within the project folder.
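The two steps above could be sketched roughly as follows. This is illustrative Python rather than editor code: `plan_bake_output` and the temporary file name are hypothetical, and writing EXR directly assumes a custom OIDN build linked against OpenImageIO, as discussed in the caveats.

```python
import os
import tempfile
from typing import Optional


def plan_bake_output(
    oidn_path: str, project_lightmap: str
) -> tuple[str, Optional[list[str]]]:
    """Return (path the baker should write to, denoise command or None)."""
    if not oidn_path or not os.path.isfile(oidn_path):
        # No usable binary: bake straight into the project folder and
        # skip the external denoising step.
        return project_lightmap, None
    # Hypothetical temporary file name. Keeping the raw bake outside the
    # project means the editor only ever imports the denoised result.
    raw = os.path.join(tempfile.gettempdir(), "godot_lightmap_raw.exr")
    command = [
        oidn_path,
        "--filter", "RTLightmap",
        "--hdr", raw,
        "--output", project_lightmap,
    ]
    return raw, command
```

The baker would write to the first returned path and, when a command is returned, run it once baking finishes so that only the denoised file lands in the project folder.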
If this enhancement will not be used often, can it be worked around with a few lines of script?
This can be worked around by disabling Use Denoiser in LightmapGI and calling the above command line, but it must be done manually every time after baking lightmaps. Using watchexec can improve this somewhat, but you still need to start it every time you open the project in the editor.
If you do this, the lightmap texture will also be imported twice by the Godot editor (once in its non-denoised form, once in its denoised form). This further slows down the lightmap baking process, especially if VRAM compression is enabled on the lightmap texture. Writing the non-denoised lightmap to a temporary location prevents this issue, but it can't be done from the Godot editor itself.
Is there a reason why this should be core and not an add-on in the asset library?
This can't be worked around with an add-on efficiently (see above).