Projects using the "compatibility" rendering engine take a few seconds to run now #95772

Open
The2AndOnly opened this issue Aug 18, 2024 · 15 comments

@The2AndOnly

The2AndOnly commented Aug 18, 2024

Tested versions

  • Reproducible in: v4.3.stable.official [77dcf97]
  • Not Reproducible in: v4.2.stable.official [46dc277], v4.2.2.stable.official [15073af]

System information

Godot v4.3.stable - Windows 10.0.22631 - GLES3 (Compatibility) - NVIDIA GeForce RTX 4050 Laptop GPU (NVIDIA; 32.0.15.5599) - 13th Gen Intel(R) Core(TM) i7-13650HX (20 Threads)

Issue description

When you click the "Run Project" button, or the "Run Current Scene" button, or the "Run Specific Scene" button, then the window for the project will take a few seconds to open. But that's only if the rendering engine is set to compatibility. This makes it slower to playtest, since before 4.3 the project would run almost instantly. And this happens no matter how big or small the project/scene is.

Steps to reproduce

  • Create a new project with "compatibility" as the rendering engine, or take an existing project and switch it to compatibility
  • If needed, create an empty scene for the project (it doesn't have to be empty, the scene can be anything)
  • Run the project

Minimal reproduction project (MRP)

new-game-project.zip

@Calinou
Member

Calinou commented Aug 18, 2024

Do you have any specific USB devices connected (most notably Corsair iCUE devices)? See #20566.

@The2AndOnly
Author

I have no USBs plugged into my laptop

@The2AndOnly
Author

I didn't mean to click the close button

@alvinhochun
Contributor

I checked with --benchmark starting the project manager on Windows 10, since it uses the compatibility renderer:

4.2.2:

BENCHMARK:
        - register_core_types :  0.009066  sec.
        - servers :  0.801082  sec.
        - register_editor_types :  0.003066  sec.
        - scene :  0.125518  sec.
        - editor_register_and_generate_icons_all :  0.296479  sec.
        - editor_register_and_generate_icons_with_only_thumbs :  0.002188  sec.
        - editor_register_fonts :  0.00319  sec.
        - create_editor_theme :  0.541345  sec.
        - create_custom_theme :  0.541358  sec.
        - project_manager :  0.7008  sec.
        - startup_begin :  1.662315  sec.

4.3:

BENCHMARK:
        [Startup]
                - Main::Setup: 17.741 msec.
                - Initialize Early Settings: 2.896 msec.
                - Servers: 2032.041 msec.
                - Setup Window and Boot: 9.411 msec.
                - Translations and Remaps: 0.133 msec.
                - Text Server: 0.066 msec.
                - Scene: 96.365 msec.
                - Platforms: 0.112 msec.
                - Finalize Setup: 20.581 msec.
                - Main::Setup2: 2165.028 msec.
                - Project Manager: 653.754 msec.
                - Main::Start: 669.588 msec.

        [Core]
                - Register Types: 10.685 msec.
                - Register Extensions: 0.343 msec.
                - Register Singletons: 0.059 msec.

        [Servers]
                - Register Extensions: 14.399 msec.
                - Modules and Extensions: 0.827 msec.
                - Input: 16.056 msec.
                - Display: 1868.940 msec.
                - Tablet Driver: 0.065 msec.
                - Rendering: 53.652 msec.
                - Audio: 75.929 msec.
                - XR: 0.014 msec.
                - Register Singletons: 0.019 msec.

        [Scene]
                - Register Types: 58.829 msec.
                - Register Singletons: 0.008 msec.
                - Modules and Extensions: 37.420 msec.

        [Editor]
                - Register Types: 2.940 msec.
                - Modules and Extensions: 0.041 msec.

        [EditorTheme (Startup)]
                - Register Icons: 270.970 msec.
                - Register Fonts: 2.789 msec.
                - Create Base Theme: 527.677 msec.
                - Merge Custom Theme: 0.003 msec.
                - Generate Theme: 527.689 msec.

Since the names have changed, I attempted to match up the equivalent benchmark items:

| name (4.2.2) | name (4.3) | 4.2.2 (ms) | 4.3 (ms) |
|---|---|---|---|
| register_core_types | Core/Register Types | 9.066 | 10.685 |
| servers | Startup/Servers | 801.082 | 2032.041 |
| register_editor_types | Editor/Register Types | 3.066 | 2.940 |
| scene | Scene/Register Types | 125.518 | 58.829 |
| editor_register_and_generate_icons_all | EditorTheme (Startup)/Register Icons | 296.479 | 270.970 |
| editor_register_and_generate_icons_with_only_thumbs | | 2.188 | |
| editor_register_fonts | EditorTheme (Startup)/Register Fonts | 3.19 | 2.789 |
| create_editor_theme | EditorTheme (Startup)/Create Base Theme | 541.345 | 527.677 |
| create_custom_theme | EditorTheme (Startup)/Generate Theme | 541.358 | 527.689 |
| project_manager | Startup/Project Manager | 700.8 | 653.754 |
| startup_begin | Startup/Main::Setup, Startup/Main::Setup2, Startup/Main::Start | 1662.315 | 17.741, 2165.028, 669.588 |

It seems there is a significant increase of time taken in "servers" or "Startup/Servers" (+150%).
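
As a quick check of that figure, the relative change can be computed directly from the timings listed in the comparison. This is only a small sketch: the `percent_change` helper and the dictionary layout are illustrative, and only three of the matched rows are included.

```python
# Timings in milliseconds as (4.2.2, 4.3), copied from the benchmark comparison.
timings_ms = {
    "servers": (801.082, 2032.041),
    "scene": (125.518, 58.829),
    "project_manager": (700.8, 653.754),
}


def percent_change(old, new):
    """Relative change from old to new, in percent (positive means slower)."""
    return (new - old) / old * 100.0


for name, (old, new) in timings_ms.items():
    change = percent_change(old, new)
    print(f"{name}: {old:.3f} ms -> {new:.3f} ms ({change:+.1f}%)")
```

Only the "servers" row grows noticeably; the other rows shown here get slightly faster.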

@akien-mga akien-mga added this to the 4.4 milestone Aug 19, 2024
@akien-mga
Member

Could you test intermediate 4.3 dev/beta/rc snapshots to pinpoint the first one that shows this performance/speed regression? https://godotengine.org/download/archive/

@The2AndOnly
Author

The2AndOnly commented Aug 19, 2024

> Could you test intermediate 4.3 dev/beta/rc snapshots to pinpoint the first one that shows this performance/speed regression? https://godotengine.org/download/archive/

It first happens in the dev4 snapshot of 4.3: https://godotengine.org/download/archive/4.3-dev4/

I suspect it has something to do with this change:

  • Windows: Set application user model ID to prevent editor / running project and different versions of editor taskbar icon stacking (GH-85905).

It's just the most notable change related to running the project, and I don't know whether it actually affects anything.
It may also be worth checking the changes to the compatibility renderer:
https://godotengine.github.io/godot-interactive-changelog/#4.3-dev4

@akien-mga
Member

> It seems there is a significant increase of time taken in "servers" or "Startup/Servers" (+150%).

I guess this comes mainly from the rendering server.

> Windows: Set application user model ID to prevent editor / running project and different versions of editor taskbar icon stacking (#85905).

Not sure why this would impact startup time, but maybe it impacts which Nvidia profile is used for the application. Could you check which profiles you have in the Nvidia panel for different versions of Godot and see if maybe they have different config?

It may also be related to adding support for Glow in the compatibility renderer, which may simply make it more expensive to run by default.

I tried on Linux and couldn't reproduce any significant startup time difference on my laptop between 4.2 and 4.3; both take around 0.3 seconds for the Servers init.

If you can compile from source, it would be worth bisecting between 4.3.dev3 and 4.3.dev4 to pinpoint the exact commit that caused the regression with certainty.
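
The idea behind bisecting can be sketched as a binary search over an ordered list of builds. This is a generic illustration, not Godot tooling: the commit labels and the `slow` set are made up, and `is_bad` stands in for "build this commit and check whether startup is slow". In practice `git bisect` automates the same search.

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad() is True.

    Assumes candidates are ordered oldest to newest, the first one is
    known good, the last one is known bad, and the regression stays
    present once it has been introduced.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Hypothetical commit labels between 4.3.dev3 (good) and 4.3.dev4 (bad).
commits = ["dev3", "c1", "c2", "c3", "c4", "c5", "dev4"]
slow = {"c4", "c5", "dev4"}  # made-up test results, for illustration only

culprit = first_bad(commits, lambda c: c in slow)
print(culprit)
```

Each step halves the remaining range, so even a few hundred commits between two snapshots need only a handful of builds.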

@alvinhochun
Contributor

From very crude profiling with VTune (sampling every 0.01 ms) on a debug build (slightly outdated and with local changes), the function that took the most time is GLManagerNative_Windows::_nvapi_setup_profile, and the next one is detect_wgl.

Not sure if this is indicative. I don't see any changes between dev3 and dev4 that seem relevant, and I don't have a before/after comparison. Can't do much testing soon.

@alvinhochun
Contributor

> > It seems there is a significant increase of time taken in "servers" or "Startup/Servers" (+150%).
>
> I guess this comes mainly from the rendering server.

DisplayServer seems more likely:

    [Servers]
            - Display: 1868.940 msec.
            - Rendering: 53.652 msec.
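
The same conclusion can be reached mechanically by parsing the 4.3 `--benchmark` output and picking the slowest entry of the `[Servers]` section. This is a minimal sketch: the regular expressions assume the `[Section]` / `- Name: value msec.` layout seen in the 4.3 output quoted in this thread, and `parse_benchmark` is a hypothetical helper, not part of Godot.

```python
import re

# [Servers] section copied from the 4.3 benchmark output in this thread.
SAMPLE = """\
[Servers]
        - Register Extensions: 14.399 msec.
        - Modules and Extensions: 0.827 msec.
        - Input: 16.056 msec.
        - Display: 1868.940 msec.
        - Tablet Driver: 0.065 msec.
        - Rendering: 53.652 msec.
        - Audio: 75.929 msec.
        - XR: 0.014 msec.
        - Register Singletons: 0.019 msec.
"""

SECTION = re.compile(r"^\s*\[(.+)\]\s*$")
ITEM = re.compile(r"^\s*-\s*(.+?):\s*([0-9.]+)\s*msec\.\s*$")


def parse_benchmark(text):
    """Parse benchmark text into {section: {item: milliseconds}}."""
    result = {}
    section = None
    for line in text.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            result[section] = {}
            continue
        item = ITEM.match(line)
        if item and section is not None:
            result[section][item.group(1)] = float(item.group(2))
    return result


servers = parse_benchmark(SAMPLE)["Servers"]
slowest = max(servers, key=servers.get)
print(slowest, servers[slowest])
```

The lazy `(.+?)` in `ITEM` lets names that contain colons, such as `Main::Setup`, parse correctly.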

@akien-mga
Member

CC @bruvzg

@bruvzg
Member

bruvzg commented Aug 20, 2024

I can't reproduce any slowdown:

App ID setup is almost instant (23 usec). The longest DisplayServer init parts are detect_wgl() (about 0.4 sec, half of it is _get_device_ids for ANGLE fallback) and NVIDIA profile setup (about 0.3 sec).

I do not think the ANGLE fallback has changed much (apart from adding more devices to the list, but matching a bunch of strings can't have any impact). The NV profile setup was changed, so it's likely the reason, and there might be an issue with it: it's bound to the executable name, so the editor and the running project might cause some conflicts or recreation of the profile.

@alvinhochun
Contributor

alvinhochun commented Aug 20, 2024

Here there is a noticeable delay. When I run Godot_v4.3-stable_win64_console.exe --verbose, the following two lines are printed about 1.5 seconds apart, while if I run Godot_v4.2.2-stable_win64_console.exe --verbose, they are both printed within less than half a second:

Native OpenGL API detected: 3.3: Intel - Intel(R) UHD Graphics 620
NVAPI: Init OK!

It doesn't look like the delay is caused by NVIDIA profile setup, given that "NVAPI: Init OK!" is printed before actually setting up the profile.

The delay also doesn't happen when running with --rendering-driver opengl3_angle (the Godot window appears almost instantly), so it seems to be something specific to native OpenGL.
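
A gap like this can be measured rather than eyeballed by timestamping the output lines as they arrive. The sketch below is an assumption-laden illustration: `time_between_lines` is a hypothetical helper, and the demo runs a stand-in child process that prints the two lines 0.3 seconds apart so that it works on any machine. For the case described here, `cmd` would be something like `["Godot_v4.3-stable_win64_console.exe", "--verbose"]`. Note that a child process may buffer its output when it is piped, which would distort the measurement.

```python
import subprocess
import sys
import time


def time_between_lines(cmd, first_marker, second_marker):
    """Run cmd and return the seconds elapsed between the first output line
    containing first_marker and the next line containing second_marker.
    Returns None if the markers are not seen in that order."""
    gap = None
    t_first = None
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            now = time.perf_counter()
            if t_first is None:
                if first_marker in line:
                    t_first = now
            elif second_marker in line:
                gap = now - t_first
                break
        proc.kill()  # stop the child once the measurement is done
    return gap


# Stand-in child process: prints the two log lines 0.3 s apart, unbuffered.
child = (
    "import time;"
    "print('Native OpenGL API detected: (stand-in)');"
    "time.sleep(0.3);"
    "print('NVAPI: Init OK!')"
)
gap = time_between_lines(
    [sys.executable, "-u", "-c", child],
    "Native OpenGL API detected",
    "NVAPI: Init OK!",
)
print(f"gap: {gap:.3f} s")
```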

@bruvzg
Member

bruvzg commented Aug 20, 2024

It might be _get_device_ids as well; the time it takes likely depends on the specific devices connected. Not sure what can be done (it's used to get device IDs), but if that is the case, you can disable rendering/gl_compatibility/fallback_to_angle in the project settings to skip it.

@bruvzg
Member

bruvzg commented Aug 20, 2024

I guess we could get the IDs only when a specific device name pattern is found (the lookup exists to detect devices that identify as a generic Intel(R) HD Graphics, without a specific number in the GL device string).

@alvinhochun
Contributor

Maybe you can also try other ways of getting the GPU VendorId/DeviceId? Chromium uses IDXGIFactory::EnumAdapters with IDXGIAdapter::GetDesc, should be rather simple to implement: https://source.chromium.org/chromium/chromium/src/+/main:gpu/config/gpu_info_collector_win.cc;l=314;drc=82dff63dbf9db05e9274e11d9128af7b9f51ceaa

Firefox seems to do something more complicated, first using EnumDisplayDevices and parsing the device path for the primary display adapter (?), then using SetupDiEnumDeviceInfo to iterate the GUID_DISPLAY_DEVICE_ARRIVAL devices as a fallback and for secondary GPUs: https://searchfox.org/mozilla-central/source/widget/windows/GfxInfo.cpp#603
