Having multiple VoxelGIs in a scene causes Vulkan error: Did not create swapchain successfully. VK_NOT_READY and Vulkan error: Cannot submit graphics queue. VK_ERROR_DEVICE_LOST #80286

Closed
viksl opened this issue Aug 5, 2023 · 9 comments · Fixed by #80356

Comments

@viksl
Contributor

viksl commented Aug 5, 2023

Godot version

4.1.1

System information

Windows 10, Nvidia GTX 1660Ti, Vulkan

Issue description

Having multiple VoxelGI nodes and trying to bake them consistently causes never-ending Vulkan error spam, as seen in the image below. This completely stops the editor as well as the project itself, and it's impossible to work on the project or play it.

For me it's just 5 VoxelGI nodes, but it could be higher or lower for you. Considering the bake says it's about 8MB and my GPU has 6GB, it shouldn't be a matter of insufficient VRAM? Sometimes I get to bake 4 of them and it breaks while baking the fifth one. Sometimes just having 5 GIs in the scene and baking only one is enough to break it. One way or another it happens every time, no exceptions.

I've tried this on a number of older drivers as well as brand new ones (yesterday updated to the latest stable) but the issue is the same.

[Image: screenshot of the Vulkan error spam]

Steps to reproduce

In the test project it will either be already broken or not.

  1. If it's not broken, click on each VoxelGI node and bake them until you get the error. If nothing happens, duplicate them and carry on until you get the error.
  2. If it's already broken and you want to reproduce it, create a new project with a 3D scene, add a couple of VoxelGI nodes, move them apart so they don't overlap, and bake them one after another until it breaks and your project is no longer usable.

Minimal reproduction project

VGITestGD.zip

@darksylinc
Contributor

I'm able to repro this bug. Looking into it.

@darksylinc
Contributor

darksylinc commented Aug 6, 2023

Uploading validation error reports.

The 4.1 one is too spammy because it contains errors that were fixed very recently (last week). But the issue is still reproducible on 4.2.

Everything points to Godot not waiting until the GPU is done.

GodotValidations.zip

Update:

It complains here:

// Ensure no more than FRAME_LAG renderings are outstanding.
vkWaitForFences(device, 1, &fences[frame_index], VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fences[frame_index]); // -> complains we can't reset

Since we just waited on that fence, the only logical explanation is that vkWaitForFences didn't return VK_SUCCESS, which means the error must be earlier.

@viksl
Copy link
Contributor Author

viksl commented Aug 6, 2023

@darksylinc do you need me to test anything on my end? I'm not much of a help but if there's anything I can do to help let me know. Unfortunately it's a blocking pain of an issue for me right now :'/

@Calinou
Member

Calinou commented Aug 6, 2023

// Ensure no more than FRAME_LAG renderings are outstanding.

That reminds me, I discussed with reduz that we should decrease the number of queued frames to reduce input lag, as we seem to be queuing at least one frame too many compared to OpenGL.

However, at the same time, we should ensure we don't prevent all parallelism opportunities (as I understand it, changing FRAME_LAG to 1 would have that effect). Basically, what we need to do is similar to Direct3D 11 when you tell it to keep GPU/CPU synchronization to 1 frame at most (which is a sensible default). We could also provide a project setting to adjust this depending on how much you want to favor performance over latency and vice versa.

@viksl
Contributor Author

viksl commented Aug 6, 2023

Strangely, when I add a single MeshInstance with a BoxMesh for every VoxelGI (and move the meshes into their respective VoxelGI in the world), I don't get these errors anymore and the project keeps working (at least when baking from the script). I even upped the number of VoxelGIs to 24 and it's still just fine on my side.

But to make it more puzzling, I did this in the Bistro demo and I'm pretty sure I can't see any VoxelGI which doesn't have meshes inside but I still get this error there after baking several VoxelGIs.

EDIT:
If I set GI Mode on the MeshInstances to disabled then I'm back at the errors when baking the third VoxelGI on my machine.

EDIT2:
If I set all meshes to Dynamic I get this error too. (Sheesh, the performance with dynamic is glacial, the whole game just barely moves, it's not even worth talking about performance at this point. I know there's a bug report for this, but this is the first time I've experienced it. What the heck is going on? :D)

I'll do more tests in the bistro demo later on not sure if today or tomorrow or so, currently life's busy.

@Calinou
Member

Calinou commented Aug 6, 2023

EDIT2:
If I set all meshes to Dynamic I get this error too. (Sheesh, the performance with dynamic is glacial, the whole game just barely moves, it's not even worth talking about performance at this point. I know there's a bug report for this, but this is the first time I've experienced it. What the heck is going on? :D)

See #55359. Note that even with oversampling disabled, you aren't supposed to make all meshes use dynamic GI. Only use dynamic GI for select meshes where it makes a significant visual difference. For some emissive dynamic meshes, it may be better to add an OmniLight3D or SpotLight3D as a child node.

@viksl
Contributor Author

viksl commented Aug 6, 2023

@Calinou Yep, I know that. I only wanted to check all the variations for this bug report: with static it seems to be okish (although I do want to test it on a scene with more complex meshes), while the other options, disabled or dynamic, trigger this issue almost immediately. I hoped that would help narrow down what could be going wrong. (My italic comment was just surprise at how even a single dynamic mesh can take performance down to 3 fps in an otherwise empty scene. It was not relevant; I just found it interesting and wanted to point it out, since I know there's a report about not having at least one static body in VoxelGI causing a significant performance drop. Perhaps these issues are related?)

I hope I cleared a potential misunderstanding with my comment above? :-)

EDIT (oops forgot to edit this post):

I tried the Bistro demo again the way I described above and I did not get an error - as long as there's GI static geometry in VoxelGI.
On the other hand, I found that you can't combine multiple VoxelGIs even if you don't overlap them; the lighting information just gets messed up. Inside some of the volumes the information is suddenly missing and shadows are gone, elsewhere the lighting conditions don't match the neighbours in the same area, and some shadows glitch (they pop in depending on whether you look at the area straight on, or whether you are inside the shadow itself, and so on).

@Calinou Is VoxelGI meant to work with multiple pieces in the world or not? I'm not sure whether I should report this as another bug.

@darksylinc
Contributor

OK I've found the problem.

mipmaps[i].cell_count = 0

Therefore int64_t wg_todo = (mipmaps[i].cell_count - 1) / wg_size + 1; underflows and becomes a ridiculously large number (67108864), which spams a lot of dispatch calls with 65535 thread groups.

This kills pretty much any GPU that isn't super high end or has a high TDR timeout.
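
To make the arithmetic concrete, here is a minimal sketch of the wrap-around. It is not the actual Godot code, and the guarded variant is not necessarily the change made in the PR. It assumes cell_count is an unsigned 32-bit integer and wg_size is 64 (which is consistent with the 67108864 figure); the function names and the helper that splits work into dispatches of at most 65535 groups are hypothetical.

```cpp
#include <cstdint>

// Assumption: cell_count is uint32_t, so 0 - 1 wraps to 4294967295
// before the division happens. The int64_t result type does not help,
// because the wrap occurs in 32-bit unsigned arithmetic.
constexpr int64_t wg_todo_buggy(uint32_t cell_count, uint32_t wg_size) {
    return (cell_count - 1) / wg_size + 1;
}

// One possible guard (hypothetical): treat an empty mipmap as zero
// workgroups, otherwise divide and round up in 64-bit signed arithmetic.
constexpr int64_t wg_todo_safe(uint32_t cell_count, uint32_t wg_size) {
    return cell_count == 0 ? 0 : (int64_t(cell_count) - 1) / wg_size + 1;
}

// Hypothetical helper: number of dispatch calls needed if each call
// is limited to max_groups thread groups.
constexpr int64_t dispatch_count(int64_t wg_todo, int64_t max_groups = 65535) {
    return wg_todo <= 0 ? 0 : (wg_todo - 1) / max_groups + 1;
}

// An empty mipmap yields the huge value reported above...
static_assert(wg_todo_buggy(0, 64) == 67108864, "wraps around");
// ...which would take about a thousand maximum-size dispatches.
static_assert(dispatch_count(wg_todo_buggy(0, 64)) == 1025, "dispatch spam");

// The guarded version rounds up as intended, including for zero.
static_assert(wg_todo_safe(0, 64) == 0, "empty");
static_assert(wg_todo_safe(63, 64) == 1, "partial group");
static_assert(wg_todo_safe(64, 64) == 1, "exact group");
static_assert(wg_todo_safe(65, 64) == 2, "one over");
```

Under these assumptions, a VoxelGI with no cells to process would request no workgroups at all instead of 67108864 of them.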

@darksylinc
Contributor

I've submitted a PR that fixes this problem.

@Calinou I can't guarantee whether this solves #55359 or not, but perhaps you could check.

@akien-mga akien-mga added this to the 4.2 milestone Aug 7, 2023
IntangibleMatter pushed a commit to IntangibleMatter/godot that referenced this issue Aug 13, 2023
The code wanted to divide and round up:
 - 0 / 64 = 0
 - 63 / 64 = 1
 - 64 / 64 = 1
 - 65 / 64 = 2

However when the dividend was exactly 0 it would underflow and produce
67108864 instead.

This caused TDRs on empty scenes or extremely slow performance

Fix godotengine#80286
YuriSizov pushed a commit to YuriSizov/godot that referenced this issue Sep 21, 2023
(cherry picked from commit e783e32)
mandryskowski pushed a commit to mandryskowski/godot that referenced this issue Oct 11, 2023