Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug Performance Regressions on 0.20 Vulkan #5756

Open
guusw opened this issue May 30, 2024 · 23 comments
Open

Debug Performance Regressions on 0.20 Vulkan #5756

guusw opened this issue May 30, 2024 · 23 comments
Labels
api: vulkan Issues with Vulkan area: performance How fast things go help required We need community help to make this happen.

Comments

@guusw
Copy link

guusw commented May 30, 2024

Hey, I recently tried upgrading from 0.19.1 to 0.20 and noticed I get half the performance using vulkan as I did previously.

I did some profiling and noticed all application code is the same except for one vkQueueSubmit taking up 11ms
I suspect something is waiting for an operation to complete which did not happen previously.

It's worth nothing that I perform a readback from a texture, although it happens asynchronously using wgpuBufferMapAsync.

Screenshot 2024-05-30 144610

Below I have provided two renderdoc captures (before/after the upgrade)
The rendered items are identical, the only difference I could notice is some fences at the start of the 0.19 capture
Screenshot 2024-05-30 150928

Let me know if I can provide any other information to help.

renderdoc.7z.zip

@AdrianEddy
Copy link
Contributor

FWIW, I noticed 2x Vulkan performance decrease but not related to wgpu, but after updating the NVIDIA driver from 535 to 550. Is your GPU driver the same?

@guusw
Copy link
Author

guusw commented May 30, 2024

516.94, will try with the most recent one.

@Wumpf
Copy link
Member

Wumpf commented May 30, 2024

Might also be worth trying against the changes on #5681
We've been meaning to put this in a patch release because there's all sort of problems prior to that PR

@vntec
Copy link

vntec commented May 31, 2024

I've had similar performance problems with Vulkan and v0.20.0.

After upgrading, I noticed mouse input being very unresponsive in my engine. The NVIDIA overlay reported a render latency of ~120ms at 144FPS (Fifo). The frametimes were similar to v0.19.4 (latency was ~11ms). A similarly high latency occurred at higher framerates with VSync off (Immediate).
Initially, I suspected my own engine code, but the same problem occurred when I tested the official wgpu examples. The ash examples did not display this behavior.
I'm not entirely sure how NVIDIA exactly measures "render latency", although it seems to be something like "time from render queue to actual presentation".

Upgrading my driver to 555.85 did not help.

I suspect this has something to do with the recent synchronization changes in #5681.

@Wumpf
Copy link
Member

Wumpf commented May 31, 2024

I suspect this has something to do with the recent synchronization changes in #5681.

any chance you could check against latest trunk? We want to release a patch with that PR soon

@LukasKalbertodt
Copy link

I also ran into this problem after updating to v0.20. It doesn't seem like the framerate has decreased, but the latency increased a lot in Fifo mode. I also use wgpuBufferMapAsync but never block on it.

I use an Nvidia RTX 470 with driver version 550.78, on Linux, winit 0.30, whole desktop running at 60fps.

I did update to the latest trunk as suggested, but unfortunately that did not fix the issue.

Noticing latency manually is a bit error prone so I did the following: just print to the terminal whenever an button press is detected. The button press leads to camera motion and thus to a notable change in the rendered image. I then screen captured terminal + wgpu window and finally counted the frames between "terminal prints event" and "effect visible in rendered image". I know this is far from a perfect measurement of latency, but the trend is clear: before the wgpu 0.20 update, there were 2 frames delay, with wgpu 0.20 it's 6 frames, with c745863 it's also 6.

In Immediate present mode, there doesn't seem to be a problem, I think.

@Wumpf Wumpf added type: bug Something isn't working help required We need community help to make this happen. api: vulkan Issues with Vulkan labels Jun 2, 2024
@vntec
Copy link

vntec commented Jun 2, 2024

I suspect this has something to do with the recent synchronization changes in #5681.

any chance you could check against latest trunk? We want to release a patch with that PR soon

I just checked against the latest trunk and it does seem a bit better. With VSync, the latency is reduced from ~120ms to ~30ms and it's noticeably more responsive. However it's still not as good as v0.19.4 (11-15 ms).
I think the problem still applies to Immediate present mode, but the impact is generally less noticeable since the frametimes and latencies are lower (for simple scenes).

@jimblandy
Copy link
Member

We would definitely like to address this, but so far in this issue it's hard to discern how we can actually debug this. A reduced test case that we can run locally to see the problem ourselves would make progress a lot easier.

@hecrj
Copy link
Contributor

hecrj commented Jul 17, 2024

Just chiming in to say that I have noticed this regression when working on the upgrade in iced, even after #5681—and it is one of the main reasons the upgrade has not been merged yet (iced-rs/iced#2417).

However, it seems I can only reproduce the regression in debug mode. release builds seem to both have similar performance. Talking about render time, specifically. Not latency.

@Wumpf
Copy link
Member

Wumpf commented Jul 17, 2024

@guusw @vntec did you check on release mode?
For Debug we'd only care about regressions if they make things completely unusable for a wide range of trivial usecases - one can always configure cargo.toml such that wgpu is compiled with optimizations even when the rest of the application is not. Compiling wgpu without optimizations is only truly relevant when trying to find issues in wgpu(-core/-hal) itself

@hecrj
Copy link
Contributor

hecrj commented Jul 17, 2024

@Wumpf Which packages should I exactly choose to optimize?

I am adding opt-level = 3 to the wgpu crates in 0.20 but still struggling to obtain the same render time as with no optimizations whatsoever in 0.19. That seems a bit odd to me.

@Wumpf
Copy link
Member

Wumpf commented Jul 17, 2024

you might also need to add opt-level = 3 for wgpu-core and wgpu-hal, don't think that setting is transitive

@hecrj
Copy link
Contributor

hecrj commented Jul 17, 2024

@Wumpf Ah! Disabling debug-assertions did the trick!

[profile.dev.package.wgpu]
debug-assertions = false

[profile.dev.package.wgpu-hal]
debug-assertions = false

[profile.dev.package.wgpu-core]
debug-assertions = false

[profile.dev.package.wgpu-types]
debug-assertions = false

opt-level does not seem to have much of an impact. Maybe 0.20 introduced a lot more debug_assert! in some hotpath?

@hecrj
Copy link
Contributor

hecrj commented Jul 17, 2024

I narrowed it to wgpu-types:

[profile.dev.package.wgpu-types]
debug-assertions = false

This has the biggest impact on performance in debug builds.

EDIT: More experimenting indicates that both 0.19 and 0.20 seem to have same debug render time with debug-assertions disabled.

Disabling debug-assertions for 0.19 has little impact, while 0.20 seems to have more debug checks.

@Wumpf
Copy link
Member

Wumpf commented Jul 17, 2024

nice! thank you for digging into this!!

I'm making the bold move of closing this as won't fix then, if it's just extra debug assertions we're hitting here I'm not concerned about perf :)

@ everyone else: Please re-open if there's reason to believe there's regressions beyond extra debug assertions or it can be shown that certain debug assertions are too excessive.

@Wumpf Wumpf closed this as not planned Won't fix, can't repro, duplicate, stale Jul 17, 2024
@vntec
Copy link

vntec commented Jul 17, 2024

Thanks for the investigation, everyone!
@Wumpf Yes, I was running in release mode, but I'll check if disabling debug assertions does the trick.

EDIT: My render latency problem is gone now. It was probably the synchronization changes interacting with a specific driver version.

@janhohenheim
Copy link
Contributor

janhohenheim commented Jul 17, 2024

@Wumpf could we reopen this? Bevy users are running into it while running their game in debug mode per the linked issue. Usually I would agree that performance is a minor concern for debug assertions, but the change in FPS reported seems staggering: 60 FPS vs 460 FPS for one user and 20 FPS vs 60 FPS for another

@cwfitzgerald cwfitzgerald changed the title Performance regression on 0.20 vulkan Debug Performance Regressions on 0.20 Vulkan Jul 17, 2024
@cwfitzgerald cwfitzgerald reopened this Jul 17, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in WebGPU for Firefox Jul 17, 2024
@cwfitzgerald cwfitzgerald removed the type: bug Something isn't working label Jul 17, 2024
@hecrj
Copy link
Contributor

hecrj commented Sep 19, 2024

I investigated a bit more and traced the issue down all the way to InstanceFlags::from_build_config:

wgpu/wgpu-types/src/lib.rs

Lines 1052 to 1058 in dfc384a

pub fn from_build_config() -> Self {
if cfg!(debug_assertions) {
return InstanceFlags::debugging();
}
InstanceFlags::empty()
}

Specifically, it seems the InstanceFlags::VALIDATION flag enabled in InstanceFlags::debugging is the culprit:

wgpu/wgpu-types/src/lib.rs

Lines 1038 to 1040 in dfc384a

pub fn debugging() -> Self {
InstanceFlags::DEBUG | InstanceFlags::VALIDATION
}

Thus, the issue can be circumvented downstream by manually passing InstanceFlags::empty() to wgpu::Instance::new—at the expense of validation in debug builds.

Any ideas why validation is way more expensive since 0.20?

@cwfitzgerald
Copy link
Member

I can't check on my phone, but I think that's about when we enabled synchronization validation by default.

@hecrj
Copy link
Contributor

hecrj commented Sep 19, 2024

Can confirm I get ~300 FPS more in the cube example if I comment these lines (enabling PresentMode::Immediate):

// Always enable synchronization validation
validation_feature_list
.push(vk::ValidationFeatureEnableEXT::SYNCHRONIZATION_VALIDATION);

In any case, I have circumvented the problem in iced by introducing a strict-assertions feature flag that can be enabled for internal development; since I see no reason for users of iced to run validation layers unless they are trying to debug a core issue.

This results in an actual speedup on debug builds, after all! Therefore, this issue can be closed on my end.

@Wumpf
Copy link
Member

Wumpf commented Sep 20, 2024

Hmm if it can have such a big impact maybe we shouldn't enable this on all debug builds after all
We could have two levels of validation InstanceFlags::VALIDATION and InstanceFlags::VALIDATION_SLOW and only enable the former in debug builds if at all 🤔

@cwfitzgerald
Copy link
Member

We already do lol, we only turn on gpu based validation if you ask for "advanced_debugging".

Syncval is a lot more useful to us that GBV, which is why we enabled it.

@Wumpf
Copy link
Member

Wumpf commented Sep 20, 2024

well but synchronization validation doesn't fall under gpu based validation, right? So if we were to not enable that by default we'd have to split it out of regular validation. And judging from this ticket we really should strongly consider not to enable this by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: vulkan Issues with Vulkan area: performance How fast things go help required We need community help to make this happen.
Projects
Status: In Progress
Development

No branches or pull requests

9 participants