Fixes #18904.
See also: `&&` and `||` (gfx-rs/wgpu#4394)

## A Debugging Story
In the above issue, the defect can be reproduced by running the `3d_scene` example with an Intel adapter on Vulkan and observing the rendered output.

### 1. Indirect Draw Args
The first thing to check is what's being passed to `vkCmdDrawIndexedIndirectCount` in `main_opaque_pass_3d`. What we see is a little bizarre: we should see two separate meshes here, one for the circular base and one for the cube, but instead we see three items, two of which are identical except for their `first_instance` id.
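For reference, each entry consumed by an indexed indirect draw has a fixed five-field layout; below is a rough WGSL mirror of it (the struct name and comments are mine, not Bevy's actual definitions). The `first_instance` field is the only thing distinguishing two of the three captured entries, which is what makes the capture suspicious.

```wgsl
// Rough WGSL mirror of the per-draw arguments read by
// vkCmdDrawIndexedIndirectCount (VkDrawIndexedIndirectCommand).
// The struct name is illustrative, not Bevy's actual definition.
struct IndexedIndirectArgs {
    index_count: u32,    // number of indices to draw
    instance_count: u32, // number of instances to draw
    first_index: u32,    // offset into the index buffer
    base_vertex: i32,    // value added to each index before vertex fetch
    first_instance: u32, // instance id assigned to the first instance
}
```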
### 2. Reversed work item ordering

Trying to debug what could possibly be wrong with the shader input led to observing the `PreprocessWorkItem` buffer that gets passed to mesh preprocessing. Here there was a very conspicuous difference between the runs that worked and the runs that broke: the ordering of the work items was reversed, most likely due to ECS query instability. However, the code looked like it should be robust enough to handle work items in any order, and the same input works just fine on Nvidia, so the ordering itself is clearly not the issue.
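For context in the sketches that follow, here is a simplified version of the work item buffer. The field names are assumptions based on the identifiers that show up later in the decompiled shader, not Bevy's exact definitions.

```wgsl
// Simplified sketch of the work item buffer fed to mesh preprocessing.
// Field names are guesses based on identifiers seen later in the
// decompiled shader; treat this as an approximation, not the real struct.
struct PreprocessWorkItem {
    // Which mesh instance to read from the input buffer.
    input_index: u32,
    // Where to write the result, or which indirect parameters to update.
    output_or_indirect_parameters_index: u32,
}

@group(0) @binding(0) var<storage, read> work_items: array<PreprocessWorkItem>;
```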
### 3. Memory ordering?
My first assumption was that this must be some kind of weirdness with memory ordering, or some other race observable only on Intel. This led me to the write below.
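The original snippet isn't reproduced here, so the following is only a sketch of the kind of conditional write being discussed, reusing the hypothetical `work_items` and `IndexedIndirectArgs` declarations from the earlier sketches; it is not the actual Bevy mesh preprocessing shader.

```wgsl
// Sketch only (hypothetical names): each invocation handles one work item,
// and only the first item of a batch writes the batch's first_instance.
@group(0) @binding(1) var<storage, read_write> indirect_args: array<IndexedIndirectArgs>;

@compute @workgroup_size(64)
fn preprocess(@builtin(global_invocation_id) id: vec3<u32>) {
    let instance_index = id.x;
    if (instance_index >= arrayLength(&work_items)) {
        return;
    }
    let indirect_index = work_items[instance_index].output_or_indirect_parameters_index;

    // The conditional write in question: write first_instance only when this
    // item starts a new batch (item 0, or a predecessor in a different batch).
    if (instance_index == 0u
        || work_items[instance_index - 1u].output_or_indirect_parameters_index != indirect_index) {
        indirect_args[indirect_index].first_instance = instance_index;
    }
}
```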
Even though the logic looks totally fine and shouldn't require any synchronization, I tried making the write atomic, which didn't seem to help at all. My next step was to remove the condition and just write unconditionally. This could lead to some weirdness in the event of a batch size N > 1, but I just wanted to see what happened. And... it solved the bug?
### 4. SPIR-V decompilation
It made no sense to me why this would fix things, so I decided to decompile the shader and see what was going on. What I found looks wrong: the final condition, `_247 || _251`, doesn't seem right. Checking and confirming, `||` in WGSL is supposed to short-circuit. Instead, here, we unconditionally read from `&_221[_248].output_or_indirect_parameters_index`, a.k.a. `work_items[instance_index - 1]`. In the event that `instance_index` is 0, that read is out of bounds. Uh oh!
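Translated back into the terms of the earlier sketch, the difference is between what the WGSL spec promises and what this backend effectively generates (the function names below are mine, purely for illustration):

```wgsl
// What WGSL promises: `||` short-circuits, so the right-hand operand is not
// evaluated when instance_index == 0u and no out-of-bounds read can happen.
fn first_item_of_batch(instance_index: u32, indirect_index: u32) -> bool {
    return instance_index == 0u
        || work_items[instance_index - 1u].output_or_indirect_parameters_index != indirect_index;
}

// What the generated code effectively does here: both operands are evaluated
// eagerly, so for instance_index == 0u the subtraction wraps to 0xFFFFFFFF
// and the buffer is read far out of bounds before the `||` is computed.
fn first_item_of_batch_as_compiled(instance_index: u32, indirect_index: u32) -> bool {
    let previous = work_items[instance_index - 1u].output_or_indirect_parameters_index;
    return instance_index == 0u || previous != indirect_index;
}
```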
### 5. Vendor differences

I'm not sure why this UB doesn't have any visible effect on Nvidia, but we can walk through the entire bug:
On the first thread in the workgroup, where `instance_index` is 0, we will always read out of bounds. On Intel this seems to cause the thread to terminate, or otherwise to return garbage that makes the condition fail, or something else weird. However, when the first work item's input/output index is supposed to be 0, everything happens to just work, since the zero-initialized GPU metadata memory is correct by chance. Thus, the bug only appears when things are sorted funny and a batch set with a non-zero input index appears at the front of the work items. Yikes!

The fix is to make sure that we only read from the prior work item when we are not the first item.
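In terms of the sketch above, the shape of the fix is roughly the following: restructure the check so the previous work item is only read once we know we are not the first item, so that nothing depends on `||` short-circuiting.

```wgsl
// Sketch of the fix (hypothetical names): never touch
// work_items[instance_index - 1u] unless instance_index != 0u.
fn first_item_of_batch_fixed(instance_index: u32, indirect_index: u32) -> bool {
    if (instance_index == 0u) {
        return true;
    }
    return work_items[instance_index - 1u].output_or_indirect_parameters_index != indirect_index;
}
```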