Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect looping in shader on metal #5067

Open
pieper opened this issue Jan 15, 2024 · 12 comments
Open

Incorrect looping in shader on metal #5067

pieper opened this issue Jan 15, 2024 · 12 comments
Labels
api: metal Issues with Metal external: driver-bug A driver is causing the bug, though we may still want to work around it lang: Metal Metal Shading Language

Comments

@pieper
Copy link

pieper commented Jan 15, 2024

Description
A triple-nested for loop behaves differently on a mac pro with AMD cards than it does on other platforms.

Repro steps

On most systems it runs normally (tested window and linux with nvidia GPUs and mac with M2 GPU).

On a mac pro with AMD Radeon Pro W5700X 16 GB, macOS 13.4, the error condition happen.

On the AMD mac, the max of neighborsVisited is 4 when it is 8 everywhere else. Also the value of ii is zero, not 2.

fn main(@builtin(global_invocation_id) id: vec3<u32>) {

    let idi32 : vec3<i32> = vec3<i32>(id);
    var dimensions : vec3<i32> = vec3<i32>(3,3,3);
    var kk : i32; var jj : i32; var ii : i32;
    var neighborsVisited : i32 = 0;
    for (kk = -1; kk < 2; kk += 1) {
        for (jj = -1; jj < 2; jj += 1) {
            for (ii = -1; ii < 2; ii += 1) {
                if ( (kk != 0 && jj != 0 && ii != 0)
                      && ((idi32.z + kk) >= 0 && (idi32.z + kk) < dimensions.z)
                      && ((idi32.y + jj) >= 0 && (idi32.y + jj) < dimensions.y)
                      && ((idi32.x + ii) >= 0 && (idi32.x + ii) < dimensions.x) ) {
                    neighborsVisited += 1;
                }
            }
        }
    }
    displacements[idi32.x + idi32.y * 3 + idi32.z * 9] = vec4<f32>(f32(kk), f32(jj), f32(ii), f32(neighborsVisited));
}

Expected vs observed behavior

The code should work the same everywhere : )

Extra materials

A full wgpu python script that demonstrates the issue is here.

Platform

AMD Radeon Pro W5700X 16 GB, macOS 13.4

wgpu '0.13.2'

@cwfitzgerald
Copy link
Member

Would it be possible to test this with a newer wgpu (ideally trunk)? A lot has changed since 0.13

@pieper
Copy link
Author

pieper commented Jan 16, 2024

I just tried again with the current git main and the dev instructions and I get the same result.

@teoxoy
Copy link
Member

teoxoy commented Jan 18, 2024

This sounds like a driver bug but could be #4972 as well since we desugar all loop types to while(true){} loops.

@teoxoy teoxoy added type: bug Something isn't working api: metal Issues with Metal lang: Metal Metal Shading Language labels Jan 18, 2024
@teoxoy
Copy link
Member

teoxoy commented Jan 18, 2024

@pieper could you try the commit in this PR #5089?

@pieper
Copy link
Author

pieper commented Jan 18, 2024

@pieper could you try the commit in this PR #5089?

Sorry, I'm afraid I don't know exactly how to do that - I thought it would require building a new libwgpu_native.dylib just changing the wgpu-native Cargo.tml file to point to your branch instead of the gfw-rs repository, but I get some kind of naga dependency error that I don't understand.

Is there an example of building with a branch? (the gfw-rs/wgpu-native trunk builds fine).

@teoxoy
Copy link
Member

teoxoy commented Jan 19, 2024

but I get some kind of naga dependency error that I don't understand

It could be because the API changed in a non-backwards compatible way since my PR is based on a recent version of trunk.

I think the change can be applied cleanly to all recent versions of naga.

https://github.com/gfx-rs/wgpu/pull/5089/files#diff-157fdc1c8dfcf27ae4346bba3b46cd60abdb5bc2d9a49ba1894597850439d36f

You can clone the specific revision of naga you are currently using and apply the changes locally; then point wgpu-native to your local repo.

@pieper
Copy link
Author

pieper commented Jan 20, 2024

@teoxoy would you be able to make a branch of wgpu-native that builds against the versions you described? I'm not really familiar with components of this project or the rust build systems so it's not clear to me how to proceed.

@teoxoy
Copy link
Member

teoxoy commented Jan 22, 2024

@pieper see: https://github.com/teoxoy/wgpu-native/tree/msl-asm-loop

@pieper
Copy link
Author

pieper commented Jan 22, 2024

Thank you @teoxoy , I tried your branch and it built but gave the same results as before.

I agree it might be a driver bug, but I've never seen any other odd behavior from this machine and I use if every day for various purposes.

Unless people have other suggestions I think I'll make a PR to add this snippet to the wgpu-native compute shader test.

@teoxoy teoxoy added external: driver-bug A driver is causing the bug, though we may still want to work around it and removed type: bug Something isn't working labels Jan 23, 2024
@pieper
Copy link
Author

pieper commented Jan 23, 2024

Is there a dashboard where tests are recorded and displayed for different platforms? I can test on this macos / driver version for now but it may get updated in the future.

@cwfitzgerald
Copy link
Member

We currently run the tests in CI on all backends so are currently relying on that. All the tests should be clean, at they automatically manage their own expectations (i.e. A expected failure succeeds)

@cwfitzgerald
Copy link
Member

The best place to put it would be honestly in a test in this repository, that way it gets actively tested

pieper added a commit to pieper/wgpu that referenced this issue Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: metal Issues with Metal external: driver-bug A driver is causing the bug, though we may still want to work around it lang: Metal Metal Shading Language
Projects
None yet
Development

No branches or pull requests

3 participants