Skip to content

Commit 3c0ee43

Browse files
committed
A bit more text
Clarify role of the barrier within the broader GPU ecosystem. Thanks Daniel for the comments!
1 parent 7dcb8c9 commit 3c0ee43

File tree

1 file changed

+6
-3
lines changed

1 file changed

+6
-3
lines changed

_posts/2023-06-08-shader-converter.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ categories: [gpu]
66
---
77
At WWDC, Apple introduced [Metal shader converter], a tool for converting shaders from DXIL (the main compilation target of HLSL in DirectX12) to Metal. While it is no doubt useful for reducing the cost of porting games from DirectX to Metal, I feel it does not move us any closer to a world of robust GPU infrastructure, and in many ways just adds more underspecified layers of complexity.
88

9-
The specific feature I'm salty about is atomic barriers that allow for some sharing of work between threadgroups. These barriers are present in HLSL, and in fact have been since 2009, when [Direct3D 11] and Shader Model 5 were first introduced.
9+
The specific feature I'm salty about is atomic barriers that allow for some sharing of work between threadgroups. These barriers are present in HLSL, and in fact have been since 2009, when [Direct3D 11] and Shader Model 5 were first introduced. This barrier is not supported in Metal, and of the major GPU APIs, Metal is the only one that doesn't support it. That holds back WebGPU's performance (see [gpuweb#3935 for discussion]), as WebGPU must be portable across the major APIs.
1010

1111
I've discussed the value of this barrier in my blog post [Prefix sum on portable compute shaders], but I'll briefly recap. Among other things, it enables a single-pass implementation of prefix sum, using a technique such as decoupled look-back or the [SAM prefix sum] algorithm. A single-pass implementation can achieve the same throughput as memcpy, while a more traditional tree-reduction approach can at best achieve 2/3 that throughput, as it has to read the entire input in two separate dispatches. Further, tree reduction can actually be more complex to implement in practice, as the number of dispatches varies with the input size (it is typically `2 * ceil(log(n) / log(threadgroup size))`). Prefix sum, in turn is an important primitive for advanced compute workloads. There are a number of instances of it in the [Vello] pipeline, and it's also commonly used in stream compaction, decoding of variable length data streams, and compression.
1212

@@ -18,7 +18,9 @@ The GPU ecosystem exists at the knife edge of being strangled by complexity. A b
1818

1919
The widespread use of shader translation makes the situation even worse. When writing HLSL that will be translated into other shader languages, it's no longer sufficient to consider [Shader Model 5] to be a baseline, but rather the developer needs to keep in mind all the features that don't translate to other languages. In some cases, the semantics change subtly (the rules for the various flavors "count leading zeros" when the input is 0 vary), and in other cases, like these device scoped barriers.
2020

21-
A separate category is things technically forbidden by the spec, but expected to work in practice. A good example here is the mixing of atomic and non-atomic memory operations (see gpuweb#2229). The spirv-cross shader translation tool casts non-atomic pointers to atomic pointers to support this common pattern, which is technically undefined behavior in C++, but in practice lots of people would be unhappy if the Metal shader compiler did anything other than the reasonable thing. Since Metal's semantics are based on C++, I'd personally love to see this resolved by adopting std::atomic_ref from C++20 (Metal is still based on C++14). I'll also not that the official Metal shader compiler tool generates [reasonable IR] for this pattern. It's concerning that using open source tools such as spirv-cross triggers technical undefined behavior, but it's probably not a big problem in practice.
21+
A separate category is things technically forbidden by the spec, but expected to work in practice. A good example here is the mixing of atomic and non-atomic memory operations (see gpuweb#2229). The spirv-cross shader translation tool casts non-atomic pointers to atomic pointers to support this common pattern, which is technically undefined behavior in C++, but in practice lots of people would be unhappy if the Metal shader compiler did anything other than the reasonable thing. Since Metal's semantics are based on C++, I'd personally love to see this resolved by adopting std::atomic_ref from C++20 (Metal is still based on C++14). I'll also note that the official Metal shader compiler tool generates [reasonable IR] for this pattern. It's concerning that using open source tools such as spirv-cross triggers technical undefined behavior, but it's probably not a big problem in practice.
22+
23+
I understand the incentives, but overall I find it disappointing that Metal chases shiny new features like ray-tracing, while failing to provide a solid, spec-compliant foundation for GPU compute.
2224

2325
## Onward
2426

@@ -36,12 +38,13 @@ I have a recommendations for Apple as well. I hope that they document which HLSL
3638
[CDSChecker]: http://plrg.eecs.uci.edu/software_page/42-2/
3739
[loom]: https://github.com/tokio-rs/loom
3840
[OpenGL 3.1 support]: https://asahilinux.org/2023/06/opengl-3-1-on-asahi-linux/
41+
[gpuweb#2229]: https://github.com/gpuweb/gpuweb/issues/2229
3942
[gpuweb#2297]: https://github.com/gpuweb/gpuweb/pull/2297
43+
[gpuweb#3935]: https://github.com/gpuweb/gpuweb/discussions/3935
4044
[Metal Shading Language Specification]: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
4145
[SAM prefix sum]: https://dl.acm.org/doi/10.1145/2980983.2908089
4246
[Vello]: https://github.com/linebender/vello
4347
[extension list at vulkan.gpuinfo.org]: https://vulkan.gpuinfo.org/listfeaturesextensions.php
4448
[Shader Model 5]: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/d3d11-graphics-reference-sm5
45-
[ghpuweb#2229]: https://github.com/gpuweb/gpuweb/issues/2229
4649
[std::atomic_ref]: https://en.cppreference.com/w/cpp/atomic/atomic_ref
4750
[reasonable IR]: https://gist.github.com/raphlinus/a8e0a3a3683127149b746eb37822bdc8

0 commit comments

Comments
 (0)