-
Notifications
You must be signed in to change notification settings - Fork 17
Explicit Padding in CBuffers Proposal #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
<!-- {% raw %} --> | ||
|
||
# Explicit Padding in Structs and CBuffer Arrays | ||
|
||
* Proposal: [NNNN](NNNN-explicit-padding.md) | ||
* Author(s): [Justin Bogner](https://github.com/bogner) | ||
* Status: **Design In Progress** | ||
|
||
## Introduction | ||
|
||
We introduce an explicit padding type for HLSL, and construct cbuffer arrays | ||
and structs that are annotated with `packoffset` or `vk::offset` using this | ||
type to unambiguously lay out these objects. | ||
|
||
## Motivation | ||
|
||
HLSL has a few contexts where we have types with a layout that doesn't match | ||
the usual rules that follow from C++ definitions and targets' data layouts. We | ||
can generally describe the appropriate type representations for these using | ||
explicit padding, but some care needs to be taken: | ||
|
||
- Arrays in CBuffers may have padding in between members, but crucially they do | ||
not have padding after the last member. This needs special handling to | ||
represent. | ||
|
||
- There are two HLSL constructs that may introduce arbitrary padding to a | ||
struct. In a cbuffer, the `packoffset` attribute specifies the offset of a | ||
member, and outside of cbuffers, the `vk::offset` vulkan attribute may do the | ||
same. | ||
|
||
- Simply padding structures with `i8` as is typical with ABI-related padding | ||
makes it difficult to recover which struct elements are padding vs which are | ||
subobjects. This matters in some backends, and is specifically important for | ||
SPIR-V where we need to map a logical indices into the struct into physical | ||
offsets. | ||
|
||
## Proposed solution | ||
|
||
Introduce an explicit padding type and use this type for the padding in the | ||
various constructs that need it. | ||
|
||
The padding type will be defined as one of the following: | ||
|
||
- A first class LLVM type called `pad8`, which is equivalent but distinct from | ||
`i8`. This would need an RFC to the wider LLVM community and would need to be | ||
useful in other contexts (such as ABI-mandated padding). | ||
- A well-known named type `%pad8`, defined as a named struct containing a | ||
single `i8`. This is the simplest option but requires backends that are | ||
interested in this type to participate in a secret handshake. | ||
- Target types such as `target("dx.pad8")` and `target("spirv.pad8")`. This is | ||
somewhat awkward because the type isn't really tied to a target, but target | ||
types need to be. Targets that don't need to differentiate between padding | ||
and actual members could simply use `i8`. | ||
|
||
> TODO: Choose one of these three options and move the others to the | ||
> "alternatives" section. | ||
Comment on lines
+42
to
+56
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the main unanswered question that I want feedback on here. I'm leaning towards the simple well-known name approach for its simplicity, with the option of pushing for a first class type in the future if this proves useful otherwise. The downside, of course, is that if there were a name collision with some other type very bad things would happen. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My thoughts are to do the target type for now. See if the LLVM community is interested in I do not have strong opinions on this, and I will not hold up this proposal if you do something different. |
||
|
||
### Structs with annotations | ||
|
||
Generally speaking HLSL structs are equivalent to packed structs in C++. We can | ||
simply add padding between members as appropriate in order to satisfy the rules | ||
specified by `packoffset`, `vk::offset`, or otherwise by HLSL semantics. | ||
|
||
There is one complicating factor here. Both `dxc` and `fxc` allow `packoffset` | ||
to be used to layout a struct in an order that does not match the | ||
lexicographical order: | ||
|
||
```hlsl | ||
cbuffer cb0 { | ||
int x : packoffset(c0.y); | ||
float y : packoffset(c0.x); | ||
} | ||
``` | ||
|
||
To support this, we need to create the underlying LLVM type in the order that | ||
matches the packoffsets rather than the order as written, so we would end up | ||
with `{ float, i32 }` here, losing the lexicographical order. This is probably | ||
okay since we need to create artificial types for cbuffers anyway (such as when | ||
we filter out resource types that are declared within the cbuffer), but may not | ||
make for a particularly good debugging experience. | ||
|
||
### Arrays in a cbuffer | ||
|
||
CBuffers in HLSL have very specific layout rules. Each "object" starts at a | ||
16-byte boundary, which is mostly explainable as a 16-byte alignment | ||
requirement, but applies to array elements rather than the array itself in a | ||
way that doesn't match the general language. We can emulate this with an array | ||
of objects that consist of a struct containing the element type and padding to | ||
16 bytes, followed by a single instance of the element type itself. See | ||
[CBuffer Padded arrays at the HLSL-level] for details. | ||
|
||
[CBuffer Padded arrays at the HLSL-level]: #cbuffer-padded-arrays-at-the-hlsl-level | ||
|
||
## Detailed design | ||
|
||
### CBuffer representation at the LLVM level | ||
|
||
CBuffers will continue to use a [__cblayout] type, but will no longer use a | ||
`target("dx.Layout", ...)` type. | ||
|
||
When using `packoffset`, we'll add explicit padding as necessary. Consider | ||
`cb0`: | ||
|
||
```hlsl | ||
cbuffer cb0 : register(b0) { | ||
int x : packoffset(c0.y); | ||
float y : packoffset(c1.z); | ||
} | ||
``` | ||
|
||
```llvm | ||
%__cblayout_cb0 = type <{ | ||
[4 x %pad8], | ||
i32, | ||
[16 x %pad8], | ||
float | ||
}> | ||
``` | ||
|
||
For arrays, we'll have padding within elements to fill to a 16-byte boundary, | ||
and padding before arrays in order for them to start at 16-byte boundaries. | ||
Consider `cb1`: | ||
|
||
```hlsl | ||
cbuffer cb1 : register(b0) { | ||
float a1[3]; // offset 0, size 4 (+12) * 3 | ||
double3 a2[2]; // offset 48, size 24 (+8) * 2 | ||
uint4 a3[2]; // offset 112, size 16 * 2 | ||
float16_t a4[2][2]; // offset 144, size 2 (+14) * 4 | ||
} | ||
``` | ||
|
||
```llvm | ||
%__cblayout_cb1 = type <{ | ||
<{ [2 x <{ float, [12 x %pad8] }>], float }>, [12 x %pad8], | ||
<{ [1 x <{ <3 x double>, [8 x %pad8] }>], <3 x double> }>, [8 x %pad8], | ||
[ 2 x <4 x i32> ], | ||
<{ [3 x <{ half, [14 x %pad8] }>], half }> | ||
}> | ||
``` | ||
|
||
[__cblayout]: https://github.com/llvm/wg-hlsl/blob/4570a9cfc5c4b1e5bc0b773a6fb7b22014ac6d3b/proposals/0016-constant-buffers.md#lowering-constant-buffer-resources-to-llvm-ir "Lowering Constant Buffer Resources to LLVM IR" | ||
|
||
### CBuffer Padded arrays at the HLSL-level | ||
|
||
Arrays in cbuffers need padding between elements if the element size is not a | ||
multiple of 16 bytes. This can be implemented as if these were objects of a C++ | ||
type like the following rather than simple arrays: | ||
|
||
```c++ | ||
#include <cstdint> | ||
#include <type_traits> | ||
|
||
using pad8_t = uint8_t; | ||
|
||
template <typename T, std::size_t N, bool NeedsPadding = sizeof(T) % 16 != 0> | ||
struct CBufArray; | ||
|
||
template <typename T, std::size_t N> struct CBufArray<T, N, true> { | ||
struct PaddedT { | ||
T Element; | ||
uint8_t Padding[16 - (sizeof(T) % 16)]; | ||
}; | ||
PaddedT Elems[N - 1]; | ||
T LastElem; | ||
|
||
const T &operator[](std::size_t I) const { | ||
return I == N - 1 ? LastElem : Elems[I].Element; | ||
} | ||
}; | ||
|
||
template <typename T, std::size_t N> struct CBufArray<T, N, false> { | ||
T Elems[N]; | ||
|
||
const T &operator[](std::size_t I) const { return Elems[I]; } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Base on the way this is written, if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think one of the complications you end up with if we try to represent this in the AST is that we need a type trait that produces the cbuffer layout types effectively recursively. Because you may have something like:
The cbuffer layout struct is effectively:
I don't think this is impossible to deal with, but if we do represent this in the AST we'll also need to think about how we handle conversions. We may also need to massage the diagnostics for the inverse case because while we won't need to support converting a value of a non-cbuffer layout type to the cbuffer type since cbuffers are read-only, we really won't want the diagnostics to refer to the cbuffer types directly. |
||
}; | ||
``` | ||
|
||
We won't actually implement this type in HLSL, but we do need to model arrays | ||
in cbuffers equivalently to this in the clang ASTs. This has to be done in the | ||
AST and not later during clang codegen because offsets into arrays are | ||
calculated in various places based off of the AST types. | ||
|
||
## Alternatives considered | ||
|
||
See [llvm-project/wg-hlsl#171] for the previous attempt at representing these | ||
types. | ||
|
||
[llvm-project/wg-hlsl#171]: https://github.com/llvm/wg-hlsl/pull/171 | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the one other alternative to consider is a hybrid, where we create the layout types in the AST, but don't actually have the cbuffer members be of the layout types. That would avoid needing to have special casting behavior for cbuffer types. We could insert the "conversion" code late in CodeGen based of the address space of the pointer being loaded. I'm not sure if this actually simplifies things or not. DXC does a bunch of things in CodeGen that shouldn't be done there because it adds data type conversions that actually change values, but in this case these conversions aren't really "type" conversions as much as layout conversions, so I feel less icky about doing them in CodeGen and not fully representing them in the AST. Curious for thoughts. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My uninform thoughts are that it could work. It is worth checking out. Somewhere in clang, we have to handle conversions. I just don't know the best place. Also note that conversion will have to be done in such a way that they do not cause too much code, and they can be optimized aways. See a recent issue we fixed for SPIR-V: microsoft/DirectXShaderCompiler#7493. Their code copies the entirety of a large cbuffer to return it by value. The expectation is that the optimizer is able copy propagate everything and only load the values that are actually used. |
||
<!-- {% endraw %} --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think about what could happen with SPIR-V if we do not identify the padding and remove it.
I think that 1 is essential, and 2 is a very nice to have.