-
Notifications
You must be signed in to change notification settings - Fork 935
Fix OpStore of BDA to align to 8 #4123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix OpStore of BDA to align to 8 #4123
Conversation
| 280: 6(ptr) CompositeExtract 277 0 | ||
| 282: 281(ptr) AccessChain 279 35 | ||
| Store 282 280 Aligned 16 | ||
| Store 282 280 Aligned 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is
layout(buffer_reference) buffer MyBuffer;
struct S6 { MyBuffer x; };
layout(set = 0, binding = 1) restrict readonly buffer SSBO {
S6 s6;
} InBuffer;
layout(buffer_reference) buffer MyBuffer {
S6 s6;
};
void main() {
MyBuffer payload;
payload.s6 = InBuffer.s6;
}which is storing a BDA pointer, but it was defaulting back to 16 before, so this is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should still be 16 I believe. The default value for buffer_reference_align when unspecified is 16 and the first member of the implicit buffer struct can be proved to have the same alignment as the struct itself.
Other members may have smaller alignments depending on their offset in the struct (specified by the selected layout rule) though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the correct formula to compute the highest guaranteed alignment for each member of a struct is gcd(struct_alignment, member_offset_in_the_struct) where gcd is the greatest common divisor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignments have to be powers of two so you don't need gcd, you can just use max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean max(struct_alignment, member_offset_in_the_struct)?
This does not work:
layout(buffer_reference, scalar) buffer Data
{
uint64_t data0; // offset 0
uint64_t data1; // offset 8
uint64_t data2; // offset 16
uint64_t data3; // offset 24
};Correct:
gcd(16, 0) = 16
gcd(16, 8) = 8
gcd(16, 16) = 16
gcd(16, 24) = 8
Incorrect:
max(16, 0) = 16
max(16, 8) = 16
max(16, 16) = 16
max(16, 24) = 24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, looked back into this, the issue is actually not the logic I added, it is doing things correctly IMO
The "real" issue here is this GLSL is generating code to copy the pointer, not the struct, but only because this test is --target-env vulkan1.1
If you set it to --target-env vulkan1.2 or higher, it will generate the correct SPIR-V such that the alignment is 8 as you would think
https://godbolt.org/z/d7n86soYd
So I agree there is an issue, but only for 1.1 or lower GLSL code path and its because it produces code that is copying a pointer, which should be aligned 8... trying to say "this PR is doing the logic correctly and there is nothing in the scope here to fix, but can raise a new issue to track the issue"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer being 8 bytes, it indeed requires an alignment of at least 8 bytes.
However, OpLoad instructions should always use the largest alignment possible that can be compile-time-proved, for performance reasons (see Nvidia's Advanced API Performance: Descriptors blog).
From my understanding, this is the whole point of the existence of buffer_reference_align instead of auto-computing the alignment from the largest member.
In this example, MyBuffer does not set buffer_reference_align, which means the default value of 16 is used, as explained by GL_EXT_buffer_reference:
Each buffer reference type has an alignment that can be specified via
the "buffer_reference_align" layout qualifier. This must be a power of
two and be greater than or equal to the largest scalar/component type
in the block. If the layout qualifier is not specified, it defaults to
16 bytes.
Since the first member, s6, is at offset 0, we can prove that it has the same guaranteed alignment as the implicit struct (16) and thus should have loads/stores aligned to 16 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thus should have loads/stores aligned to 16 bytes.
I agree 100% with you that it should be 16
I disagree there is anything to change in this PR, this PR is exposing a bug that is there already, which is --target-env vulkan1.1 is not producing correct BDA spirv, regardless of the Aligned value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open another issue about buffer_reference_align being ignored for loads/stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxime-modulopi thanks, clearly there are still more things to get right, but happy to get it fixed little by little and issues with small code to reproduce is always helpful!
Tracks if OpStore is storing a buffer_reference pointer and if so, sets the alignment to be 8 like it does currently for loads
2e8228d to
7cd236f
Compare
|
I've compiled the branch and while it now works for standard variables, assigning entire arrays of pointers still align the load and store to 4. Reproduction code#version 460 core
#extension GL_EXT_buffer_reference: require
#extension GL_EXT_scalar_block_layout: require
layout (buffer_reference, buffer_reference_align = 4, scalar) readonly buffer DataBuffer
{
vec4 data;
};
layout (buffer_reference, buffer_reference_align = 8, scalar) buffer ParametersBuffer
{
DataBuffer src[2];
DataBuffer dst[2];
};
layout(push_constant) uniform constants
{
ParametersBuffer u_param;
};
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main()
{
u_param.dst = u_param.src;
} |
|
Also, there is still this weird behavior where changing DataBuffer's This may be the root of the issue. |
Actually closes #4058
cc @arcady-lunarg