-
Notifications
You must be signed in to change notification settings - Fork 787
[SM6.9] Enable trivial native vector Dxil Operations plus a few #7324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This enables the generation of native vector DXIL Operations that are "trivial", meaning they take only a single DXOp Call instruction to implement as well as a few others that either only took such a call and some llvm operations or were of particular interest for other reasons. This involves allowing the overloads by adding the vector indication in hctdb, altering the lowering to maintain the vectors instead of scalarizing them, and a few sundry changes to fix issues along the way. The "trivial" dxil operations that return a different value from the overload type had to be moved out of the way and given their own lowering function so that the main function could generate vectors conditional on the version and vector type. These will be added in a later change. While the long vector supporting intrinsics that weren't given this treatment will continue to generate scalarized operations, some of them needed some work as well. The dot product for float vectors longer than 4 had to take the integer fallback path, which required some small modificaitons and a rename. Additionally, a heuristic for pow that malfunctioned with too many elements had to have a limit placed on it. Since the or()/and()/select() intrinsics translate directly to LLVM ops, they can have their lowering scalarization removed and what future scalarization might be needed by the current version can be done by later passes as with other LLVM operators. An issue with a special value used to represent unassined dimensions had to be addressed since new dimensions can exceed that value. It's now MAX_INT. Contributes to microsoft#7120, but I'd prefer to leave it open until all intrinsics are covered
Was using int dot for the float operands as it was originally an int-only lowering function.
✅ With the latest revision this PR passed the C/C++ code formatter. |
tools/clang/test/CodeGenDXIL/hlsl/types/longvec-scalarized-intrinsics.hlsl
Outdated
Show resolved
Hide resolved
StructuredBuffer< vector<TYPE, 8> > buf; | ||
ByteAddressBuffer rbuf; | ||
|
||
float4 main(uint i : SV_PrimitiveID, uint4 m : M) : SV_Target { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alsepkow - the approach used in this file might be interesting to think about for the HLKs you're working on
../tools/clang/test/CodeGenDXIL/hlsl/types/longvec-intrinsics.hlsl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- Some intrinsic expansions aren't using vector DXIL ops when they are available - I think they should.
- I think whether vectors are supported by an operation can be determined deeper in common
TrivialDxilOperation
code based on the operation, rather than by callers of each of theTrivialDxil*Operation
functions. This will also likely fix the prior issue with expansions, while reducing the number of changes. - This reminded me that we don't have tests for literal vector DXIL op evaluation. Did I forget about this testing in an earlier change, or am I missing something about how those could be handled automatically?
- I think we had a static method on
hlsl::OP
which queried specifically for vector overload support for a particular DXIL operation (and probably specific overload dimension index, defaulting to 0). Right now, you'd have to rely onIsOverloadLegal
and supply the full (multi-)overload type. - All the tests for the shader model version to determine whether vector dxil is supported makes me wish there was a specific accessor for this property on
DxilModule
andHLModule
instead (plus one onhlsl::OP
would help).
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value) | ||
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value) | ||
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value) | ||
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't these use the supported vector overloads for expansions?
I think it would have made sense for each of these operations with corresponding vector DXIL ops:
- atan2 (Atan)
- fmod (FAbs, Frc)
- ldexp (Exp)
- pow (Log, Exp)
- modf (Round_z)
lib/HLSL/HLOperationLower.cpp
Outdated
|
||
// If supported and the overload type is a vector with more than 1 element, | ||
// create a native vector operation. | ||
if (SupportsVectors && Ty->isVectorTy() && Ty->getVectorNumElements() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the caller supplying SupportsVectors
, couldn't this be looked up by the operation and the module?
if (SupportsVectors && Ty->isVectorTy() && Ty->getVectorNumElements() > 1) { | |
if (Ty->isVectorTy() && Ty->getVectorNumElements() > 1 && | |
hlslOP->GetModule()->GetHLModule()->GetShaderModel()->IsSM69Plus() && | |
OP::IsOverloadLegal(opcode, Ty)) { |
The only iffy one here is using OP::IsOverloadLegal
as opposed to a more specific accessor to see if the operation supports native vector types. I did that because that accessor doesn't currently exist, but it would make sense to add it for this purpose.
There's only one oddity in using OP::IsOverloadLegal
instead of a more specific check for vector support that I can think of at the moment:
The weird corner case where we still rely on illegal double overloads for literal evaluation. It would reject the vector path for an unsupported literal (double) vector and scalarize the DXIL operation, which would be evaluated and folded later - so scalarization in that case wouldn't do any harm. In fact, I don't think we have any testing for literal DXIL op evaluation with native vector overloads (do we even handle that?!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch @tex3d!
Fixing this actually corrects at least some of the missing intrinsics you cited in your comment above. I'll roll this change through and do a pass at removing spurious diffs too. I think this is a fantastic simplification of the change.
This also results in correctly disabling scalarization for a number of intrinsics too. ../tools/clang/test/CodeGenDXIL/hlsl/types/longvec-scalarized-intrinsics .hlsl ../tools/clang/test/CodeGenDXIL/hlsl/types/longvec-trivial-scalarized-in trinsics.hlsl
After adapting to @tex3d's feedback this really simplified and cleaned up a lot. Please let me know if there are other changes I should make, but I think I got everything that has been suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, LGTM.
I'm looking at the test failure now. Looks like one of my commits from today is causing a failure in int16 operation lowering. Will fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Given that only WaveMatch
is truly "scalarized", the name of the test isn't great. Perhaps "expanded" instead. But I don't think this justifies an update on its own. This can be cleaned up later.
This enables the generation of native vector DXIL Operations
that are "trivial", meaning they take only a single DXOp Call
instruction to implement as well as a few others that either only took
such a call and some llvm operations or were of particular interest for
other reasons.
This involves allowing the overloads by adding the vector indication in
hctdb, altering the lowering to maintain the vectors instead of
scalarizing them, and a few sundry changes to fix issues along the way.
The "trivial" dxil operations that return a different value from the
overload type had to be moved out of the way and given their own
lowering function so that the main function could generate vectors
conditional on the version and vector type. These will be added in a
later change.
While the long vector supporting intrinsics that weren't given this
treatment will continue to generate scalarized operations, some of them
needed some work as well. The dot product for float vectors longer than
4 had to take the integer fallback path, which required some small
modificaitons and a rename.
Additionally, a heuristic for pow that malfunctioned with too many
elements had to have a limit placed on it.
Since the or()/and()/select() intrinsics translate directly to LLVM ops,
they can have their lowering scalarization removed and what future
scalarization might be needed by the current version can be done by
later passes as with other LLVM operators.
An issue with a special value used to represent unassined dimensions had
to be addressed since new dimensions can exceed that value. It's now
MAX_INT.
Contributes to #7120, but I'd prefer to leave it open until all
intrinsics are covered
Primary work by @pow2clk
Fixes #7297 & #7120