Skip to content

[SM6.9] Enable trivial native vector Dxil Operations plus a few #7324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 9, 2025

Conversation

llvm-beanz
Copy link
Collaborator

@llvm-beanz llvm-beanz commented Apr 8, 2025

This enables the generation of native vector DXIL Operations
that are "trivial", meaning they take only a single DXOp Call
instruction to implement as well as a few others that either only took
such a call and some llvm operations or were of particular interest for
other reasons.

This involves allowing the overloads by adding the vector indication in
hctdb, altering the lowering to maintain the vectors instead of
scalarizing them, and a few sundry changes to fix issues along the way.

The "trivial" dxil operations that return a different value from the
overload type had to be moved out of the way and given their own
lowering function so that the main function could generate vectors
conditional on the version and vector type. These will be added in a
later change.

While the long vector supporting intrinsics that weren't given this
treatment will continue to generate scalarized operations, some of them
needed some work as well. The dot product for float vectors longer than
4 had to take the integer fallback path, which required some small
modificaitons and a rename.
Additionally, a heuristic for pow that malfunctioned with too many
elements had to have a limit placed on it.

Since the or()/and()/select() intrinsics translate directly to LLVM ops,
they can have their lowering scalarization removed and what future
scalarization might be needed by the current version can be done by
later passes as with other LLVM operators.

An issue with a special value used to represent unassined dimensions had
to be addressed since new dimensions can exceed that value. It's now
MAX_INT.

Contributes to #7120, but I'd prefer to leave it open until all
intrinsics are covered

Primary work by @pow2clk

Fixes #7297 & #7120

Greg Roth and others added 7 commits April 8, 2025 11:40
This enables the generation of native vector DXIL Operations
that are "trivial", meaning they take only a single DXOp Call
instruction to implement as well as a few others that either only took
such a call and some llvm operations or were of particular interest for
other reasons.

This involves allowing the overloads by adding the vector indication in
hctdb, altering the lowering to maintain the vectors instead of
scalarizing them, and a few sundry changes to fix issues along the way.

The "trivial" dxil operations that return a different value from the
overload type had to be moved out of the way and given their own
lowering function so that the main function could generate vectors
conditional on the version and vector type. These will be added in a
later change.

While the long vector supporting intrinsics that weren't given this
treatment will continue to generate scalarized operations, some of them
needed some work as well. The dot product for float vectors longer than
4 had to take the integer fallback path, which required some small
modificaitons and a rename.
Additionally, a heuristic for pow that malfunctioned with too many
elements had to have a limit placed on it.

Since the or()/and()/select() intrinsics translate directly to LLVM ops,
they can have their lowering scalarization removed and what future
scalarization might be needed by the current version can be done by
later passes as with other LLVM operators.

An issue with a special value used to represent unassined dimensions had
to be addressed since new dimensions can exceed that value. It's now
MAX_INT.

Contributes to microsoft#7120, but I'd prefer to leave it open until all
intrinsics are covered
Was using int dot for the float operands as it was originally an
int-only lowering function.
@llvm-beanz llvm-beanz requested a review from tex3d April 8, 2025 16:56
@llvm-beanz llvm-beanz changed the title Enable trivial native vector Dxil Operations plus a few [SM6.9] Enable trivial native vector Dxil Operations plus a few Apr 8, 2025
Copy link
Contributor

github-actions bot commented Apr 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@damyanp damyanp moved this to Needs Review in HLSL Support Apr 8, 2025
StructuredBuffer< vector<TYPE, 8> > buf;
ByteAddressBuffer rbuf;

float4 main(uint i : SV_PrimitiveID, uint4 m : M) : SV_Target {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alsepkow - the approach used in this file might be interesting to think about for the HLKs you're working on

Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • Some intrinsic expansions aren't using vector DXIL ops when they are available - I think they should.
  • I think whether vectors are supported by an operation can be determined deeper in common TrivialDxilOperation code based on the operation, rather than by callers of each of the TrivialDxil*Operation functions. This will also likely fix the prior issue with expansions, while reducing the number of changes.
  • This reminded me that we don't have tests for literal vector DXIL op evaluation. Did I forget about this testing in an earlier change, or am I missing something about how those could be handled automatically?
  • I think we had a static method on hlsl::OP which queried specifically for vector overload support for a particular DXIL operation (and probably specific overload dimension index, defaulting to 0). Right now, you'd have to rely on IsOverloadLegal and supply the full (multi-)overload type.
  • All the tests for the shader model version to determine whether vector dxil is supported makes me wish there was a specific accessor for this property on DxilModule and HLModule instead (plus one on hlsl::OP would help).

// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value)
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value)
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value)
// CHECK: call float @dx.op.unary.f32(i32 17, float %{{.*}}) ; Atan(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't these use the supported vector overloads for expansions?
I think it would have made sense for each of these operations with corresponding vector DXIL ops:

  • atan2 (Atan)
  • fmod (FAbs, Frc)
  • ldexp (Exp)
  • pow (Log, Exp)
  • modf (Round_z)


// If supported and the overload type is a vector with more than 1 element,
// create a native vector operation.
if (SupportsVectors && Ty->isVectorTy() && Ty->getVectorNumElements() > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the caller supplying SupportsVectors, couldn't this be looked up by the operation and the module?

Suggested change
if (SupportsVectors && Ty->isVectorTy() && Ty->getVectorNumElements() > 1) {
if (Ty->isVectorTy() && Ty->getVectorNumElements() > 1 &&
hlslOP->GetModule()->GetHLModule()->GetShaderModel()->IsSM69Plus() &&
OP::IsOverloadLegal(opcode, Ty)) {

The only iffy one here is using OP::IsOverloadLegal as opposed to a more specific accessor to see if the operation supports native vector types. I did that because that accessor doesn't currently exist, but it would make sense to add it for this purpose.

There's only one oddity in using OP::IsOverloadLegal instead of a more specific check for vector support that I can think of at the moment:
The weird corner case where we still rely on illegal double overloads for literal evaluation. It would reject the vector path for an unsupported literal (double) vector and scalarize the DXIL operation, which would be evaluated and folded later - so scalarization in that case wouldn't do any harm. In fact, I don't think we have any testing for literal DXIL op evaluation with native vector overloads (do we even handle that?!).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch @tex3d!

Fixing this actually corrects at least some of the missing intrinsics you cited in your comment above. I'll roll this change through and do a pass at removing spurious diffs too. I think this is a fantastic simplification of the change.

This also results in correctly disabling scalarization for a number of
intrinsics too.
../tools/clang/test/CodeGenDXIL/hlsl/types/longvec-scalarized-intrinsics
.hlsl
../tools/clang/test/CodeGenDXIL/hlsl/types/longvec-trivial-scalarized-in
trinsics.hlsl
@llvm-beanz
Copy link
Collaborator Author

After adapting to @tex3d's feedback this really simplified and cleaned up a lot. Please let me know if there are other changes I should make, but I think I got everything that has been suggested.

damyanp
damyanp previously approved these changes Apr 8, 2025
Copy link
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, LGTM.

@llvm-beanz
Copy link
Collaborator Author

I'm looking at the test failure now. Looks like one of my commits from today is causing a failure in int16 operation lowering. Will fix.

Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Given that only WaveMatch is truly "scalarized", the name of the test isn't great. Perhaps "expanded" instead. But I don't think this justifies an update on its own. This can be cleaned up later.

@llvm-beanz llvm-beanz merged commit 5d2fa92 into microsoft:main Apr 9, 2025
12 checks passed
@llvm-beanz llvm-beanz deleted the cbieneman/longvec branch April 9, 2025 21:41
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 9, 2025
@damyanp damyanp moved this from Needs Review to Closed in HLSL Support Apr 22, 2025
@damyanp damyanp removed this from HLSL Support Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

vectors longer than 254 are truncated to 1 when passed to intrinsics
4 participants