Implementation of the CoopVec Inference and Training builtin intrinsics #7290
Conversation
✅ With the latest revision this PR passed the Python code formatter.
This applies to all the builtins I've tried so far, but the VectorAccumulate example is quite minimal. Given this code:

```hlsl
export void TruncatedVector(vector<half, 254> Input254, vector<half, 255> Input255) {
  __builtin_VectorAccumulate(Input254, RWBuf, 0);
  __builtin_VectorAccumulate(Input255, RWBuf, 0);
}
```
This generates:
```llvm
; Function Attrs: nounwind
define void @"\01?TruncatedVector@@YAXV?$vector@$halff@$0PO@@@V?$vector@$halff@$0PP@@@@Z"(<254 x float> %Input254, <255 x float> %Input255) #0 {
%1 = load %dx.types.Handle, %dx.types.Handle* @"\01?RWBuf@@3URWByteAddressBuffer@@A", align 4
%2 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1) ; CreateHandleForLib(Resource)
%3 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %2, %dx.types.ResourceProperties { i32 4107, i32 0 }) ; AnnotateHandle(res,props) resource: RWByteAddressBuffer
call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0) ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
%4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
%5 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1) ; CreateHandleForLib(Resource)
%6 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %5, %dx.types.ResourceProperties { i32 4107, i32 0 }) ; AnnotateHandle(res,props) resource: RWByteAddressBuffer
call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0) ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
ret void
}
```

Note how Input255 is explicitly truncated to 1xfloat before the vectorAccumulate is called. Input254 is not truncated.
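For reference, here is a minimal FileCheck-style regression sketch for the 255-element case, assuming the intended lowering keeps the full vector width. The RUN line, the lib_6_9 profile, and the expected `v255f32` overload are inferred from the 254-element output above, not confirmed:

```hlsl
// Hypothetical regression test sketch; the RUN line and lib_6_9 profile are assumptions.
// RUN: %dxc -T lib_6_9 %s | FileCheck %s

RWByteAddressBuffer RWBuf;

export void FullWidthVector(vector<half, 255> Input255) {
  // Expect the full-width overload, mirroring the v254f32 call above,
  // rather than a shuffle down to <1 x float>.
  // CHECK-NOT: shufflevector
  // CHECK: call void @dx.op.vectorAccumulate.v255f32(i32 308, <255 x float>
  __builtin_VectorAccumulate(Input255, RWBuf, 0);
}
```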
✅ With the latest revision this PR passed the C/C++ code formatter.
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Besides the generated content, I think this looks good. Just one small nit regarding DXIL Op descriptions.
I believe the generated content is out of date/incorrect, since I noticed some deleted operations and a missing .json file update. In any case, generated files will need to be updated before the final PR is ready for merging.
…ller type. The declared input type must be 32-bit unsigned integer.
non-overload test)
…validation errors per review feedback, some cleanup
…taccumulate and vector accumulate functions
Just an FYI:
…ics (microsoft#7290)

Implements HLSL: __builtin_MatVecMul, __builtin_MatVecMulAdd, __builtin_OuterProductAccumulate, __builtin_VectorAccumulate

Lowered to DXIL: @dx.op.matVecMul, @dx.op.matVecMulAdd, @dx.op.outerProductAccumulate, @dx.op.vectorAccumulate

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: Simon Moll <smoll@nvidia.com>
Co-authored-by: Tex Riddell <texr@microsoft.com>
Co-authored-by: Chris B <beanz@abolishcrlf.org>

(cherry picked from commit 1db8c5b)
This PR introduces the linear algebra header file, and places it in a location that is by default included in all HLSL compilation. The builtins in the API aren't yet defined, and depend on the #7290 PR merging first. The tests that have been added have temporary diagnostic messages while 7290 is in progress; they will need to be updated. Open to feedback on better / suggested error messages, or whether there shouldn't be any sema-level validation for these errors.

Fixes [#7304](#7304)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ics (#7290) (#7381)

Authored-by: Anupama Chandrasekhar <anupamac@nvidia.com>

Implements HLSL: __builtin_MatVecMul, __builtin_MatVecMulAdd, __builtin_OuterProductAccumulate, __builtin_VectorAccumulate

Lowered to DXIL: @dx.op.matVecMul, @dx.op.matVecMulAdd, @dx.op.outerProductAccumulate, @dx.op.vectorAccumulate

---------

Co-authored-by: Anupama Chandrasekhar <anupamac@nvidia.com>
Co-authored-by: Simon Moll <smoll@nvidia.com>

(cherry picked from commit 1db8c5b)
This PR introduces the linear algebra header file, and places it in a location that is by default included in all HLSL compilation. The builtins in the API aren't yet defined, and depend on the microsoft#7290 PR merging first. The tests that have been added have temporary diagnostic messages while 7290 is in progress; they will need to be updated. Open to feedback on better / suggested error messages, or whether there shouldn't be any sema-level validation for these errors.

Fixes [microsoft#7304](microsoft#7304)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This PR introduces the linear algebra header file, and places it in a location that is by default included in all HLSL compilation. The builtins in the API aren't yet defined, and depend on the #7290 PR merging first. The tests that have been added have temporary diagnostic messages while 7290 is in progress; they will need to be updated. Open to feedback on better / suggested error messages, or whether there shouldn't be any sema-level validation for these errors.

Fixes [#7304](#7304)

Cherrypick of #7350

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Implements
HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate
Lowered to
DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate
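As a usage illustration, here is a minimal HLSL sketch of the `__builtin_VectorAccumulate` path, following the `(inputVector, arrayBuffer, arrayOffset)` signature shown in the conversation above. The resource name, vector width, and byte offset are placeholders; the other builtins are not sketched because their full signatures are not shown in this thread.

```hlsl
// Minimal sketch; __builtin_VectorAccumulate(inputVector, arrayBuffer, arrayOffset)
// follows the usage shown earlier in this PR. Names and sizes are placeholders.
RWByteAddressBuffer Accumulator;

export void AccumulatePartial(vector<half, 64> Partial) {
  // Accumulate the 64-component vector into the byte-address buffer at byte offset 0.
  __builtin_VectorAccumulate(Partial, Accumulator, 0);
}
```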