[AMDGPU] Implement vop3p complex pattern optmization for gisel #130234

Merged
merged 49 commits into from
Apr 18, 2025
Changes from 5 commits
Commits
556f7ff
Implement vop3p complex pattern optmization for gisel
Shoreshen Mar 7, 2025
58464a3
fix lit file
Shoreshen Mar 7, 2025
25f7db0
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 10, 2025
daae1ae
fix comments
Shoreshen Mar 10, 2025
afa6448
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 11, 2025
04a5d4c
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 12, 2025
2e587f5
fix comments
Shoreshen Mar 12, 2025
c6c4b3e
fix comments
Shoreshen Mar 12, 2025
a289297
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 13, 2025
dd106c7
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 17, 2025
6378180
fix comments and test case
Shoreshen Mar 17, 2025
b0feaff
fix comments
Shoreshen Mar 18, 2025
53370d8
fix conflict
Shoreshen Mar 18, 2025
3f178d2
fix lit
Shoreshen Mar 18, 2025
09abc3d
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 18, 2025
79b8992
fix comments
Shoreshen Mar 18, 2025
136da47
fix comments
Shoreshen Mar 18, 2025
61b4df7
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 19, 2025
97e6742
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 20, 2025
d79ac03
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 21, 2025
fc7c927
Block for root type other than 2 x Type
Shoreshen Mar 24, 2025
9f3a54f
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 24, 2025
bc51bf4
fix comments
Shoreshen Mar 24, 2025
cafa3d1
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 25, 2025
a5c5017
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 26, 2025
47840d7
fix comments
Shoreshen Mar 26, 2025
6fe4147
fix comments
Shoreshen Mar 26, 2025
3b7f377
fix comments
Shoreshen Mar 26, 2025
d7de92f
fix lit
Shoreshen Mar 26, 2025
45ed994
avoid global variable
Shoreshen Mar 26, 2025
2f83470
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 27, 2025
d651640
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 28, 2025
a792c1d
fix comments
Shoreshen Mar 28, 2025
9ac58f9
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 28, 2025
0d59649
Merge branch 'main' into gisel-vop3p
Shoreshen Mar 31, 2025
8390425
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 1, 2025
276e41b
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 2, 2025
1cb1651
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 2, 2025
c2eeedd
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 3, 2025
dc65247
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 7, 2025
6cd21e6
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 8, 2025
ee28947
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 9, 2025
797055d
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 11, 2025
e544665
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 11, 2025
0eac2e9
fix comments and case changes
Shoreshen Apr 11, 2025
e328d7a
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 14, 2025
c1680f3
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 15, 2025
d9dc316
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 16, 2025
223dc11
Merge branch 'main' into gisel-vop3p
Shoreshen Apr 18, 2025
379 changes: 351 additions & 28 deletions llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
@@ -187,8 +187,8 @@ class AMDGPUInstructionSelector final : public InstructionSelector {

ComplexRendererFns selectVOP3NoMods(MachineOperand &Root) const;

-std::pair<Register, unsigned>
-selectVOP3PModsImpl(Register Src, const MachineRegisterInfo &MRI,
+std::pair<const MachineOperand *, unsigned>
+selectVOP3PModsImpl(const MachineOperand *Op, const MachineRegisterInfo &MRI,
bool IsDOT = false) const;

InstructionSelector::ComplexRendererFns
3 changes: 1 addition & 2 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll
@@ -68,8 +68,7 @@ define float @v_fdot2_neg_c(<2 x half> %a, <2 x half> %b, float %c) {
; GFX906-LABEL: v_fdot2_neg_c:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX906-NEXT: v_dot2_f32_f16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_f32_f16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
Collaborator

Why do we negate both halves? The IR is only doing fneg on a float, not on <2 x half>.

Contributor Author

Hi @rovka, to handle the neg of a float rather than of <2 x half>:

  1. Separated the NEG status for HI and LO.
  2. As indicated by this, the neg of a float is equivalent to the neg of the higher half of <2 x half>.
  3. Added cases (e.g. v_fmul_v2f16_partial_neg) for the neg of a float.

However, this case does not take effect, since every LLT that is not <2 x Scalar Type> is blocked for safety (here c is a float).
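
The equivalence in point 2 can be checked with plain bit arithmetic. The following is only a sketch, assuming the usual packing where element 0 of <2 x half> sits in the low 16 bits of the 32-bit register. The helper names are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not code from the patch.

// fneg on a 32-bit float flips bit 31. This is the mask used by the
// v_xor_b32 0x80000000 instructions in the old test output.
constexpr std::uint32_t fnegF32Bits(std::uint32_t Bits) {
  return Bits ^ 0x80000000u;
}

// fneg on <2 x half> flips the sign bit of both 16-bit halves (0x80008000).
constexpr std::uint32_t fnegV2F16Bits(std::uint32_t Bits) {
  return Bits ^ 0x80008000u;
}

// Negating only the high half, i.e. what neg_hi without neg_lo would express.
constexpr std::uint32_t negHiOnly(std::uint32_t Bits) {
  return Bits ^ 0x80000000u;
}

// 1.5f is 0x3FC00000 and -1.5f is 0xBFC00000 in IEEE-754 binary32.
static_assert(fnegF32Bits(0x3FC00000u) == 0xBFC00000u, "float sign flip");
// Negating the float is the same bit operation as negating the high half...
static_assert(fnegF32Bits(0x3FC00000u) == negHiOnly(0x3FC00000u), "hi only");
// ...and differs from negating both halves, as neg_lo plus neg_hi would.
static_assert(fnegF32Bits(0x3FC00000u) != fnegV2F16Bits(0x3FC00000u), "both");

// Runtime variant of the same check for an arbitrary bit pattern.
inline void checkHiOnly(std::uint32_t Bits) {
  assert(fnegF32Bits(Bits) == negHiOnly(Bits));
}
```

Under that assumption, a float fneg corresponds to the high-half modifier alone, which is why setting both neg_lo and neg_hi for a float operand was questioned above.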

; GFX906-NEXT: s_setpc_b64 s[30:31]
Collaborator

Nit: This isn't related to your patch, but it seems we're missing GFX10PLUS checks for a lot of these testcases. Could you please send a separate patch to fix that?

%neg.c = fneg float %c
%r = call float @llvm.amdgcn.fdot2(<2 x half> %a, <2 x half> %b, float %neg.c, i1 false)
24 changes: 8 additions & 16 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll
@@ -248,8 +248,7 @@ define i32 @v_sdot2_fnegf32_c(<2 x i16> %a, <2 x i16> %b, float %c) {
; GFX906-LABEL: v_sdot2_fnegf32_c:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_sdot2_fnegf32_c:
@@ -263,8 +262,7 @@ define i32 @v_sdot2_fnegf32_c(<2 x i16> %a, <2 x i16> %b, float %c) {
; GFX10-LABEL: v_sdot2_fnegf32_c:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
Collaborator

I think we don't support the neg modifiers for integer operands.

Contributor

We don't. These changes are not right.

Contributor Author

Hi @rovka @shiltian, to block this:

  1. For generic opcodes, assume all of them have neg bits (same as before).
  2. For intrinsics, the fold is only valid for llvm.amdgcn.fdot2.
  3. Block all types that are not <2 x Scalar Type>.
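
The three rules above can be read as one predicate. This is a rough model only: `UserKind`, `OperandType`, and `mayFoldNeg` are hypothetical names, and the actual checks in AMDGPUInstructionSelector.cpp are not shown on this page and may be structured differently.

```cpp
#include <cassert>

// Hypothetical model of the three rules; none of these names come from
// AMDGPUInstructionSelector.cpp.
enum class UserKind { GenericOpcode, IntrinsicFDot2, OtherIntrinsic };

// Simplified stand-in for an LLT: only what rule 3 needs.
struct OperandType {
  bool IsVector;
  unsigned NumElts;
};

// Rule 3: only <2 x Scalar Type> operands are considered at all.
constexpr bool isTwoElementVector(OperandType Ty) {
  return Ty.IsVector && Ty.NumElts == 2;
}

constexpr bool mayFoldNeg(UserKind User, OperandType Ty) {
  if (!isTwoElementVector(Ty))
    return false;
  switch (User) {
  case UserKind::GenericOpcode:  // Rule 1: assumed to have neg bits.
    return true;
  case UserKind::IntrinsicFDot2: // Rule 2: the only intrinsic allowed.
    return true;
  case UserKind::OtherIntrinsic: // e.g. sdot2/udot2: integer operands.
    return false;
  }
  return false;
}

// sdot2 with a negated operand: blocked by rule 2.
static_assert(!mayFoldNeg(UserKind::OtherIntrinsic, OperandType{true, 2}),
              "integer dot intrinsics must not get neg modifiers");
// fdot2 with a float accumulator: blocked by rule 3.
static_assert(!mayFoldNeg(UserKind::IntrinsicFDot2, OperandType{false, 1}),
              "scalar operands are blocked");
// fdot2 with a <2 x half> operand: allowed.
static_assert(mayFoldNeg(UserKind::IntrinsicFDot2, OperandType{true, 2}),
              "fdot2 vector operands may fold neg");

// Runtime entry point so the same checks can also be run with assert.
inline void checkRules() {
  assert(mayFoldNeg(UserKind::GenericOpcode, OperandType{true, 2}));
}
```

In this model the integer dot-product cases flagged by the reviewers fall under `OtherIntrinsic` and keep their explicit `v_xor_b32`.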

; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.c = fneg float %c
%cast.neg.c = bitcast float %neg.c to i32
@@ -276,8 +274,7 @@ define i32 @v_sdot2_fnegv2f16_c(<2 x i16> %a, <2 x i16> %b, <2 x half> %c) {
; GFX906-LABEL: v_sdot2_fnegv2f16_c:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v2, 0x80008000, v2
-; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_sdot2_fnegv2f16_c:
@@ -291,8 +288,7 @@ define i32 @v_sdot2_fnegv2f16_c(<2 x i16> %a, <2 x i16> %b, <2 x half> %c) {
; GFX10-LABEL: v_sdot2_fnegv2f16_c:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v2, 0x80008000, v2
-; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.c = fneg <2 x half> %c
%cast.neg.c = bitcast <2 x half> %neg.c to i32
@@ -304,8 +300,7 @@ define i32 @v_sdot2_shuffle10_a(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX906-LABEL: v_sdot2_shuffle10_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_alignbit_b32 v0, v0, v0, 16
-; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 op_sel:[1,0,0] op_sel_hi:[0,1,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_sdot2_shuffle10_a:
@@ -319,8 +314,7 @@ define i32 @v_sdot2_shuffle10_a(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX10-LABEL: v_sdot2_shuffle10_a:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v0, v0, v0, 16
-; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 op_sel:[1,0,0] op_sel_hi:[0,1,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%shuf.a = shufflevector <2 x i16> %a, <2 x i16> undef, <2 x i32> <i32 1, i32 0>
%r = call i32 @llvm.amdgcn.sdot2(<2 x i16> %shuf.a, <2 x i16> %b, i32 %c, i1 false)
@@ -331,8 +325,7 @@ define i32 @v_sdot2_shuffle10_b(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX906-LABEL: v_sdot2_shuffle10_b:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_alignbit_b32 v1, v1, v1, 16
-; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 op_sel:[0,1,0] op_sel_hi:[1,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_sdot2_shuffle10_b:
@@ -346,8 +339,7 @@ define i32 @v_sdot2_shuffle10_b(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX10-LABEL: v_sdot2_shuffle10_b:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v1, v1, v1, 16
-; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_i32_i16 v0, v0, v1, v2 op_sel:[0,1,0] op_sel_hi:[1,0,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%shuf.b = shufflevector <2 x i16> %b, <2 x i16> undef, <2 x i32> <i32 1, i32 0>
%r = call i32 @llvm.amdgcn.sdot2(<2 x i16> %a, <2 x i16> %shuf.b, i32 %c, i1 false)
6 changes: 2 additions & 4 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll
@@ -91,8 +91,7 @@ define i32 @v_sdot4_fnegf32_a(float %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_sdot4_fnegf32_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX906-NEXT: v_dot4_i32_i8 v0, v0, v1, v2
+; GFX906-NEXT: v_dot4_i32_i8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_sdot4_fnegf32_a:
@@ -112,8 +111,7 @@ define i32 @v_sdot4_fnegv2f16_a(<2 x half> %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_sdot4_fnegv2f16_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX906-NEXT: v_dot4_i32_i8 v0, v0, v1, v2
+; GFX906-NEXT: v_dot4_i32_i8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_sdot4_fnegv2f16_a:
12 changes: 4 additions & 8 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll
@@ -47,15 +47,13 @@ define i32 @v_sdot8_fnegf32_a(float %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_sdot8_fnegf32_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX906-NEXT: v_dot8_i32_i4 v0, v0, v1, v2
+; GFX906-NEXT: v_dot8_i32_i4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_sdot8_fnegf32_a:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX10-NEXT: v_dot8_i32_i4 v0, v0, v1, v2
+; GFX10-NEXT: v_dot8_i32_i4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg float %a
%cast.neg.a = bitcast float %neg.a to i32
@@ -67,15 +65,13 @@ define i32 @v_sdot8_fnegv2f16_a(<2 x half> %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_sdot8_fnegv2f16_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX906-NEXT: v_dot8_i32_i4 v0, v0, v1, v2
+; GFX906-NEXT: v_dot8_i32_i4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_sdot8_fnegv2f16_a:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX10-NEXT: v_dot8_i32_i4 v0, v0, v1, v2
+; GFX10-NEXT: v_dot8_i32_i4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg <2 x half> %a
%cast.neg.a = bitcast <2 x half> %neg.a to i32
36 changes: 12 additions & 24 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll
@@ -235,22 +235,19 @@ define i32 @v_udot2_fnegf32_c(<2 x i16> %a, <2 x i16> %b, float %c) {
; GFX906-LABEL: v_udot2_fnegf32_c:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_udot2_fnegf32_c:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX908-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_udot2_fnegf32_c:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v2, 0x80000000, v2
-; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.c = fneg float %c
%cast.neg.c = bitcast float %neg.c to i32
@@ -262,22 +259,19 @@ define i32 @v_udot2_fnegv2f16_c(<2 x i16> %a, <2 x i16> %b, <2 x half> %c) {
; GFX906-LABEL: v_udot2_fnegv2f16_c:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v2, 0x80008000, v2
-; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_udot2_fnegv2f16_c:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: v_xor_b32_e32 v2, 0x80008000, v2
-; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX908-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_udot2_fnegv2f16_c:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_xor_b32_e32 v2, 0x80008000, v2
-; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 neg_lo:[0,0,1] neg_hi:[0,0,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%neg.c = fneg <2 x half> %c
%cast.neg.c = bitcast <2 x half> %neg.c to i32
@@ -289,22 +283,19 @@ define i32 @v_udot2_shuffle10_a(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX906-LABEL: v_udot2_shuffle10_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_alignbit_b32 v0, v0, v0, 16
-; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[1,0,0] op_sel_hi:[0,1,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_udot2_shuffle10_a:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: v_alignbit_b32 v0, v0, v0, 16
-; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[1,0,0] op_sel_hi:[0,1,1]
; GFX908-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_udot2_shuffle10_a:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v0, v0, v0, 16
-; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[1,0,0] op_sel_hi:[0,1,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%shuf.a = shufflevector <2 x i16> %a, <2 x i16> undef, <2 x i32> <i32 1, i32 0>
%r = call i32 @llvm.amdgcn.udot2(<2 x i16> %shuf.a, <2 x i16> %b, i32 %c, i1 false)
@@ -315,22 +306,19 @@ define i32 @v_udot2_shuffle10_b(<2 x i16> %a, <2 x i16> %b, i32 %c) {
; GFX906-LABEL: v_udot2_shuffle10_b:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_alignbit_b32 v1, v1, v1, 16
-; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX906-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[0,1,0] op_sel_hi:[1,0,1]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX908-LABEL: v_udot2_shuffle10_b:
; GFX908: ; %bb.0:
; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX908-NEXT: v_alignbit_b32 v1, v1, v1, 16
-; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX908-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[0,1,0] op_sel_hi:[1,0,1]
; GFX908-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: v_udot2_shuffle10_b:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v1, v1, v1, 16
-; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2
+; GFX10-NEXT: v_dot2_u32_u16 v0, v0, v1, v2 op_sel:[0,1,0] op_sel_hi:[1,0,1]
; GFX10-NEXT: s_setpc_b64 s[30:31]
%shuf.b = shufflevector <2 x i16> %b, <2 x i16> undef, <2 x i32> <i32 1, i32 0>
%r = call i32 @llvm.amdgcn.udot2(<2 x i16> %a, <2 x i16> %shuf.b, i32 %c, i1 false)
12 changes: 4 additions & 8 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll
@@ -112,15 +112,13 @@ define i32 @v_udot4_fnegf32_a(float %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_udot4_fnegf32_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX906-NEXT: v_dot4_u32_u8 v0, v0, v1, v2
+; GFX906-NEXT: v_dot4_u32_u8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10PLUS-LABEL: v_udot4_fnegf32_a:
; GFX10PLUS: ; %bb.0:
; GFX10PLUS-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10PLUS-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX10PLUS-NEXT: v_dot4_u32_u8 v0, v0, v1, v2
+; GFX10PLUS-NEXT: v_dot4_u32_u8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10PLUS-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg float %a
%cast.neg.a = bitcast float %neg.a to i32
@@ -132,15 +130,13 @@ define i32 @v_udot4_fnegv2f16_a(<2 x half> %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_udot4_fnegv2f16_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX906-NEXT: v_dot4_u32_u8 v0, v0, v1, v2
+; GFX906-NEXT: v_dot4_u32_u8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10PLUS-LABEL: v_udot4_fnegv2f16_a:
; GFX10PLUS: ; %bb.0:
; GFX10PLUS-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10PLUS-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX10PLUS-NEXT: v_dot4_u32_u8 v0, v0, v1, v2
+; GFX10PLUS-NEXT: v_dot4_u32_u8 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10PLUS-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg <2 x half> %a
%cast.neg.a = bitcast <2 x half> %neg.a to i32
12 changes: 4 additions & 8 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll
@@ -48,15 +48,13 @@ define i32 @v_udot8_fnegf32_a(float %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_udot8_fnegf32_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX906-NEXT: v_dot8_u32_u4 v0, v0, v1, v2
+; GFX906-NEXT: v_dot8_u32_u4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10PLUS-LABEL: v_udot8_fnegf32_a:
; GFX10PLUS: ; %bb.0:
; GFX10PLUS-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10PLUS-NEXT: v_xor_b32_e32 v0, 0x80000000, v0
-; GFX10PLUS-NEXT: v_dot8_u32_u4 v0, v0, v1, v2
+; GFX10PLUS-NEXT: v_dot8_u32_u4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10PLUS-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg float %a
%cast.neg.a = bitcast float %neg.a to i32
@@ -68,15 +66,13 @@ define i32 @v_udot8_fnegv2f16_a(<2 x half> %a, i32 %b, i32 %c) {
; GFX906-LABEL: v_udot8_fnegv2f16_a:
; GFX906: ; %bb.0:
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX906-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX906-NEXT: v_dot8_u32_u4 v0, v0, v1, v2
+; GFX906-NEXT: v_dot8_u32_u4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
; GFX10PLUS-LABEL: v_udot8_fnegv2f16_a:
; GFX10PLUS: ; %bb.0:
; GFX10PLUS-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10PLUS-NEXT: v_xor_b32_e32 v0, 0x80008000, v0
-; GFX10PLUS-NEXT: v_dot8_u32_u4 v0, v0, v1, v2
+; GFX10PLUS-NEXT: v_dot8_u32_u4 v0, v0, v1, v2 neg_lo:[1,0,0] neg_hi:[1,0,0]
; GFX10PLUS-NEXT: s_setpc_b64 s[30:31]
%neg.a = fneg <2 x half> %a
%cast.neg.a = bitcast <2 x half> %neg.a to i32
10 changes: 5 additions & 5 deletions llvm/test/CodeGen/AMDGPU/packed-fp32.ll
@@ -87,7 +87,7 @@ define amdgpu_kernel void @fadd_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fadd_v2_v_lit_splat:
; GFX900-COUNT-2: v_add_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
; PACKED-SDAG: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 1.0 op_sel_hi:[1,0]{{$}}
-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 1.0{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 1.0 op_sel_hi:[1,0]{{$}}
define amdgpu_kernel void @fadd_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -308,7 +308,7 @@ define amdgpu_kernel void @fmul_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fmul_v2_v_lit_splat:
; GFX900-COUNT-2: v_mul_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
; PACKED-SDAG: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0 op_sel_hi:[1,0]{{$}}
-; PACKED-GISEL: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0{{$}}
+; PACKED-GISEL: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0 op_sel_hi:[1,0]{{$}}
define amdgpu_kernel void @fmul_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -432,7 +432,7 @@ define amdgpu_kernel void @fma_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fma_v2_v_lit_splat:
; GFX900-COUNT-2: v_fma_f32 v{{[0-9]+}}, v{{[0-9]+}}, 4.0, 1.0
; PACKED-SDAG: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0, 1.0 op_sel_hi:[1,0,0]{{$}}
-; PACKED-GISEL: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0, 1.0{{$}}
+; PACKED-GISEL: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0, 1.0 op_sel_hi:[1,0,0]{{$}}
define amdgpu_kernel void @fma_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -556,8 +556,8 @@ bb:
; PACKED-SDAG: v_add_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, 0
; PACKED-SDAG: v_add_f32_e32 v{{[0-9]+}}, 0, v{{[0-9]+}}

-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], 0{{$}}
-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 0{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], 0 op_sel_hi:[1,0]{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 0 op_sel_hi:[1,0]{{$}}
define amdgpu_kernel void @fadd_fadd_fsub_0(<2 x float> %arg) {
bb:
%i12 = fadd <2 x float> zeroinitializer, %arg