This patch adds support for a new fusion in znver5 documented in the
optimization manual:
The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
with certain ALU instructions. The following conditions need to be met for
fusion to happen:
- The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
- The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
- The ALU instruction may source only registers or immediate data. There cannot be any memory source.
- The ALU instruction sources either the source or dest of MOV instruction.
- If ALU instruction has 2 reg sources, they should be different.
- The following ALU instructions can fuse with an older qualified MOV instruction:
ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
(I assume OP is OR)
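To make the conditions above concrete, here is a minimal sketch of them as a predicate. It is not the code in ix86_fuse_mov_alu_p; the Insn struct, the Op enum and can_fuse_mov_alu are hypothetical names over a simplified instruction model, not GCC's RTL. It assumes that reg_srcs lists every register the instruction reads (so a two-operand x86 ALU instruction lists its destination too), and it does not model the 0x89/0x8B encoding check beyond requiring a reg-reg MOV.

```cpp
#include <vector>

// Hypothetical, simplified instruction model (an assumption for illustration).
enum class Op { Mov, Add, Adc, And, Xor, Or, Sub, Sbb, Inc, Dec, Not,
                Shl, Shr, Sar, Lea, Other };

struct Insn {
    Op op;
    int dest;                   // destination register number, -1 if not a register
    std::vector<int> reg_srcs;  // every register read by the instruction
    bool has_mem_operand;       // true if any operand is a memory reference
};

// ALU instructions listed in the manual (reading "OP" as OR; SAL is the same as SHL).
bool is_fusible_alu(Op op) {
    switch (op) {
    case Op::Add: case Op::Adc: case Op::And: case Op::Xor: case Op::Or:
    case Op::Sub: case Op::Sbb: case Op::Inc: case Op::Dec: case Op::Not:
    case Op::Shl: case Op::Shr: case Op::Sar:
        return true;
    default:
        return false;
    }
}

bool can_fuse_mov_alu(const Insn &mov, const Insn &alu) {
    // The MOV must be a plain reg-reg move.
    if (mov.op != Op::Mov || mov.has_mem_operand || mov.dest < 0
        || mov.reg_srcs.size() != 1)
        return false;
    // The ALU instruction must be on the list and must not touch memory.
    if (!is_fusible_alu(alu.op) || alu.has_mem_operand)
        return false;
    // MOV and ALU destination registers must match.
    if (alu.dest != mov.dest)
        return false;
    // The ALU instruction must read the source or the destination of the MOV.
    bool reads_mov_reg = false;
    for (int r : alu.reg_srcs)
        if (r == mov.dest || r == mov.reg_srcs[0])
            reads_mov_reg = true;
    if (!reads_mov_reg)
        return false;
    // With two register sources, they must be different registers.
    if (alu.reg_srcs.size() == 2 && alu.reg_srcs[0] == alu.reg_srcs[1])
        return false;
    return true;
}
```

Under this model, a pair such as movq %rbx, %rbp followed by shrq $6, %rbp qualifies, while an ALU instruction with a memory source or with the same register as both sources does not.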
I also increased the issue rate from 4 to 6. Theoretically znver5 can do more, but
with our model we can't really use it.
Increasing the issue rate to 8 leads to an infinite loop in the scheduler.
Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).
The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
.cfi_offset 3, -32
leaq 63(%rsi), %rbx
movq %rbx, %rbp
+ shrq $6, %rbp
+ salq $3, %rbp
subq $16, %rsp
.cfi_def_cfa_offset 48
movq %rdi, %r12
- shrq $6, %rbp
- movq %rsi, 8(%rsp)
- salq $3, %rbp
movq %rbp, %rdi
+ movq %rsi, 8(%rsp)
call _Znwm
movq 8(%rsp), %rsi
movl $0, 8(%r12)
@@ -2224,8 +2224,8 @@
movq %rax, (%r12)
movq %rbp, 32(%r12)
testq %rsi, %rsi
- movq %rsi, %rdx
cmovns %rsi, %rbx
+ movq %rsi, %rdx
sarq $63, %rdx
shrq $58, %rdx
sarq $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure an effect above the noise on SPEC.
gcc/ChangeLog:
* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune.