A urem by a constant, much like a udiv by a constant, can be expanded into a
series of mul/add/shift instructions. The exact sequence of instructions
depends on the constants and the types.
If the constant is a power of 2 then a shift (udiv) or an and (urem) will be
used, so the cost will be 1. This canonicalization happens relatively early, so
it likely has very little effect in practice (it does help the cost of funnel
shifts).
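
As a rough illustration of the power-of-2 case, the sketch below shows the single-instruction forms in C++. The helper names `udiv_pow2` and `urem_pow2` are made up for this example and are not part of the patch; `k` is assumed to be smaller than 32.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only. Both assume k < 32.

// udiv by 2^k: a single logical shift right.
constexpr uint32_t udiv_pow2(uint32_t x, unsigned k) { return x >> k; }

// urem by 2^k: a single and with the mask (2^k - 1).
constexpr uint32_t urem_pow2(uint32_t x, unsigned k) {
  return x & ((uint32_t{1} << k) - 1);
}
```

Each helper maps to one instruction, which is where the cost of 1 comes from.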
For a non-power-of-2 constant the udiv will expand to a series of UMULH + Add +
Shift + Add, depending on the constant. A urem is generally udiv + mul + sub, so
involves a few extra instructions. UMULH is not always available: i32 will
use umull+shift, and vector types will use umull+shift or umull+umull2+uzp
depending on the vector size. v2i64 will be scalarized because there is no
vector mul available for it. SVE does have a UMULH instruction.
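
A minimal sketch of the non-power-of-2 expansion for i32, written as plain C++ rather than the code the backend actually emits. The function names are hypothetical, and the magic constants are the standard ones for dividing a 32-bit value by 3 and by 7. Dividing by 3 needs only a widening multiply and a shift, while dividing by 7 needs the extra sub/shift/add fix-up because its magic constant does not fit in 32 bits.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// High 32 bits of a 32x32 -> 64-bit multiply (a umull followed by a shift).
constexpr uint32_t mulhi_u32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// x / 3: the constant ceil(2^33 / 3) fits in 32 bits, so one widening
// multiply and one shift are enough.
constexpr uint32_t udiv3(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABu) >> 33);
}

// x / 7: the constant ceil(2^35 / 7) needs 33 bits, so only its low 32 bits
// are multiplied and the missing top bit is restored with sub/shift/add.
constexpr uint32_t udiv7(uint32_t x) {
  uint32_t q = mulhi_u32(x, 0x24924925u);
  uint32_t t = ((x - q) >> 1) + q;
  return t >> 2;
}

// urem is the udiv followed by a mul and a sub.
constexpr uint32_t urem3(uint32_t x) { return x - udiv3(x) * 3u; }
constexpr uint32_t urem7(uint32_t x) { return x - udiv7(x) * 7u; }
```

Counting the operations in `urem7` against those in `udiv3` gives a feel for why the cost depends on both the constant and the operation.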
The end result is that the costs should be closer to reality, with scalable
types costing a little less than the fixed-width versions. (In the future we
might be able to use umulh for fixed-width types when the SVE instruction is
available, but for the moment this should favour scalable vectorization a
little.)
I've tried to make this patch only apply to constant UREM/UDIV instructions.
SDIV and SREM are left until a later patch to prevent this becoming too
complex. The funnel shift costs change because the cost model believes a urem
is needed to clamp the shift amount; the divisor is the bit width, which should
be a power of 2 for most common types.
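
To show where the urem comes from, here is a rough scalar model of a funnel shift left in C++. It is a sketch of the intrinsic's semantics, not of the cost model code, and the names `fshl32` and `fshl30` are made up for this example. For a 32-bit width the clamp is a urem by a power of 2 (a single and); for the 30-bit element type used in the tests below it is a urem by the non-power-of-2 constant 30.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// fshl on i32: concatenate a:b, shift left by c mod 32, keep the top 32 bits.
constexpr uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
  uint32_t s = c % 32u;  // power-of-2 width: equivalent to c & 31
  if (s == 0) return a;  // avoid shifting b by the full width
  return (a << s) | (b >> (32u - s));
}

// fshl on i30, modelled in the low 30 bits of a uint32_t.
constexpr uint32_t fshl30(uint32_t a, uint32_t b, uint32_t c) {
  constexpr uint32_t mask = (uint32_t{1} << 30) - 1;
  a &= mask;
  b &= mask;
  uint32_t s = c % 30u;  // non-power-of-2 width: a real urem by a constant
  if (s == 0) return a;
  return ((a << s) | (b >> (30u - s))) & mask;
}
```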
--- a/llvm/test/Analysis/CostModel/AArch64/fshl.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/fshl.ll
@@ -224,7 +224,7 @@ declare <2 x i64> @llvm.fshl.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)
 
 define <4 x i30> @fshl_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
 ; CHECK-LABEL: 'fshl_v4i30_3rd_arg_var'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshl
--- a/llvm/test/Analysis/CostModel/AArch64/fshr.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/fshr.ll
@@ -224,7 +224,7 @@ declare <2 x i64> @llvm.fshr.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)
 
 define <4 x i30> @fshr_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
 ; CHECK-LABEL: 'fshr_v4i30_3rd_arg_var'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshr