Commit f23dc72
AArch64: Update div-bitmask to implement new optab instead of target hook [PR108583]
This replaces the custom division hook with an implementation through
add_highpart. For NEON we implement add highpart (an addition followed by
extraction of the upper half of the register, in the same precision) as ADD + LSR.
This representation lets us optimize the sequence easily using existing
sequences. It gets us a pretty decent sequence using SRA:
umull v1.8h, v0.8b, v3.8b
umull2 v0.8h, v0.16b, v3.16b
add v5.8h, v1.8h, v2.8h
add v4.8h, v0.8h, v2.8h
usra v1.8h, v5.8h, 8
usra v0.8h, v4.8h, 8
uzp2 v1.16b, v1.16b, v0.16b
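As a rough illustration of what the sequence above computes per 16-bit lane, here is a small scalar model. It assumes the divisor is 0xff and that the constant held in v2 is 257; neither value is spelled out in this message, and the function names are made up for the sketch.

```python
MASK16 = 0xFFFF  # each .8h lane is 16 bits wide


def div255_add_shift(x):
    """Model one 16-bit lane of add + usra + uzp2 (assumed constant: 257)."""
    t = (x + 257) & MASK16         # add  v5.8h, v1.8h, v2.8h
    acc = (x + (t >> 8)) & MASK16  # usra v1.8h, v5.8h, 8
    return acc >> 8                # uzp2 keeps the high byte of each lane


def mismatches():
    """Count products of two bytes where the model differs from x // 255."""
    return sum(
        1
        for a in range(256)
        for b in range(256)
        if div255_add_shift(a * b) != (a * b) // 255
    )
```

The check only covers products of two bytes, which is the range umull can produce here; in that range neither addition wraps around the 16-bit lane.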
To get the optimal sequence, however, we match (a + ((b + c) >> n)), where n
is half the precision of the mode of the operation, into addhn + uaddw, which is
a generally good optimization on its own and gets us back to:
.L4:
ldr q0, [x3]
umull v1.8h, v0.8b, v5.8b
umull2 v0.8h, v0.16b, v5.16b
addhn v3.8b, v1.8h, v4.8h
addhn v2.8b, v0.8h, v4.8h
uaddw v1.8h, v1.8h, v3.8b
uaddw v0.8h, v0.8h, v2.8b
uzp2 v1.16b, v1.16b, v0.16b
str q1, [x3], 16
cmp x3, x4
bne .L4
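The addhn + uaddw pairing in the loop above can be sketched lane by lane as follows. This is a simplified model, not the backend pattern itself: the helper names are hypothetical, and the constant 257 and divisor 0xff are assumptions carried over from the bitmask-division use case.

```python
MASK16 = 0xFFFF  # width of one .8h lane


def addhn(a, b):
    """ADDHN on one lane: 16-bit add, keep only the high 8 bits."""
    return ((a + b) & MASK16) >> 8


def uaddw(a, b8):
    """UADDW on one lane: zero-extend the byte and add it to the 16-bit lane."""
    return (a + b8) & MASK16


def add_shift_reference(a, b, c, n=8):
    """The matched expression a + ((b + c) >> n), in 16-bit arithmetic."""
    return (a + (((b + c) & MASK16) >> n)) & MASK16


def div255_addhn(x):
    """addhn + uaddw, then uzp2 taking the high byte (assumed constant: 257)."""
    return uaddw(x, addhn(x, 257)) >> 8
```

For example, `div255_addhn(1000)` evaluates to 3, matching `1000 // 255`.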
For SVE2 we optimize the initial sequence to the same ADD + LSR, which gets us:
.L3:
ld1b z0.h, p0/z, [x0, x3]
mul z0.h, p1/m, z0.h, z2.h
add z1.h, z0.h, z3.h
usra z0.h, z1.h, #8
lsr z0.h, z0.h, #8
st1b z0.h, p0, [x0, x3]
inch x3
whilelo p0.h, w3, w2
b.any .L3
.L1:
ret
and to get the optimal sequence I match (a + b) >> n (same constraint on n)
to addhnb, which gets us to:
.L3:
ld1b z0.h, p0/z, [x0, x3]
mul z0.h, p1/m, z0.h, z2.h
addhnb z1.b, z0.h, z3.h
addhnb z0.b, z0.h, z1.h
st1b z0.h, p0, [x0, x3]
inch x3
whilelo p0.h, w3, w2
b.any .L3
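A lane-level sketch of the two addhnb instructions in the loop above, assuming (as in the NEON case) a divisor of 0xff and a constant of 257 in z3; these values and the function names are assumptions for illustration. ADDHNB writes the high half of each sum to the even-numbered byte and zeroes the odd-numbered byte, so viewed as a 16-bit lane the result is simply (a + b) >> 8.

```python
MASK16 = 0xFFFF  # width of one .h lane


def addhnb(a, b):
    """ADDHNB on one .h lane: high byte of the sum lands in the bottom
    byte of the lane and the top byte is zeroed, i.e. (a + b) >> 8."""
    return ((a + b) & MASK16) >> 8


def div255_addhnb(x):
    """Two chained addhnb instructions (assumed constant: 257)."""
    t = addhnb(x, 257)   # addhnb z1.b, z0.h, z3.h
    return addhnb(x, t)  # addhnb z0.b, z0.h, z1.h


def mismatches():
    """Count products of two bytes where the model differs from x // 255."""
    return sum(
        1
        for a in range(256)
        for b in range(256)
        if div255_addhnb(a * b) != (a * b) // 255
    )
```

Because the first result is already a zero-extended 16-bit lane, it can feed the second addhnb directly, which is why no separate widening step appears in the loop.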
There are multiple possible RTL representations for these optimizations. I did
not represent them using a zero_extend because we seem very inconsistent about
this in the backend. Since they are unspecs we won't match them from vector ops
anyway. I figured maintainers would prefer this, but my maintainer ouija board
is still out for repairs :)
There are no new tests, as new correctness tests were added to the mid-end and
codegen tests for this already exist.
gcc/ChangeLog:
PR target/108583
* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3): Remove.
(*bitmask_shift_plus<mode>): New.
* config/aarch64/aarch64-sve2.md (*bitmask_shift_plus<mode>): New.
(@aarch64_bitmask_udiv<mode>3): Remove.
* config/aarch64/aarch64.cc
(aarch64_vectorize_can_special_div_by_constant,
TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
(TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
1 parent 81fd62d commit f23dc72
3 files changed, +52 -137 lines changed, in gcc/config/aarch64