Commit 4607799

arm: Use uxtb rN, rM, ror #8 to implement zero_extract on armv6.
Examining the code generated for the following C snippet on a
raspberry pi:

    int popcount_lut8(unsigned *buf, int n)
    {
      int cnt=0;
      unsigned int i;
      do {
        i = *buf;
        cnt += lut[i&255];
        cnt += lut[i>>8&255];
        cnt += lut[i>>16&255];
        cnt += lut[i>>24];
        buf++;
      } while(--n);
      return cnt;
    }

I was surprised to see the following instruction sequence generated by
the compiler:

    mov     r5, r2, lsr #8
    uxtb    r5, r5

This sequence can be performed by a single ARM instruction:

    uxtb    r5, r2, ror #8

The attached patch allows GCC's combine pass to take advantage of ARM's
uxtb with rotate functionality to implement the above zero_extract, and
likewise to use the sxtb with rotate to implement sign_extract. ARM's
uxtb and sxtb can only be used with rotates of 0, 8, 16 and 24, and of
these only the 8 and 16 are useful [ror #0 is a nop, and extends with
ror #24 can be implemented using regular shifts], so the approach here
is to add the six missing but useful instructions as 6 different
define_insn in arm.md, rather than try to be clever with new predicates.

Later ARM hardware has advanced bit field instructions, and earlier ARM
cores didn't support extend-with-rotate, so this appears to only benefit
armv6 era CPUs (e.g. the raspberry pi).

Patch posted:
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01339.html
Approved by Kyrill Tkachov:
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01881.html

2024-05-12  Roger Sayle  <roger@nextmovesoftware.com>
            Kyrill Tkachov  <kyrylo.tkachov@foss.arm.com>

    * config/arm/arm.md (*arm_zeroextractsi2_8_8,
    *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16,
    *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8,
    *arm_signextractsi2_16_8): New.

2024-05-12  Roger Sayle  <roger@nextmovesoftware.com>
            Kyrill Tkachov  <kyrylo.tkachov@foss.arm.com>

    * gcc.target/arm/extend-ror.c: New test.
1 parent 83fb5e6 commit 4607799
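
The equivalence the commit message relies on can be checked on any host with a small C model. This is only an illustrative sketch, not part of the patch: `ror32`, `sext`, the `model_*` functions, `zero_extract`, `sign_extract` and `check_word` are hypothetical names introduced here, the models approximate the instruction semantics in portable C, and bit positions are assumed to count from the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; models the "ror #n" operand (n = 0, 8, 16, 24). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Sign-extend the low `bits` bits of v (bits = 8 or 16) using only
   unsigned arithmetic, so no implementation-defined conversion is needed. */
static uint32_t sext(uint32_t v, unsigned bits)
{
    uint32_t sign = UINT32_C(1) << (bits - 1u);
    uint32_t mask = (sign << 1) - 1u;
    return ((v & mask) ^ sign) - sign;
}

/* Hypothetical host-side models of the extend-with-rotate instructions. */
static uint32_t model_uxtb(uint32_t x, unsigned rot) { return ror32(x, rot) & 0xffu; }
static uint32_t model_uxth(uint32_t x, unsigned rot) { return ror32(x, rot) & 0xffffu; }
static uint32_t model_sxtb(uint32_t x, unsigned rot) { return sext(ror32(x, rot), 8); }
static uint32_t model_sxth(uint32_t x, unsigned rot) { return sext(ror32(x, rot), 16); }

/* Reference semantics of extracting `width` bits starting at bit `pos`,
   assuming pos counts from the least significant bit. */
static uint32_t zero_extract(uint32_t x, unsigned width, unsigned pos)
{
    return (x >> pos) & ((UINT32_C(1) << width) - 1u);
}

static uint32_t sign_extract(uint32_t x, unsigned width, unsigned pos)
{
    return sext(x >> pos, width);
}

/* Assert, for one input word, that each of the six new patterns matches
   its extraction, and that the omitted rotations reduce to plain shifts. */
static void check_word(uint32_t x)
{
    assert(model_uxtb(x, 8)  == zero_extract(x, 8, 8));
    assert(model_uxtb(x, 16) == zero_extract(x, 8, 16));
    assert(model_uxth(x, 8)  == zero_extract(x, 16, 8));
    assert(model_sxtb(x, 8)  == sign_extract(x, 8, 8));
    assert(model_sxtb(x, 16) == sign_extract(x, 8, 16));
    assert(model_sxth(x, 8)  == sign_extract(x, 16, 8));

    /* ror #24 for bytes and ror #16 for halfwords select the top bits,
       which a single logical shift already provides. */
    assert(model_uxtb(x, 24) == x >> 24);
    assert(model_uxth(x, 16) == x >> 16);
}
```

Calling `check_word` on a handful of words such as `0x12345678u` and `0xfedcba98u` exercises both the zero and the sign paths. The last two assertions illustrate why the commit adds no pattern for those rotations.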

2 files changed: +104 -0 lines changed

gcc/config/arm/arm.md

Lines changed: 66 additions & 0 deletions
@@ -12647,6 +12647,72 @@
   ""
 )
 
+;; Implement zero_extract using uxtb/uxth instruction with
+;; the ror #N qualifier when applicable.
+
+(define_insn "*arm_zeroextractsi2_8_8"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (zero_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 8) (const_int 8)))]
+  "TARGET_ARM && arm_arch6"
+  "uxtb%?\\t%0, %1, ror #8"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
+(define_insn "*arm_zeroextractsi2_8_16"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (zero_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 8) (const_int 16)))]
+  "TARGET_ARM && arm_arch6"
+  "uxtb%?\\t%0, %1, ror #16"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
+(define_insn "*arm_zeroextractsi2_16_8"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (zero_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 16) (const_int 8)))]
+  "TARGET_ARM && arm_arch6"
+  "uxth%?\\t%0, %1, ror #8"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
+;; Implement sign_extract using sxtb/sxth instruction with
+;; the ror #N qualifier when applicable.
+
+(define_insn "*arm_signextractsi2_8_8"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 8) (const_int 8)))]
+  "TARGET_ARM && arm_arch6"
+  "sxtb%?\\t%0, %1, ror #8"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
+(define_insn "*arm_signextractsi2_8_16"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 8) (const_int 16)))]
+  "TARGET_ARM && arm_arch6"
+  "sxtb%?\\t%0, %1, ror #16"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
+(define_insn "*arm_signextractsi2_16_8"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
+                         (const_int 16) (const_int 8)))]
+  "TARGET_ARM && arm_arch6"
+  "sxth%?\\t%0, %1, ror #8"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "extend")]
+)
+
 ;; Patterns for LDRD/STRD in Thumb2 mode
 
 (define_insn "*thumb2_ldrd"

gcc/testsuite/gcc.target/arm/extend-ror.c

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-add-options arm_arch_v6 } */
+/* { dg-additional-options "-O -marm" } */
+
+unsigned int zeroextractsi2_8_8(unsigned int x)
+{
+  return (unsigned char)(x>>8);
+}
+
+unsigned int zeroextractsi2_8_16(unsigned int x)
+{
+  return (unsigned char)(x>>16);
+}
+
+unsigned int signextractsi2_8_8(unsigned int x)
+{
+  return (int)(signed char)(x>>8);
+}
+
+unsigned int signextractsi2_8_16(unsigned int x)
+{
+  return (int)(signed char)(x>>16);
+}
+
+unsigned int zeroextractsi2_16_8(unsigned int x)
+{
+  return (unsigned short)(x>>8);
+}
+
+unsigned int signextractsi2_16_8(unsigned int x)
+{
+  return (int)(short)(x>>8);
+}
+
+/* { dg-final { scan-assembler-times ", ror #8" 4 } } */
+/* { dg-final { scan-assembler-times ", ror #16" 2 } } */
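
The test above is compile-only: it counts `ror` operands in the generated assembly and does not check the values computed. A run-time companion could look roughly like the sketch below. It is not part of the commit; `check_extend_ror_values` is a hypothetical helper, the six functions are copied from extend-ror.c and made `static`, and the expected values assume an 8-bit `char`, a 16-bit `short`, a 32-bit `unsigned int`, and GCC's modulo behaviour when converting out-of-range values to a signed type.

```c
#include <assert.h>

/* Bodies copied from extend-ror.c; `static` added for this sketch. */
static unsigned int zeroextractsi2_8_8(unsigned int x)
{
  return (unsigned char)(x>>8);
}

static unsigned int zeroextractsi2_8_16(unsigned int x)
{
  return (unsigned char)(x>>16);
}

static unsigned int signextractsi2_8_8(unsigned int x)
{
  return (int)(signed char)(x>>8);
}

static unsigned int signextractsi2_8_16(unsigned int x)
{
  return (int)(signed char)(x>>16);
}

static unsigned int zeroextractsi2_16_8(unsigned int x)
{
  return (unsigned short)(x>>8);
}

static unsigned int signextractsi2_16_8(unsigned int x)
{
  return (int)(short)(x>>8);
}

/* Hypothetical helper: checks a few hand-computed extractions. */
static void check_extend_ror_values(void)
{
  /* Zero extension: the selected field is returned unchanged. */
  assert(zeroextractsi2_8_8(0x12345678u) == 0x56u);
  assert(zeroextractsi2_8_16(0x12345678u) == 0x34u);
  assert(zeroextractsi2_16_8(0x12345678u) == 0x3456u);

  /* Sign extension: a set top bit of the field fills the upper bits. */
  assert(signextractsi2_8_8(0x0000ff00u) == 0xffffffffu);
  assert(signextractsi2_8_16(0x00800000u) == 0xffffff80u);
  assert(signextractsi2_16_8(0x00800000u) == 0xffff8000u);

  /* Sign extension with a clear top bit behaves like zero extension. */
  assert(signextractsi2_8_8(0x00007f00u) == 0x7fu);
}
```

In the GCC testsuite such a check would normally live in a separate `dg-do run` test; here it only documents the values the six functions are expected to return.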
