
Suboptimal codegen for llvm.vector.reduce of <N x i1> #50466

Open
@calebzulawski

Description

Bugzilla Link 51122
Version 12.0
OS All
CC @Arnaud-de-Grandmaison-ARM,@DMG862,@RKSimon,@smithp35

Extended Description

The binary reduction intrinsics on AArch64 (and ARM) produce suboptimal code for vectors of i1. This issue is similar to #38188.

declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %a)

define i1 @mask_reduce_or(<8 x i8> %mask) {
    %mask1 = trunc <8 x i8> %mask to <8 x i1>
    %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)
    ret i1 %reduced
}

produces

mask_reduce_or:                         // @mask_reduce_or
        umov    w14, v0.b[1]
        umov    w15, v0.b[0]
        umov    w13, v0.b[2]
        orr     w14, w15, w14
        umov    w12, v0.b[3]
        orr     w13, w14, w13
        umov    w11, v0.b[4]
        orr     w12, w13, w12
        umov    w10, v0.b[5]
        orr     w11, w12, w11
        umov    w9, v0.b[6]
        orr     w10, w11, w10
        umov    w8, v0.b[7]
        orr     w9, w10, w9
        orr     w8, w9, w8
        and     w0, w8, #0x1
        ret

when it could instead use vmaxvq (or vpmax on ARM).
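
For reference, a hand-written sketch of the kind of code this is asking for (not compiler output; vmaxvq corresponds to the umaxv instruction, and the sketch assumes the i1 value arrives in bit 0 of each byte, as in the trunc above):

mask_reduce_or_sketch:                  // hypothetical lowering, not produced by LLVM
        movi    v1.8b, #1               // keep only bit 0 of each lane (the i1 payload)
        and     v0.8b, v0.8b, v1.8b
        umaxv   b0, v0.8b               // horizontal max: 1 if any lane is set, 0 otherwise
        fmov    w0, s0
        ret

If the mask is already a normalized comparison result (lanes all-zeros or all-ones), the movi/and pair can be dropped and the whole reduction is umaxv plus a move and a final and with #0x1.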

The same goes for vector.reduce.and with vminvq (or vpmin on ARM).
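
The analogous sketch for the and-reduction, under the same assumption that the i1 value lives in bit 0 of each byte (vminvq corresponds to uminv):

mask_reduce_and_sketch:                 // hypothetical lowering, not produced by LLVM
        movi    v1.8b, #1               // keep only bit 0 of each lane
        and     v0.8b, v0.8b, v1.8b
        uminv   b0, v0.8b               // horizontal min: 0 if any lane is clear, 1 otherwise
        fmov    w0, s0
        ret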
