Saturating truncation produces extra instructions #68466

Closed
@calebzulawski

See: https://llvm.godbolt.org/z/4KdejfEsG

The following two functions:

declare <4 x i16> @llvm.smax.v4i16(<4 x i16>, <4 x i16>)
declare <4 x i16> @llvm.smin.v4i16(<4 x i16>, <4 x i16>)
declare <8 x i16> @llvm.smax.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.smin.v8i16(<8 x i16>, <8 x i16>)

define <4 x i8> @saturate4(<4 x i16> %x) {
  %1 = tail call <4 x i16> @llvm.smax.v4i16(<4 x i16> %x, <4 x i16> zeroinitializer)
  %2 = tail call <4 x i16> @llvm.smin.v4i16(<4 x i16> %1, <4 x i16> <i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <4 x i16> %2 to <4 x i8>
  ret <4 x i8> %3
}

define <8 x i8> @saturate8(<8 x i16> %x) {
  %1 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %x, <8 x i16> zeroinitializer)
  %2 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %1, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>)
  %3 = trunc <8 x i16> %2 to <8 x i8>
  ret <8 x i8> %3
}

produce the following x86 assembly:

.LCPI0_0:
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .short  255                             # 0xff
        .zero   2
        .zero   2
        .zero   2
        .zero   2
saturate4:                              # @saturate4
        pxor    xmm1, xmm1
        pmaxsw  xmm0, xmm1
        pminsw  xmm0, xmmword ptr [rip + .LCPI0_0]
        packuswb        xmm0, xmm0
        ret
saturate8:                              # @saturate8
        packuswb        xmm0, xmm0
        ret

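The saturate4 function emits an extra pmaxsw/pminsw pair, even though packuswb already saturates each signed 16-bit lane to the unsigned 8-bit range, as the saturate8 output shows. I believe the trunc followed by shufflevector is being optimized before the saturating truncation can be detected.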
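To illustrate why the min/max pair is redundant, here is a minimal scalar sketch in Rust. It models a single lane only; the function names are hypothetical and are not taken from LLVM or portable-simd. It assumes the usual packuswb lane semantics (signed 16-bit input, unsigned-saturated 8-bit output) and checks every possible i16 value.

```rust
/// Scalar model of the IR pattern for one lane:
/// smax(x, 0), then smin(_, 255), then trunc to 8 bits.
fn clamp_then_trunc(x: i16) -> u8 {
    let clamped = x.max(0).min(255);
    clamped as u8 // truncation keeps the low 8 bits
}

/// Scalar model of one `packuswb` lane (assumed semantics):
/// a signed 16-bit value is saturated to the unsigned 8-bit range.
fn packus_lane(x: i16) -> u8 {
    if x < 0 {
        0
    } else if x > 255 {
        255
    } else {
        x as u8
    }
}

/// Hypothetical check: the clamp-and-truncate pattern matches the
/// `packuswb` model for every i16 lane value.
fn models_agree_for_all_i16() -> bool {
    (i16::MIN..=i16::MAX).all(|x| clamp_then_trunc(x) == packus_lane(x))
}

/// Hypothetical check: clamping before the pack does not change the
/// result, which is why the pmaxsw/pminsw pair is unnecessary.
fn clamp_is_redundant_before_packus() -> bool {
    (i16::MIN..=i16::MAX).all(|x| packus_lane(x.max(0).min(255)) == packus_lane(x))
}
```

Both checks hold under this model, which matches the saturate8 output: a single packuswb is enough for the whole pattern.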

Discovered in rust-lang/portable-simd#369 (comment)
