[AVX-512] vpternlog can be used even when a subexpression is consumed #137520

Open
@Validark

This code (Zig Godbolt, LLVM Godbolt):

export fn foo(a: @Vector(8, u64), b: @Vector(8, u64), c: @Vector(8, u64)) @Vector(8, u64) {
    const x = a & b;
    const y = a & b & c;
    return x *% y;
}

Equivalent LLVM IR:

define dso_local <8 x i64> @foo(<8 x i64> %0, <8 x i64> %1, <8 x i64> %2) local_unnamed_addr {
Entry:
  %3 = and <8 x i64> %1, %0
  %4 = and <8 x i64> %3, %2
  %5 = mul <8 x i64> %4, %3
  ret <8 x i64> %5
}

Compiles to:

        vpandq  zmm0, zmm1, zmm0
        vpandq  zmm1, zmm0, zmm2
        vpmullq zmm0, zmm1, zmm0

Should be:

        vpandq     zmm3, zmm1, zmm0
        vpternlogq zmm2, zmm1, zmm0, 128
        vpmullq    zmm0, zmm2, zmm3

In the current assembly, the second vpandq depends on the result of the first one, so the two instructions must execute serially. vpternlogq, on the other hand, reads only the original inputs, so it can execute in parallel with the first vpandq.
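For reference, the immediate 128 in the proposed vpternlogq is the 8-entry truth table of a three-input AND. The sketch below is a scalar C model of a single 64-bit lane, not LLVM's lowering code; the helper names `ternlog64` and `and3_imm8` are made up for illustration, and the bit ordering of the table index (first operand as the most significant bit) is an assumption that does not affect the result for a symmetric function like AND.

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane of vpternlogq (illustrative helper, not a
   real intrinsic): at every bit position the three input bits form a 3-bit
   index, and the result bit is bit `index` of imm8. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8) {
    uint64_t result = 0;
    for (int bit = 0; bit < 64; ++bit) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2) |
                                  (((b >> bit) & 1) << 1) |
                                  ((c >> bit) & 1));
        result |= (uint64_t)((imm8 >> idx) & 1) << bit;
    }
    return result;
}

/* Derive the imm8 for a & b & c by evaluating it on all 8 input rows.
   Only the row where all three inputs are 1 (index 7) sets a bit. */
static uint8_t and3_imm8(void) {
    uint8_t imm = 0;
    for (unsigned idx = 0; idx < 8; ++idx) {
        unsigned a = (idx >> 2) & 1, b = (idx >> 1) & 1, c = idx & 1;
        imm |= (uint8_t)((a & b & c) << idx);
    }
    return imm;
}
```

With this model, `and3_imm8()` yields the same immediate as in the suggested assembly, and `ternlog64(a, b, c, 128)` matches `a & b & c` for any inputs, which is why a single vpternlogq can replace the dependent pair of ANDs.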
