
Inefficient codegen for floating-point operations #1342

Closed
@tannergooding


Older x86 hardware only supports the legacy SSE encoding, which encodes two operands: dst/op1 and op2. These instructions are read-modify-write (RMW), since dst and op1 are encoded in the same operand and op1 is overwritten by the result. To handle this, you generally need to insert an additional move instruction if the register allocator did not assign dst and op1 to the same register.
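A minimal sketch of the legacy RMW rule, using a textual-assembly model. The function name `lowerLegacyAddss` is hypothetical, and MOVAPS is assumed as the register copy for illustration only (the JIT may choose a different move instruction).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model (not RyuJIT code): lower "dst = op1 + op2" for a scalar
// float using the legacy two-operand SSE form, where ADDSS overwrites its
// first operand (read-modify-write).
std::vector<std::string> lowerLegacyAddss(int dst, int op1, int op2) {
    // Assumption: dst aliases op2 only if it also aliases op1; otherwise the
    // copy below would clobber op2 before it is read.
    assert(dst != op2 || dst == op1);

    auto xmm = [](int r) { return "xmm" + std::to_string(r); };
    std::vector<std::string> code;
    if (dst != op1) {
        // dst and op1 live in different registers: copy op1 into dst first.
        code.push_back("movaps " + xmm(dst) + ", " + xmm(op1));
    }
    // dst = dst + op2
    code.push_back("addss " + xmm(dst) + ", " + xmm(op2));
    return code;
}
```

For example, `lowerLegacyAddss(2, 0, 1)` yields `movaps xmm2, xmm0` followed by `addss xmm2, xmm1`, while `lowerLegacyAddss(0, 0, 1)` yields just `addss xmm0, xmm1`.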

Newer x86 hardware (anything with AVX support) also supports the VEX encoding, which takes three operands: dst, op1, and op2. This encoding is not RMW and does not require an additional move instruction. For the scalar floating-point instructions it is also no larger: it takes the same number of bytes to encode as the legacy form for the same allocated registers, and fewer bytes overall when dst != op1, since no additional move instruction has to be encoded.
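To make the size comparison concrete, the sketch below builds the raw bytes for the register-to-register forms of MOVAPS, ADDSS, and VADDSS (opcodes as given in the Intel SDM). It assumes only xmm0-xmm7 are used, so no REX prefix or 3-byte VEX prefix is needed, and again assumes MOVAPS as the copy; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// ModRM byte for a register-to-register operand pair: mod = 11, reg, rm.
// Assumption of this sketch: only xmm0-xmm7, so no REX or VEX.R/B bits.
std::uint8_t modrm(int reg, int rm) {
    assert(reg >= 0 && reg < 8 && rm >= 0 && rm < 8);
    return static_cast<std::uint8_t>(0xC0 | (reg << 3) | rm);
}

// Legacy SSE MOVAPS xmm(dst), xmm(src): 0F 28 /r.
Bytes movaps(int dst, int src) { return {0x0F, 0x28, modrm(dst, src)}; }

// Legacy SSE ADDSS xmm(dst), xmm(src): F3 0F 58 /r. dst is read and written.
Bytes addss(int dst, int src) { return {0xF3, 0x0F, 0x58, modrm(dst, src)}; }

// VEX VADDSS xmm(dst), xmm(src1), xmm(src2) with the 2-byte C5 prefix.
// Second prefix byte: R (inverted, 1 = xmm0-7) | vvvv (src1, inverted) | L = 0 | pp = 10 (F3).
Bytes vaddss(int dst, int src1, int src2) {
    assert(src1 >= 0 && src1 < 8);
    auto vex = static_cast<std::uint8_t>(0x80 | ((~src1 & 0xF) << 3) | 0x02);
    return {0xC5, vex, 0x58, modrm(dst, src2)};
}

// Bytes needed to compute dst = op1 + op2 with the legacy encoding.
std::size_t legacyBytes(int dst, int op1, int op2) {
    std::size_t n = addss(dst, op2).size();
    if (dst != op1) n += movaps(dst, op1).size();  // extra copy for the RMW form
    return n;
}

// Bytes needed to compute dst = op1 + op2 with the VEX encoding.
std::size_t vexBytes(int dst, int op1, int op2) {
    return vaddss(dst, op1, op2).size();  // no copy: non-destructive form
}
```

With dst == op1 both forms are 4 bytes (`F3 0F 58 C1` vs. `C5 FA 58 C1` for xmm0 = xmm0 + xmm1). With dst != op1 the legacy sequence grows to 7 bytes (a 3-byte MOVAPS plus a 4-byte ADDSS) while the VEX form stays at 4. This parity is specific to the scalar `ss`/`sd` forms, whose mandatory F3/F2 prefix is folded into the VEX prefix.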

We are already emitting the VEX encoding by default for floating-point instructions; however, codegen for non-intrinsic codepaths is not VEX-aware and still treats floating-point operations as RMW, as if the encoding only supported dst/op1 and op2.

It would be beneficial to update the codegen and register allocator to be VEX-aware and to call the appropriate emit_SIMD codepath (which handles the VEX vs. legacy encoding differences) where possible.
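The VEX-aware behavior proposed above can be sketched as a single entry point that picks the encoding, so that callers and the register allocator no longer have to assume RMW semantics. `emitFloatBinary` and `requiresDstSameAsOp1` are hypothetical names, not the JIT's actual emit_SIMD API. The sketch also assumes the VEX mnemonic is the legacy one prefixed with `v`, which holds for scalar arithmetic such as `addss` or `mulsd`, and uses MOVAPS as the copy for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single entry point that hides the VEX/legacy difference.
// `mnemonic` is the legacy SSE name (e.g. "addss"); the VEX form prefixes "v".
std::vector<std::string> emitFloatBinary(bool useVex, const std::string& mnemonic,
                                         int dst, int op1, int op2) {
    auto xmm = [](int r) { return "xmm" + std::to_string(r); };
    if (useVex) {
        // Non-destructive three-operand form: no copy, whatever dst is.
        return {"v" + mnemonic + " " + xmm(dst) + ", " + xmm(op1) + ", " + xmm(op2)};
    }
    // Legacy RMW form. Assumption: dst aliases op2 only if it also aliases op1.
    assert(dst != op2 || dst == op1);
    std::vector<std::string> code;
    if (dst != op1) code.push_back("movaps " + xmm(dst) + ", " + xmm(op1));
    code.push_back(mnemonic + " " + xmm(dst) + ", " + xmm(op2));
    return code;
}

// Hypothetical allocator query: only the legacy encoding needs dst tied to op1.
bool requiresDstSameAsOp1(bool useVex) { return !useVex; }
```

With `useVex == true`, `emitFloatBinary(true, "addss", 2, 0, 1)` produces the single instruction `vaddss xmm2, xmm0, xmm1`; the same call with `useVex == false` falls back to `movaps xmm2, xmm0` plus `addss xmm2, xmm1`.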

category:cq
theme:floating-point
skill-level:expert
cost:large

Labels: area-CodeGen-coreclr, optimization
