Inefficient codegen for floating-point operations

Older x86 hardware only supports the legacy SSE encoding, which encodes two operands: `dst/op1` and `op2`. These instructions are read-modify-write (RMW) because `dst` and `op1` share a single operand. To handle this, codegen generally has to insert an additional move instruction whenever the register allocator did not assign `dst` and `op1` to the same register.

Newer x86 hardware (anything with AVX support) also supports the VEX encoding, which takes three operands: `dst`, `op1`, and `op2`. This encoding is not RMW and does not require an additional move instruction. It is also more efficient: it takes the same number of bytes to encode (for the same allocated registers), or fewer bytes when `dst != op1` (since you don't need to also encode an additional move instruction).

We already emit the VEX encoding by default for floating-point instructions; however, codegen for non-intrinsic codepaths is not VEX aware and still treats floating-point operations as RMW, as if the encoding only supported `dst/op1` and `op2`.

It would be beneficial if the codegen and register allocator were updated to be VEX aware and to call the appropriate `emit_SIMD` codepath (which handles VEX vs legacy encoding differences) where possible.
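
To make the difference concrete, here is a minimal sketch of the three situations described above for a scalar `float` add (`dst = op1 + op2`). It is illustrative only: `Mode` and `emitScalarAdd` are hypothetical names, not the JIT's actual emitter API, and the `VexAsRmw` case reflects an assumed shape of today's output (a VEX-encoded move followed by a VEX add that reuses `dst` as its first source).

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical modes, for illustration only (not real JIT types).
enum class Mode
{
    LegacySse, // two-operand encoding: dst doubles as op1 (RMW)
    VexAsRmw,  // VEX encoding, but codegen still treats the operation as RMW
    VexAware   // VEX encoding, using all three operands
};

// Returns the instruction text needed to compute dst = op1 + op2.
// Hypothetical helper; register names are passed as plain strings.
std::vector<std::string> emitScalarAdd(Mode mode, std::string dst, std::string op1, std::string op2)
{
    std::vector<std::string> out;

    if (mode == Mode::VexAware)
    {
        // Three-operand form: no copy is needed, whatever registers were allocated.
        out.push_back("vaddss " + dst + ", " + op1 + ", " + op2);
        return out;
    }

    // RMW handling. Addition is commutative, so if dst aliases op2 we swap the
    // sources instead of copying op1 over op2 and losing its value.
    if (dst == op2 && dst != op1)
    {
        std::swap(op1, op2);
    }

    const bool vex = (mode == Mode::VexAsRmw);

    // dst must hold op1 before the add, so copy it when the registers differ.
    if (dst != op1)
    {
        out.push_back(std::string(vex ? "vmovaps " : "movaps ") + dst + ", " + op1);
    }

    if (vex)
    {
        out.push_back("vaddss " + dst + ", " + dst + ", " + op2);
    }
    else
    {
        out.push_back("addss " + dst + ", " + op2);
    }
    return out;
}
```

With `dst`, `op1`, and `op2` in three different registers, the two RMW modes each produce a move plus an add, while `VexAware` produces a single `vaddss`. When `dst` and `op1` already share a register, all three modes emit one instruction, which is why the extra move only shows up when the allocator could not tie them together.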

category:cq
theme:floating-point
skill-level:expert
cost:large

Inefficient codegen for floating-point operations #1342
