Inefficient x64 codegen for float->int truncation #173

Unfortunately, the x64 codegen for i32x4.trunc_sat_f32x4_s, at least as implemented by V8, is quite elaborate:

https://github.com/v8/v8/blob/4b9b23521e6fd42373ebbcb20ebe03bf445494f9/src/compiler/backend/ia32/code-generator-ia32.cc#L2083-L2100

This is 7 instructions for what could be a single x64 instruction (cvttps2dq) if NaN handling and overflow behavior didn't have to match what the spec requires.
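To make the mismatch concrete, here is a rough scalar sketch of one lane under each behavior. It is only a model written for illustration, not V8's code: the function names `trunc_sat_lane`, `cvttps2dq_lane` and `demo` are made up for this example, and the second function assumes the usual x64 behavior of returning the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of i32x4.trunc_sat_f32x4_s:
   NaN becomes 0 and out-of-range inputs saturate to INT32_MIN/INT32_MAX. */
static int32_t trunc_sat_lane(float x) {
    if (isnan(x)) return 0;
    if (x >= 2147483648.0f) return INT32_MAX;   /* too large: clamp */
    if (x <= -2147483648.0f) return INT32_MIN;  /* too small: clamp */
    return (int32_t)x;                          /* in range: truncate toward zero */
}

/* Hypothetical scalar model of one lane of a bare cvttps2dq (assumption:
   NaN and any out-of-range input yield 0x80000000, i.e. INT32_MIN). */
static int32_t cvttps2dq_lane(float x) {
    if (isnan(x) || x >= 2147483648.0f || x < -2147483648.0f) return INT32_MIN;
    return (int32_t)x;
}

/* The two models agree for in-range inputs and differ only for NaN and
   positive overflow, which is what the extra instructions have to patch up. */
static void demo(void) {
    assert(trunc_sat_lane(-1.9f) == cvttps2dq_lane(-1.9f));
    assert(trunc_sat_lane(NAN) != cvttps2dq_lane(NAN));
    assert(trunc_sat_lane(3e9f) != cvttps2dq_lane(3e9f));
    assert(trunc_sat_lane(-3e9f) == cvttps2dq_lane(-3e9f));
}
```

In this model negative overflow already matches, so the fix-up work is for NaN lanes and lanes that overflow on the positive side.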

Is there any way this can be improved? I don't have a specific suggestion, but this sequence accounts for ~10% of the instructions in one of my functions (I'm not sure how to measure the cycle impact accurately).