[RyuJIT] Improve VZEROUPPER insertion #11496

Closed
@fiigii

Description

Currently, RyuJIT inserts the VZEROUPPER instruction very conservatively (see https://github.com/dotnet/coreclr/issues/7761#issuecomment-272287724): it emits VZEROUPPER in the function prologue and epilogue whenever the method uses 128-bit or 256-bit AVX instructions. Recently, we have come across scenarios where VZEROUPPER causes CQ problems, e.g., https://github.com/dotnet/coreclr/issues/20820, https://github.com/dotnet/coreclr/issues/21055#issuecomment-439465776, etc.

A better insertion strategy, adopted by most native compilers, is:

  1. Insert VZEROUPPER after 256-bit AVX instructions have executed, i.e., only in the epilogue. The exceptions are when arguments or return values are passed in YMM/ZMM registers under __vectorcall, which we may support in the future.
  2. Insert VZEROUPPER before any function call that might execute legacy SSE code (P/Invoke, calls into the VM, and AOT-compiled code).
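
To make the two rules concrete, here is a minimal sketch of the decision logic in Python. It is only a model of the proposal, not RyuJIT code: `CallKind`, `MethodInfo`, and the three predicate functions are hypothetical names. It also assumes that a call site only needs VZEROUPPER when the calling method itself uses 256-bit AVX, which the rules above imply but do not state.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CallKind(Enum):
    """Hypothetical classification of call targets."""
    MANAGED_JIT = auto()  # JIT-compiled managed code (assumed VEX-encoded)
    PINVOKE = auto()      # native code reached through P/Invoke
    VM_HELPER = auto()    # call into the VM
    AOT = auto()          # ahead-of-time compiled code


# Targets that might execute legacy (non-VEX) SSE code, per rule 2.
MAY_RUN_LEGACY_SSE = {CallKind.PINVOKE, CallKind.VM_HELPER, CallKind.AOT}


@dataclass
class MethodInfo:
    """Hypothetical summary of the method being compiled."""
    uses_256bit_avx: bool
    # Models the __vectorcall exception: a value returned in YMM/ZMM
    # must keep its upper bits, so the epilogue cannot zero them.
    returns_in_ymm: bool = False


def vzeroupper_in_prologue(method: MethodInfo) -> bool:
    # The proposed strategy never inserts VZEROUPPER in the prologue.
    return False


def vzeroupper_in_epilogue(method: MethodInfo) -> bool:
    # Rule 1: only after 256-bit AVX code ran, unless the result lives in YMM/ZMM.
    return method.uses_256bit_avx and not method.returns_in_ymm


def vzeroupper_before_call(method: MethodInfo, target: CallKind) -> bool:
    # Rule 2 (with the assumption stated above): only when the upper halves
    # may be dirty and the callee might run legacy SSE code.
    return method.uses_256bit_avx and target in MAY_RUN_LEGACY_SSE
```

Under this model, a method that uses only 128-bit AVX gets no VZEROUPPER at all, whereas the current behavior described above would emit it in both the prologue and the epilogue.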

category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small

Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
    enhancement: Product code improvement that does NOT require public API changes/additions
    help wanted: [up-for-grabs] Good issue for external contributors
    optimization