Closed
Description
Currently, RyuJIT inserts the VZEROUPPER instruction very conservatively (see https://github.com/dotnet/coreclr/issues/7761#issuecomment-272287724): it emits VZEROUPPER in the function prologue and epilogue whenever the method uses 128-bit or 256-bit AVX instructions. Recently, we have come across scenarios with CQ problems caused by VZEROUPPER, e.g., https://github.com/dotnet/coreclr/issues/20820, https://github.com/dotnet/coreclr/issues/21055#issuecomment-439465776, etc.
A better insertion strategy, adopted by most native compilers, is:
- Add VZEROUPPER after 256-bit AVX instructions have executed (in the epilogue only; the exceptions are when arguments or return values are passed in YMM/ZMM registers with `__vectorcall`, which we may support in the future).
- Add VZEROUPPER before any function call that might execute legacy SSE code (P/Invoke, calls into the VM, and AOTed code).
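
The two rules above can be sketched as a small decision function. This is only an illustrative model in Python, not RyuJIT code: `MethodInfo`, `CallKind`, and `vzeroupper_points` are hypothetical names, and the sketch assumes that JIT-compiled managed callees never execute legacy SSE code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class CallKind(Enum):
    """Hypothetical classification of call targets."""
    MANAGED_JIT = auto()  # assumed to contain only VEX-encoded code
    PINVOKE = auto()
    VM_HELPER = auto()
    AOT = auto()


# Call targets that might execute legacy SSE code, per the second rule.
MAY_RUN_LEGACY_SSE = {CallKind.PINVOKE, CallKind.VM_HELPER, CallKind.AOT}


@dataclass
class MethodInfo:
    """Hypothetical summary of the facts the policy needs about a method."""
    uses_256bit_avx: bool
    returns_in_ymm: bool = False  # the __vectorcall-style exception
    calls: List[CallKind] = field(default_factory=list)


def vzeroupper_points(method: MethodInfo) -> List[str]:
    """Return where VZEROUPPER would be inserted under the proposed strategy."""
    points = []
    # Rule 2: guard every call that might run legacy SSE code.
    for index, kind in enumerate(method.calls):
        if kind in MAY_RUN_LEGACY_SSE:
            points.append(f"before call {index} ({kind.name})")
    # Rule 1: epilogue only, and only if 256-bit AVX was used and the
    # return value does not live in a YMM/ZMM register.
    if method.uses_256bit_avx and not method.returns_in_ymm:
        points.append("epilogue")
    return points
```

Under these assumptions, a method that uses 256-bit AVX and makes one managed call followed by one P/Invoke gets a VZEROUPPER before the P/Invoke and one in the epilogue, but none in the prologue and none around the managed call.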
category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small