See the detailed discussion:
https://tavianator.com/2025/shlx.html
And some added details:
https://lobste.rs/s/1hbwkk/alder_lake_shlx_anomaly
Basically, when the shift count register used by SHLX is set using the wrong kind of instruction, the shift takes a 3x throughput hit on Alder Lake.
There should always be another way to encode the instruction that sets the shift count, so it seems worth avoiding the slow pattern in the x86 backend, at least when tuning for Alder Lake or a generic x86 CPU.