Optimize constant AVX512/AVX2 vectors with broadcast #90328
Open
opened on Aug 10, 2023
Vector512<byte> HexLUT() =>
Vector512.Create("0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"u8);
Produces:
; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
vzeroupper
vmovups zmm0, zmmword ptr [reloc @RWD00]
vmovups zmmword ptr [rdx], zmm0
mov rax, rdx
vzeroupper
ret
RWD00 dq
3736353433323130h, 6665646362613938h,
3736353433323130h, 6665646362613938h,
3736353433323130h, 6665646362613938h,
3736353433323130h, 6665646362613938h
; Total bytes of code: 26
So the JIT stores the entire 64-byte constant in the data section, when it should be smart enough to store only a single 128-bit lane (16 bytes) and broadcast it. With AVX2 and AVX-512 it is quite common to still work in 128-bit lanes and duplicate constants per lane.
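To make the condition concrete, here is a minimal sketch of the check such an optimization would need: does the constant consist of one 128-bit lane repeated end to end? This is only an illustration in C#. `BroadcastConstantSketch` and `TryGetRepeatedLane` are hypothetical names, not JIT or BCL APIs, and the real JIT would perform an equivalent check on its own internal representation of the constant.

```csharp
using System;

static class BroadcastConstantSketch
{
    // Hypothetical helper (assumption, not an existing API): returns true when
    // `constant` is a single `laneSize`-byte pattern repeated end to end, and
    // returns that first lane, which is all the data section would need to hold.
    public static bool TryGetRepeatedLane(
        ReadOnlySpan<byte> constant, int laneSize, out ReadOnlySpan<byte> lane)
    {
        lane = default;

        // The constant must span at least two whole lanes to be worth broadcasting.
        if (laneSize <= 0 || constant.Length <= laneSize || constant.Length % laneSize != 0)
            return false;

        ReadOnlySpan<byte> first = constant.Slice(0, laneSize);

        // Compare every later lane against the first one.
        for (int offset = laneSize; offset < constant.Length; offset += laneSize)
        {
            if (!constant.Slice(offset, laneSize).SequenceEqual(first))
                return false;
        }

        lane = first;
        return true;
    }
}
```

For the 64-byte `HexLUT` constant above with `laneSize` set to 16, the check succeeds and yields the 16-byte lane `"0123456789abcdef"`, so the data section could shrink from 64 bytes to 16.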
Expected codegen:
; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
vzeroupper
vbroadcasti32x4 zmm0, xmmword ptr [reloc @RWD00]
vmovups zmmword ptr [rdx], zmm0
mov rax, rdx
vzeroupper
ret
RWD00 dq
3736353433323130h, 6665646362613938h