Optimize constant AVX512/AVX2 vectors with broadcast  #90328

Closed
@EgorBo

Vector512<byte> HexLUT() => 
    Vector512.Create("0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"u8);

Produces:

; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
       vzeroupper 
       vmovups  zmm0, zmmword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rdx], zmm0
       mov      rax, rdx
       vzeroupper 
       ret      
RWD00  	dq	3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h
; Total bytes of code: 26

So the JIT stores the entire 64-byte constant in the data section, when it should be smart enough to store only a single 128-bit lane and broadcast it. With AVX2 and AVX-512 we quite often still work in 128-bit lanes and duplicate constants per lane.
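
As a rough illustration of the check this would require, here is a minimal sketch that tests whether a vector constant consists of one 16-byte lane repeated. The class and method names are hypothetical and this is not how the JIT implements (or would implement) the optimization; it only shows the property that makes a single stored lane plus a broadcast sufficient.

```csharp
using System;

// Hypothetical helper, for illustration only; not part of the JIT or the BCL.
static class LaneBroadcastSketch
{
    private const int LaneSize = 16; // one 128-bit lane

    // Returns true when `constant` is made of two or more 16-byte lanes
    // and every lane is byte-for-byte equal to the first one.
    public static bool IsRepeated128BitLane(ReadOnlySpan<byte> constant)
    {
        if (constant.Length < 2 * LaneSize || constant.Length % LaneSize != 0)
            return false;

        ReadOnlySpan<byte> first = constant.Slice(0, LaneSize);
        for (int offset = LaneSize; offset < constant.Length; offset += LaneSize)
        {
            // Compare each subsequent lane against the first lane.
            if (!constant.Slice(offset, LaneSize).SequenceEqual(first))
                return false;
        }
        return true;
    }
}
```

For the 64-byte `HexLUT` constant above, all four lanes are `0123456789abcdef`, so only the first 16 bytes would need to be kept in the data section.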

Expected codegen:

; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
       vzeroupper 
       vbroadcasti32x4 zmm0, xmmword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rdx], zmm0
       mov      rax, rdx
       vzeroupper 
       ret      
RWD00  	dq	3736353433323130h, 6665646362613938h

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
avx512: Related to the AVX-512 architecture
help wanted: [up-for-grabs] Good issue for external contributors