Optimize constant AVX512/AVX2 vectors with broadcast  #90328

Open
@EgorBo

Description

Vector512<byte> HexLUT() => 
    Vector512.Create("0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"u8);

Produces:

; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
       vzeroupper 
       vmovups  zmm0, zmmword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rdx], zmm0
       mov      rax, rdx
       vzeroupper 
       ret      
RWD00  	dq	3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h, 3736353433323130h, 6665646362613938h
; Total bytes of code: 26

So the JIT stores the entire 64-byte constant in the data section, when it should be smart enough to store only a single 128-bit lane and broadcast it. With AVX2 and AVX-512 we quite often still work with 128-bit lanes and duplicate constants per lane.
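For illustration, the check such an optimization would need can be sketched in C#: a constant is a broadcast candidate when every 128-bit lane equals the first one. This is only a sketch of the idea, not the JIT's actual implementation; the type `BroadcastSketch` and the method `IsLaneBroadcastable` are hypothetical names introduced here.

```csharp
using System;

static class BroadcastSketch
{
    // Hypothetical helper (not a JIT or BCL API): returns true when the
    // constant consists of identical lanes of `laneSize` bytes, so a single
    // lane could be stored and broadcast instead of the whole constant.
    public static bool IsLaneBroadcastable(ReadOnlySpan<byte> constant, int laneSize = 16)
    {
        // Need at least two whole lanes for a broadcast to save anything.
        if (laneSize <= 0 || constant.Length < 2 * laneSize || constant.Length % laneSize != 0)
        {
            return false;
        }

        ReadOnlySpan<byte> first = constant.Slice(0, laneSize);

        // Compare every following lane against the first one.
        for (int i = laneSize; i < constant.Length; i += laneSize)
        {
            if (!constant.Slice(i, laneSize).SequenceEqual(first))
            {
                return false;
            }
        }

        return true;
    }
}
```

For the `HexLUT` constant above, all four 16-byte lanes are `0123456789abcdef`, so the check succeeds and only 16 of the 64 bytes would need to be stored.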

Expected codegen:

; Method Tests:HexLUT():System.Runtime.Intrinsics.Vector512`1[ubyte]:this (FullOpts)
       vzeroupper 
       vbroadcasti32x4 zmm0, xmmword ptr [reloc @RWD00]
       vmovups  zmmword ptr [rdx], zmm0
       mov      rax, rdx
       vzeroupper 
       ret      
RWD00  	dq	3736353433323130h, 6665646362613938h
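
To make the equivalence concrete, here is a minimal C# sketch that builds the same `Vector512<byte>` value from a single 16-byte lane repeated four times. It only demonstrates that the two values are identical; it makes no claim about what code the JIT emits for either form. `HexLutSketch`, `Full`, and `FromLane` are hypothetical names used for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;

static class HexLutSketch
{
    // The full 64-byte constant, as written in the issue.
    public static Vector512<byte> Full() =>
        Vector512.Create("0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"u8);

    // The same value assembled from one 128-bit lane.
    public static Vector512<byte> FromLane()
    {
        Vector128<byte> lane = Vector128.Create("0123456789abcdef"u8);
        Vector256<byte> half = Vector256.Create(lane, lane); // lane | lane
        return Vector512.Create(half, half);                 // half | half
    }
}
```

`HexLutSketch.Full() == HexLutSketch.FromLane()` evaluates to `true`, which is the property the proposed broadcast relies on.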

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
avx512: Related to the AVX-512 architecture
help wanted: [up-for-grabs] Good issue for external contributors
