HWIntrinsics: Load folding to immediate address?

I've been working on porting the XXH3 hash algorithm, including SSE and AVX versions. My current AVX2 code for the hot loop is here: https://github.com/Zhentar/xxHash3.NET/blob/ee6a626e87f2a829ec786690d4dfa560d876dda7/xxHash3/xxHash3_AVX2.cs#L103

So far I've gotten it up to 36 GB/s, against ~40 GB/s for the clang-compiled native version.

One piece of the hot loop, as compiled by clang, looks like this:
```asm
vmovdqu ymm3, ymmword ptr [rax-360h]
vpaddd  ymm4, ymm3, cs:ymmword_40BDC0
vpshufd ymm6, ymm4, 31h
vpmuludq ymm4, ymm6, ymm4
vpaddq  ymm3, ymm5, ymm3
vpaddq  ymm0, ymm0, ymm3
```

While my version looks like this:
```asm
vmovupd ymm8,ymmword ptr [r10+88h]
vmovupd ymm9,ymmword ptr [r11+360h]
vpaddd  ymm8,ymm9,ymm8
vpshufd ymm10,ymm8,31h
vpmuludq ymm8,ymm8,ymm10
vpaddq  ymm1,ymm9,ymm1
vpaddq  ymm1,ymm8,ymm1
```
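For reference, here is a scalar C sketch of what I understand the sequence above to compute, read lane by lane from my listing. The function names `stripe_lane` and `stripe_piece` are made up for illustration and are not taken from the XXH3 source or from my port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 64-bit lane of the instruction sequence (illustrative names only). */
uint64_t stripe_lane(uint64_t acc, uint64_t data, uint64_t key)
{
    /* vpaddd: two independent 32-bit adds, each wrapping modulo 2^32 */
    uint32_t lo = (uint32_t)data + (uint32_t)key;
    uint32_t hi = (uint32_t)(data >> 32) + (uint32_t)(key >> 32);

    /* vpshufd 31h + vpmuludq: low half times high half, full 64-bit product */
    uint64_t product = (uint64_t)lo * hi;

    /* the two vpaddq: add the raw data and the product into the accumulator */
    return acc + data + product;
}

/* A ymm register holds four such lanes. */
void stripe_piece(uint64_t acc[4], const uint64_t data[4], const uint64_t key[4])
{
    assert(acc != NULL && data != NULL && key != NULL);
    for (size_t i = 0; i < 4; i++)
        acc[i] = stripe_lane(acc[i], data[i], key[i]);
}
```

The key is the only operand that is the same on every iteration, which is why its load is the candidate for folding into the `vpaddd`.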

Or, if I arrange the code such that folding kicks in (uncommenting the `in` for the `ProcessStripePiece_AVX2` key argument), this:
```asm
lea     r14,[rax+20h]
vmovupd ymm4,ymmword ptr [rbp+100h]
vpaddd  ymm5,ymm4,ymmword ptr [r14]
vpshufd ymm6,ymm5,31h
vpmuludq ymm5,ymm5,ymm6
vpaddq  ymm0,ymm4,ymm0
vpaddq  ymm0,ymm5,ymm0
```

However, the folded version performs worse, because the `lea` competes with the add/shuffle/multiply instructions for an integer ALU port, whereas the separate `vmovupd` of the unfolded version only occupies a load port.

Is there any way to get an immediate address folded into the `vpaddd`, as in the clang output, instead of an address computed at run time? I've tried a static readonly field, but that still resulted in an `lea` to compute the address.
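To illustrate the difference I'm asking about, here is a scalar C sketch of the two ways the key can reach the loop. It assumes, based on the `cs:ymmword_40BDC0` operand, that the native build reads the key from a global constant. `KEY_TABLE` and the function names are hypothetical, and the table values are placeholders rather than the real XXH3 key.

```c
#include <stdint.h>

/* Placeholder values, NOT the real XXH3 key (hypothetical table). */
const uint32_t KEY_TABLE[8] = { 0x11u, 0x22u, 0x33u, 0x44u,
                                0x55u, 0x66u, 0x77u, 0x88u };

/* Same lane arithmetic as the listings: 32-bit adds, lo*hi, 64-bit accumulate. */
uint64_t lane(uint64_t acc, uint64_t data, uint32_t key_lo, uint32_t key_hi)
{
    uint32_t lo = (uint32_t)data + key_lo;
    uint32_t hi = (uint32_t)(data >> 32) + key_hi;
    return acc + data + (uint64_t)lo * hi;
}

/* Key through a pointer: the base address is only known at run time,
   so it has to live in a register (or be computed into one). */
uint64_t piece_via_pointer(uint64_t acc, uint64_t data, const uint32_t *key)
{
    return lane(acc, data, key[0], key[1]);
}

/* Key from a global at a constant offset: the address is fixed when the
   image is linked, so a compiler may encode it in the memory operand. */
uint64_t piece_via_global(uint64_t acc, uint64_t data)
{
    return lane(acc, data, KEY_TABLE[2], KEY_TABLE[3]);
}
```

Both functions return the same value for the same key words. The only difference is where the address comes from, and whether a particular compiler actually folds it is not guaranteed.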

category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium

HWIntrinsics: Load folding to immediate address? #12308
