Description
Our profiles show that AddScaleSU, which should be faster with AVX, is actually slower than both the SSE and the native implementations. This can be confirmed by running our CpuMathBenchmarks:
..\..\Tools\dotnetcli\dotnet.exe run -c Release-Intrinsics -- -f *.AddScaleSU --join
Type | Method | Mean |
---|---|---|
AvxPerformanceTests | AddScaleSU | 4.012 ms |
NativePerformanceTests | AddScaleSU | 2.966 ms |
SsePerformanceTests | AddScaleSU | 2.916 ms |
This issue was first spotted by @eerhardt in August: #691 (comment)
@helloguo suggested in #691 (comment) that the GatherVector256 intrinsic should be used.
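
To make the suggestion concrete, here is a rough sketch of what a gather-based loop could look like. It is not the current ML.NET implementation. It assumes that AddScaleSU computes `dst[idx[i]] += scale * src[i]`, that the indices within one batch of eight are distinct (otherwise gather-then-write-back would drop updates), and that the .NET Core 3.0+ shape of the `System.Runtime.Intrinsics` API is available. The class and method names (`AddScaleSUSketch`, `Scalar`, `Gather`) are made up for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch, not the ML.NET code. Requires AllowUnsafeBlocks.
public static class AddScaleSUSketch
{
    // Scalar reference for the assumed semantics: dst[idx[i]] += scale * src[i].
    public static void Scalar(float scale, float[] src, int[] idx, float[] dst, int count)
    {
        for (int i = 0; i < count; i++)
            dst[idx[i]] += scale * src[i];
    }

    // AVX2 variant: one hardware gather replaces eight scalar loads from dst.
    public static unsafe void Gather(float scale, float[] src, int[] idx, float[] dst, int count)
    {
        if (!Avx2.IsSupported)
        {
            Scalar(scale, src, idx, dst, count);
            return;
        }

        fixed (float* pSrc = src, pDst = dst)
        fixed (int* pIdx = idx)
        {
            Vector256<float> scaleVec = Vector256.Create(scale);
            float* tmp = stackalloc float[8];
            int i = 0;

            for (; i + 8 <= count; i += 8)
            {
                // Load eight indices, then gather dst[idx[i..i+7]]; the byte scale 4 is sizeof(float).
                Vector256<int> indices = Avx.LoadVector256(pIdx + i);
                Vector256<float> gathered = Avx2.GatherVector256(pDst, indices, 4);

                Vector256<float> values = Avx.LoadVector256(pSrc + i);
                Vector256<float> result = Avx.Add(gathered, Avx.Multiply(scaleVec, values));

                // AVX2 has no scatter instruction, so results are written back one by one.
                Avx.Store(tmp, result);
                for (int k = 0; k < 8; k++)
                    pDst[pIdx[i + k]] = tmp[k];
            }

            // Scalar tail for the remaining (count % 8) elements.
            for (; i < count; i++)
                pDst[pIdx[i]] += scale * pSrc[i];
        }
    }
}
```

The write-back still has to be scalar, so any gain would come only from the load side. Whether this actually beats the SSE path would need to be checked with the same CpuMathBenchmarks run as above.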