Closed
Description
With #80814, we achieved functional parity of Vector512<T> with Vector128<T> and Vector256<T>. However, there are some new instructions available on Avx512-capable hardware that will allow additional hardware acceleration opportunities for all three types.
This includes:
- ConvertToDouble() - vcvtqq2pd & vcvtuqq2pd
- ConvertToInt64() - vcvtpd2qq
- ConvertToUInt32() - vcvtps2udq
- ConvertToUInt64() - vcvtpd2uqq
- ConditionalSelect() - vpternlog
- Shuffle() - vpermi2*, vpermt2*, etc.
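To illustrate why ConditionalSelect() maps onto a single vpternlog, here is a rough sketch using only the portable Vector128 APIs, so it does not need Avx512 hardware to run. The class and method names (ConditionalSelectSketch, SelectManual, SelectTruthTable) are hypothetical and only for illustration. The truth-table helper assumes the mask is passed as the first of the three vpternlog operands; the operand order the JIT actually picks may differ, which would change the immediate.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ConditionalSelectSketch
{
    // Bitwise select: take bits from 'left' where 'mask' is 1, otherwise from 'right'.
    // This is the three-operation form that a single ternary-logic instruction can replace.
    public static Vector128<uint> SelectManual(
        Vector128<uint> mask, Vector128<uint> left, Vector128<uint> right)
        => (mask & left) | (~mask & right);

    // Builds the 8-entry truth table for f(a, b, c) = a ? b : c, indexed by the
    // bit triple (a << 2) | (b << 1) | c. Assumption: a = mask, b = left, c = right.
    public static byte SelectTruthTable()
    {
        int table = 0;
        for (int index = 0; index < 8; index++)
        {
            int a = (index >> 2) & 1;
            int b = (index >> 1) & 1;
            int c = index & 1;
            int result = (a & b) | ((a ^ 1) & c);
            table |= result << index;
        }
        return (byte)table;
    }
}
```

Under that operand-order assumption, SelectTruthTable() gives the 8-bit control value a vpternlog-based implementation would pass, and SelectManual should agree with Vector128.ConditionalSelect for any inputs.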
We should also ensure that all APIs are accelerated as intrinsics where applicable. In particular, the following are still managed fallbacks (though accelerated):
- Vector512.Dot()
- Vector512.Sum()
There may be others as well, so a general audit to validate would be good.