Description
This issue describes planned improvements to Intel architecture (x86, x64) ISA support for .NET 9.
In .NET 8, AVX-512 ISA support was added (see #77034). In .NET 9, this support will be refined and used more widely across the libraries to improve performance. Investigation and initial implementation work will also begin for the newly announced AVX10.
Libraries work
- (Q4'23) Light up BitArray with Vector512
- (Q4'23) Light up String with Vector512
- (Q4'23) Light up Base64 encode/decode with Vector512
- Consider SIMD/AVX optimization for Tensor ([API Proposal]: Future of Numerics and AI - Provide a downlevel `System.Numerics.Tensors.TensorPrimitives` class #89639)
AVX10
AVX10 is a new set of vector ISA extensions, described here. We expect to begin preliminary work to support AVX10 in .NET 9, at least for the parts that map most directly to the already supported AVX-512. An `arch-avx10` GitHub label has been defined and should be added to all related PRs and issues: https://github.com/dotnet/runtime/labels/arch-avx10
- Add VM/JIT AVX10 awareness: CPUID enumeration and detection (Adding `Avx10v1` to the runtime #99784)
- Propose a new AVX10 API ([API Proposal]: Expose `AVX10` converged vector ISA #98069)
- (Q2'24) Do JIT codegen implementation of the API
- (Q2'24) Add AVX10 APIs (AVX10.1 API introduction in JIT #101938, Cleanup some handling around Avx10v1 #103241)
- Enhance Vector256 codegen with AVX10 instructions, related to what has already been done for AVX512VL (AVX10.1 API introduction in JIT #101938, Cleanup some handling around Avx10v1 #103241)
- (Q2'24) Allow additional 16 YMM registers for AVX10 (AVX10.1 API introduction in JIT #101938, Cleanup some handling around Avx10v1 #103241)
- Allow AVX-512 optimizations for YMM, e.g., scalar conversion, vpternlog (AVX10.1 API introduction in JIT #101938, Cleanup some handling around Avx10v1 #103241)
-- The current AVX-512 optimizations already work on AVX10 targets.
- Identify test plan for .NET 9 sign-off
-- @tannergooding has identified a set of tests, and @khushal1996 has completed all stress tests with the expected results.
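For readers unfamiliar with how AVX10 is enumerated, here is a rough C sketch of the CPUID side of the "CPUID enumeration and detection" item. It is not the runtime's implementation. The bit positions (leaf 7 subleaf 1 `EDX[19]` for AVX10 presence, leaf 0x24 `EBX[7:0]` for the version, `EBX[16..18]` for the 128/256/512-bit vector lengths) reflect my reading of Intel's AVX10 architecture specification and should be checked against the current revision. The names `avx10_info`, `decode_avx10`, and `query_avx10` are hypothetical, and the OS-level XSAVE state checks a real runtime needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper: __get_cpuid_count */
#endif

/* Hypothetical summary of what CPUID reports about AVX10. */
typedef struct {
    bool supported;   /* CPUID.(EAX=7,ECX=1):EDX[19] (assumed bit position) */
    unsigned version; /* CPUID.(EAX=0x24,ECX=0):EBX[7:0] */
    bool v128;        /* EBX[16]: 128-bit vectors */
    bool v256;        /* EBX[17]: 256-bit vectors */
    bool v512;        /* EBX[18]: 512-bit vectors */
} avx10_info;

/* Pure decoder, so the bit logic can be tested on any machine. */
avx10_info decode_avx10(uint32_t leaf7_sub1_edx, uint32_t leaf24_ebx)
{
    avx10_info info = { false, 0, false, false, false };
    info.supported = ((leaf7_sub1_edx >> 19) & 1u) != 0;
    if (!info.supported) {
        return info; /* leaf 0x24 is only meaningful when AVX10 is present */
    }
    info.version = leaf24_ebx & 0xFFu;
    info.v128 = ((leaf24_ebx >> 16) & 1u) != 0;
    info.v256 = ((leaf24_ebx >> 17) & 1u) != 0;
    info.v512 = ((leaf24_ebx >> 18) & 1u) != 0;
    return info;
}

/* Queries the current CPU; reports "not supported" on non-x86 builds. */
avx10_info query_avx10(void)
{
    uint32_t edx7 = 0, ebx24 = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid_count(7, 1, &a, &b, &c, &d)) {
        edx7 = d;
    }
    if (((edx7 >> 19) & 1u) && __get_cpuid_count(0x24, 0, &a, &b, &c, &d)) {
        ebx24 = b;
    }
#endif
    return decode_avx10(edx7, ebx24);
}
```

Splitting the decoder from the hardware query keeps the bit logic checkable with synthetic register values, e.g. `decode_avx10(1u << 19, 0x00070001)` models a version 1 part that reports all three vector lengths.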
RyuJIT feature work
- (Q4'23) Enable EVEX embedded rounding support in xarch emitter #93154
- (Q4'23) Add optimization for scalar/vector conversion of uint32/uint64 to/from packed float/double #80829
- (Q4'23) Finish AVX-512 specific light-up for `Vector128/256/512<T>` (Finish Avx512 specific lightup for Vector128/256/512<T> #85207)
  - Accelerating Vector512.Sum() #87851
  - Updating Sum() implementation for Vector128 and Vector256 + adding lowering for Vector512 #95568
  - All done except for Vector512.Dot, which has been pushed out to a future work item.
- Add EVEX encoding opmask (k) register masking for per-instruction opmask to xarch emitter #80821
RyuJIT optimization work
- AVX512: Fold some bitwise operations to vpternlogq #84534 (@tannergooding to open a future work item for further optimizations)
- Add optimization for scalar conversion of float/double to ulong #89279
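As background for the vpternlogq item, this C sketch shows the usual way to derive the 8-bit immediate that encodes a three-input bitwise function, plus a scalar software model of one 64-bit lane. It is illustrative only and says nothing about how the JIT implements the fold. All names (`ternlog_imm8`, `ternlog_apply`, `fn_and_or`, `fn_xor3`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A three-input bitwise function evaluated on 8-bit truth-table columns. */
typedef uint8_t (*ternary_fn)(uint8_t a, uint8_t b, uint8_t c);

/* Truth-table columns: bit i of each constant is that operand's value when
 * the selector index is i = (a << 2) | (b << 1) | c. */
enum { TERNLOG_A = 0xF0, TERNLOG_B = 0xCC, TERNLOG_C = 0xAA };

/* Evaluating f on the three columns yields the imm8 for vpternlog. */
uint8_t ternlog_imm8(ternary_fn f)
{
    return f(TERNLOG_A, TERNLOG_B, TERNLOG_C);
}

/* Scalar model of one 64-bit lane: for every bit position, the bits of
 * a, b, c form a 3-bit index that selects one bit of imm8. */
uint64_t ternlog_apply(uint8_t imm8, uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1u) << 2) |
                                  (((b >> i) & 1u) << 1) |
                                  ((c >> i) & 1u));
        result |= (uint64_t)((imm8 >> idx) & 1u) << i;
    }
    return result;
}

/* Example functions that a compiler could fold into one instruction. */
uint8_t fn_and_or(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | c);
}

uint8_t fn_xor3(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)(a ^ b ^ c);
}
```

With these definitions, `(a & b) | c` maps to immediate `0xEA` and `a ^ b ^ c` maps to `0x96`, so a two-instruction bitwise sequence can be expressed as a single ternary-logic operation.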
API design work
- Expose System.Runtime.Intrinsics.X86.Avx512F #73604
- (Reconsider implementing?) Expose VectorMask<T> to support generic masking for Vector<T> #74613
- Expose AVX512BW, AVX512CD, and AVX512DQ #76579
Future Work
Some of the work planned for .NET 9 has been pushed out to future releases.
Libraries work
- Light up IndexOfAnyAsciiSearcher for AVX512 #93222 (@MihaZupan)
- Consider SIMD JSON acceleration (Owner: Intel)
- Consider XML API acceleration (Owner: Intel)
- (Help Wanted) Light up Utf8/Utf16 code with Vector512 (Light up Utf8Utility.*.cs and Utf16Utility.*.cs with Vector512 code paths #86119)
- (Help Wanted) Light up `Ascii.Utility` methods with `Vector512` code paths #89280
AVX10
- Allow embedded rounding for YMM/ZMM (related: Enable EVEX embedded rounding support in xarch emitter #93154) (Owner: Intel, Starting AVX10.2)
- (Help Wanted) Convert remaining AVX2 implementations to Vector256 (Switching to Vectors from target dependent instrinsics #101251)
RyuJIT feature work
- `Vector512.Dot`: AVX-512 specific light-up for `Vector512.Dot` (Finish Avx512 specific lightup for Vector128/256/512<T> #85207)
Vector<T>
- Consider expanding `Vector<T>` to `Vector512<T>`, either automatically or opt-in. (@tannergooding plans to get back to it as a best effort.)
JCC erratum
Debugging / diagnostics work (@BruceForstall)
- AVX-512 debugger support: view registers #87854
- Ensure ELT (enter/leave/tailcall hooks, for profiling) works.