Arm64: Implement SVE APIs #99957

Closed
@kunalspathak

Now that the encoding of all SVE instructions is complete (#94549), it is time to expose these instructions through .NET APIs. Below is the categorized list of APIs, with links to the issues where they were approved.

.NET 9 Goal: We aim to complete SVE APIs in .NET 9. SVE2 APIs will be pushed out to .NET 10.

SVE APIs

High Priority SVE APIs

Sve mask (Complete)

Full list

Sve bitwise (Complete)

Full list

Sve bitmanipulate (Complete)

Full list

Sve loads (Complete)

Full list

Sve stores (Complete)

Full list

Sve maths (Complete)

Full list

Sve counting (Complete)

Full list

Low Priority SVE APIs

Sve scatterstores (Complete)

Full list

Sve gatherloads (Complete)

Full list

Sve fp (Complete)

Full list

Sve firstfaulting (Complete)

Full list

SVE2 APIs

Full list

Sve2 scatterstores

  • Scatter16BitNarrowing
  • Scatter16BitWithByteOffsetsNarrowing
  • Scatter32BitNarrowing
  • Scatter32BitWithByteOffsetsNarrowing
  • Scatter8BitNarrowing
  • Scatter8BitWithByteOffsetsNarrowing
  • ScatterNonTemporal

Sve2 maths

  • AbsoluteDifferenceAdd
  • AbsoluteDifferenceAddWideningLower
  • AbsoluteDifferenceAddWideningUpper
  • AbsoluteDifferenceWideningLower
  • AbsoluteDifferenceWideningUpper
  • AddCarryWideningLower
  • AddCarryWideningUpper
  • AddHighNarrowingLower
  • AddHighNarrowingUpper
  • AddPairwise
  • AddPairwiseWidening
  • AddSaturate
  • AddSaturateWithSignedAddend
  • AddSaturateWithUnsignedAddend
  • AddWideLower
  • AddWideUpper
  • AddWideningLower
  • AddWideningLowerUpper
  • AddWideningUpper
  • DotProductComplex
  • HalvingAdd
  • HalvingSubtract
  • HalvingSubtractReversed
  • MaxNumberPairwise
  • MaxPairwise
  • MinNumberPairwise
  • MinPairwise
  • MultiplyAddBySelectedScalar
  • MultiplyAddWideningLower
  • MultiplyAddWideningUpper
  • MultiplyBySelectedScalar
  • MultiplySubtractBySelectedScalar
  • MultiplySubtractWideningLower
  • MultiplySubtractWideningUpper
  • MultiplyWideningLower
  • MultiplyWideningUpper
  • PolynomialMultiply
  • PolynomialMultiplyWideningLower
  • PolynomialMultiplyWideningUpper
  • RoundingAddHighNarrowingLower
  • RoundingAddHighNarrowingUpper
  • RoundingHalvingAdd
  • RoundingSubtractHighNarrowingLower
  • RoundingSubtractHighNarrowingUpper
  • SaturatingAbs
  • SaturatingDoublingMultiplyAddWideningLower
  • SaturatingDoublingMultiplyAddWideningLowerUpper
  • SaturatingDoublingMultiplyAddWideningUpper
  • SaturatingDoublingMultiplyHigh
  • SaturatingDoublingMultiplySubtractWideningLower
  • SaturatingDoublingMultiplySubtractWideningLowerUpper
  • SaturatingDoublingMultiplySubtractWideningUpper
  • SaturatingDoublingMultiplyWideningLower
  • SaturatingDoublingMultiplyWideningUpper
  • SaturatingNegate
  • SaturatingRoundingDoublingMultiplyAddHigh
  • SaturatingRoundingDoublingMultiplyHigh
  • SaturatingRoundingDoublingMultiplySubtractHigh
  • SubtractHighNarrowingLower
  • SubtractHighNarrowingUpper
  • SubtractSaturate
  • SubtractSaturateReversed
  • SubtractWideLower
  • SubtractWideUpper
  • SubtractWideningLower
  • SubtractWideningLowerUpper
  • SubtractWideningUpper
  • SubtractWideningUpperLower
  • SubtractWithBorrowWideningLower
  • SubtractWithBorrowWideningUpper

Sve2 mask

  • CreateWhileGreaterThanMask
  • CreateWhileGreaterThanOrEqualMask
  • CreateWhileReadAfterWriteMask
  • CreateWhileWriteAfterReadMask
  • Match
  • NoMatch
  • SaturatingExtractNarrowingLower
  • SaturatingExtractNarrowingUpper
  • SaturatingExtractUnsignedNarrowingLower
  • SaturatingExtractUnsignedNarrowingUpper

Sve2 gatherloads

  • GatherVectorByteZeroExtendNonTemporal
  • GatherVectorInt16SignExtendNonTemporal
  • GatherVectorInt16WithByteOffsetsSignExtendNonTemporal
  • GatherVectorInt32SignExtendNonTemporal
  • GatherVectorInt32WithByteOffsetsSignExtendNonTemporal
  • GatherVectorNonTemporal
  • GatherVectorSByteSignExtendNonTemporal
  • GatherVectorUInt16WithByteOffsetsZeroExtendNonTemporal
  • GatherVectorUInt16ZeroExtendNonTemporal
  • GatherVectorUInt32WithByteOffsetsZeroExtendNonTemporal
  • GatherVectorUInt32ZeroExtendNonTemporal

Sve2 fp

  • AddRotateComplex
  • DownConvertNarrowingUpper
  • DownConvertRoundingOdd
  • DownConvertRoundingOddUpper
  • Log2
  • MultiplyAddRotateComplex
  • MultiplyAddRotateComplexBySelectedScalar
  • ReciprocalEstimate
  • ReciprocalSqrtEstimate
  • SaturatingComplexAddRotate
  • SaturatingRoundingDoublingComplexMultiplyAddHighRotate
  • UpConvertWideningUpper

Sve2 counting

  • CountMatchingElements
  • CountMatchingElementsIn128BitSegments

Sve2 bitwise

  • BitwiseClearXor
  • BitwiseSelect
  • BitwiseSelectLeftInverted
  • BitwiseSelectRightInverted
  • ShiftArithmeticRounded
  • ShiftArithmeticRoundedSaturate
  • ShiftArithmeticSaturate
  • ShiftLeftAndInsert
  • ShiftLeftLogicalSaturate
  • ShiftLeftLogicalSaturateUnsigned
  • ShiftLeftLogicalWideningEven
  • ShiftLeftLogicalWideningOdd
  • ShiftLogicalRounded
  • ShiftLogicalRoundedSaturate
  • ShiftRightAndInsert
  • ShiftRightArithmeticAdd
  • ShiftRightArithmeticNarrowingSaturateEven
  • ShiftRightArithmeticNarrowingSaturateOdd
  • ShiftRightArithmeticNarrowingSaturateUnsignedEven
  • ShiftRightArithmeticNarrowingSaturateUnsignedOdd
  • ShiftRightArithmeticRounded
  • ShiftRightArithmeticRoundedAdd
  • ShiftRightArithmeticRoundedNarrowingSaturateEven
  • ShiftRightArithmeticRoundedNarrowingSaturateOdd
  • ShiftRightArithmeticRoundedNarrowingSaturateUnsignedEven
  • ShiftRightArithmeticRoundedNarrowingSaturateUnsignedOdd
  • ShiftRightLogicalAdd
  • ShiftRightLogicalNarrowingEven
  • ShiftRightLogicalNarrowingOdd
  • ShiftRightLogicalRounded
  • ShiftRightLogicalRoundedAdd
  • ShiftRightLogicalRoundedNarrowingEven
  • ShiftRightLogicalRoundedNarrowingOdd
  • ShiftRightLogicalRoundedNarrowingSaturateEven
  • ShiftRightLogicalRoundedNarrowingSaturateOdd
  • Xor
  • XorRotateRight

Sve2 bitmanipulate

  • InterleavingXorLowerUpper
  • InterleavingXorUpperLower
  • MoveWideningLower
  • MoveWideningUpper
  • VectorTableLookup
  • VectorTableLookupExtension

SveBf16

  • Bfloat16DotProduct
  • Bfloat16MatrixMultiplyAccumulate
  • Bfloat16MultiplyAddWideningToSinglePrecisionLower
  • Bfloat16MultiplyAddWideningToSinglePrecisionUpper
  • ConcatenateEvenInt128FromTwoInputs
  • ConcatenateOddInt128FromTwoInputs
  • ConditionalExtractAfterLastActiveElement
  • ConditionalExtractAfterLastActiveElementAndReplicate
  • ConditionalExtractLastActiveElement
  • ConditionalExtractLastActiveElementAndReplicate
  • ConditionalSelect
  • ConvertToBFloat16
  • CreateFalseMaskBFloat16
  • CreateTrueMaskBFloat16
  • CreateWhileReadAfterWriteMask
  • CreateWhileWriteAfterReadMask
  • DotProductBySelectedScalar
  • DownConvertNarrowingUpper
  • DuplicateSelectedScalarToVector
  • ExtractAfterLastScalar
  • ExtractAfterLastVector
  • ExtractLastScalar
  • ExtractLastVector
  • ExtractVector
  • GetActiveElementCount
  • InsertIntoShiftedVector
  • InterleaveEvenInt128FromTwoInputs
  • InterleaveInt128FromHighHalvesOfTwoInputs
  • InterleaveInt128FromLowHalvesOfTwoInputs
  • InterleaveOddInt128FromTwoInputs
  • LoadVector
  • LoadVector128AndReplicateToVector
  • LoadVector256AndReplicateToVector
  • LoadVectorFirstFaulting
  • LoadVectorNonFaulting
  • LoadVectorNonTemporal
  • Load2xVector
  • Load3xVector
  • Load4xVector
  • PopCount
  • ReverseElement
  • Splice
  • Store
  • StoreNonTemporal
  • TransposeEven
  • TransposeOdd
  • UnzipEven
  • UnzipOdd
  • VectorTableLookup
  • VectorTableLookupExtension
  • ZipHigh
  • ZipLow

SveF32mm

  • MatrixMultiplyAccumulate

SveF64mm

  • ConcatenateEvenInt128FromTwoInputs
  • ConcatenateOddInt128FromTwoInputs
  • InterleaveEvenInt128FromTwoInputs
  • InterleaveInt128FromHighHalvesOfTwoInputs
  • InterleaveInt128FromLowHalvesOfTwoInputs
  • InterleaveOddInt128FromTwoInputs
  • LoadVector256AndReplicateToVector
  • MatrixMultiplyAccumulate

SveFp16

  • Abs
  • AbsoluteCompareGreaterThan
  • AbsoluteCompareGreaterThanOrEqual
  • AbsoluteCompareLessThan
  • AbsoluteCompareLessThanOrEqual
  • AbsoluteDifference
  • Add
  • AddAcross
  • AddPairwise
  • AddRotateComplex
  • AddSequentialAcross
  • CompareEqual
  • CompareGreaterThan
  • CompareGreaterThanOrEqual
  • CompareLessThan
  • CompareLessThanOrEqual
  • CompareNotEqualTo
  • CompareUnordered
  • ConcatenateEvenInt128FromTwoInputs
  • ConcatenateOddInt128FromTwoInputs
  • ConditionalExtractAfterLastActiveElement
  • ConditionalExtractAfterLastActiveElementAndReplicate
  • ConditionalExtractLastActiveElement
  • ConditionalExtractLastActiveElementAndReplicate
  • ConditionalSelect
  • ConvertToDouble
  • ConvertToHalf
  • ConvertToInt16
  • ConvertToInt32
  • ConvertToInt64
  • ConvertToSingle
  • ConvertToUInt16
  • ConvertToUInt32
  • ConvertToUInt64
  • CreateFalseMaskHalf
  • CreateTrueMaskHalf
  • CreateWhileReadAfterWriteMask
  • CreateWhileWriteAfterReadMask
  • Divide
  • DownConvertNarrowingUpper
  • DuplicateSelectedScalarToVector
  • ExtractAfterLastScalar
  • ExtractAfterLastVector
  • ExtractLastScalar
  • ExtractLastVector
  • ExtractVector
  • FloatingPointExponentialAccelerator
  • FusedMultiplyAdd
  • FusedMultiplyAddBySelectedScalar
  • FusedMultiplyAddNegated
  • FusedMultiplySubtract
  • FusedMultiplySubtractBySelectedScalar
  • FusedMultiplySubtractNegated
  • GetActiveElementCount
  • InsertIntoShiftedVector
  • InterleaveEvenInt128FromTwoInputs
  • InterleaveInt128FromHighHalvesOfTwoInputs
  • InterleaveInt128FromLowHalvesOfTwoInputs
  • InterleaveOddInt128FromTwoInputs
  • LoadVector
  • LoadVector128AndReplicateToVector
  • LoadVector256AndReplicateToVector
  • LoadVectorFirstFaulting
  • LoadVectorNonFaulting
  • LoadVectorNonTemporal
  • LoadVectorx2
  • LoadVectorx3
  • LoadVectorx4
  • Log2
  • Max
  • MaxAcross
  • MaxNumber
  • MaxNumberAcross
  • MaxNumberPairwise
  • MaxPairwise
  • Min
  • MinAcross
  • MinNumber
  • MinNumberAcross
  • MinNumberPairwise
  • MinPairwise
  • Multiply
  • MultiplyAddRotateComplex
  • MultiplyAddRotateComplexBySelectedScalar
  • MultiplyAddWideningLower
  • MultiplyAddWideningUpper
  • MultiplyBySelectedScalar
  • MultiplyExtended
  • MultiplySubtractWideningLower
  • MultiplySubtractWideningUpper
  • Negate
  • PopCount
  • ReciprocalEstimate
  • ReciprocalExponent
  • ReciprocalSqrtEstimate
  • ReciprocalSqrtStep
  • ReciprocalStep
  • ReverseElement
  • RoundAwayFromZero
  • RoundToNearest
  • RoundToNegativeInfinity
  • RoundToPositiveInfinity
  • RoundToZero
  • Scale
  • Splice
  • Sqrt
  • Store
  • StoreNonTemporal
  • Subtract
  • TransposeEven
  • TransposeOdd
  • TrigonometricMultiplyAddCoefficient
  • TrigonometricSelectCoefficient
  • TrigonometricStartingValue
  • UnzipEven
  • UnzipOdd
  • UpConvertWideningUpper
  • VectorTableLookup
  • VectorTableLookupExtension
  • ZipHigh
  • ZipLow

SveI8mm

  • DotProductSignedUnsigned
  • DotProductUnsignedSigned
  • MatrixMultiplyAccumulate
  • MatrixMultiplyAccumulateUnsignedSigned

Sha3

  • BitwiseClearXor
  • BitwiseRotateLeftBy1AndXor
  • Xor
  • XorRotateRight

Sm4

  • Sm4EncryptionAndDecryption
  • Sm4KeyUpdates

SveAes

  • AesInverseMixColumns
  • AesMixColumns
  • AesSingleRoundDecryption
  • AesSingleRoundEncryption
  • PolynomialMultiplyWideningLower
  • PolynomialMultiplyWideningUpper

SveBitperm

  • GatherLowerBitsFromPositionsSelectedByBitmask
  • GroupBitsToRightOrLeftAsSelectedByBitmask
  • ScatterLowerBitsIntoPositionsSelectedByBitmask

SveSha3

  • BitwiseRotateLeftBy1AndXor

SveSm4

  • Sm4EncryptionAndDecryption
  • Sm4KeyUpdates

Credit to @a74nh for populating the list and for providing files in https://github.com/a74nh/runtime/tree/api_github/sve_api that will help implement these APIs.

Contributes to #93095

Labels

  • area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
  • arm-sve: Work related to arm64 SVE/SVE2 support

Projects

Status

Done
