tannergooding (Member) commented Oct 16, 2025

This resolves #120367 and was tested using the following program:

using System;
using System.Numerics;
using System.Runtime.Intrinsics;

TestV();
TestV64();
TestV128();
TestV256();
TestV512();

static ushort[] TestV()
{
    Console.WriteLine(Vector<byte>.Count);
    ushort[] numbers = new ushort[32];
    Vector<ushort>.Zero.CopyTo(numbers);
    return numbers;
}

static ushort[] TestV64()
{
    Console.WriteLine(Vector64<byte>.Count);
    ushort[] numbers = new ushort[4];
    Vector64<ushort>.Zero.CopyTo(numbers);
    return numbers;
}

static ushort[] TestV128()
{
    Console.WriteLine(Vector128<byte>.Count);
    ushort[] numbers = new ushort[8];
    Vector128<ushort>.Zero.CopyTo(numbers);
    return numbers;
}

static ushort[] TestV256()
{
    Console.WriteLine(Vector256<byte>.Count);
    ushort[] numbers = new ushort[16];
    Vector256<ushort>.Zero.CopyTo(numbers);
    return numbers;
}

static ushort[] TestV512()
{
    Console.WriteLine(Vector512<byte>.Count);
    ushort[] numbers = new ushort[32];
    Vector512<ushort>.Zero.CopyTo(numbers);
    return numbers;
}

Results

Before

   1: JIT compiled Program:<Main>$(System.String[]) [Tier0, IL size=31, code size=51]
   2: JIT compiled Program:<<Main>$>g__TestV|0_0() [Tier0, IL size=34, code size=92]
   3: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   4: JIT compiled System.Numerics.Vector`1[ushort]:CopyTo(ushort[]) [Tier0, IL size=39, code size=79]
   5: JIT compiled Program:<<Main>$>g__TestV64|0_1() [Tier0, IL size=30, code size=85]
   6: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
   7: JIT compiled Program:<<Main>$>g__TestV128|0_2() [Tier0, IL size=30, code size=84]
   8: JIT compiled Program:<<Main>$>g__TestV256|0_3() [Tier0, IL size=31, code size=84]
   9: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
  10: JIT compiled Program:<<Main>$>g__TestV512|0_4() [Tier0, IL size=31, code size=98]
  11: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=83]

After - .NET 10

AVX512 Capable Machine

   1: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   2: JIT compiled System.Numerics.Vector`1[ushort]:CopyTo(ushort[]) [Tier0, IL size=39, code size=79]
   3: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
   4: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
   5: JIT compiled Program:<<Main>$>g__TestV512|0_4() [Tier0, IL size=31, code size=98]
   6: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=83]

AVX512=0

   1: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   2: JIT compiled System.Numerics.Vector`1[ushort]:CopyTo(ushort[]) [Tier0, IL size=39, code size=79]
   3: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
   4: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
   5: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=100]

AVX2=0

   1: JIT compiled Program:<Main>$(System.String[]) [Tier0, IL size=31, code size=51]
   2: JIT compiled Program:<<Main>$>g__TestV|0_0() [Tier0, IL size=34, code size=87]
   3: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   4: JIT compiled Program:<<Main>$>g__TestV64|0_1() [Tier0, IL size=30, code size=85]
   5: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
   6: JIT compiled Program:<<Main>$>g__TestV128|0_2() [Tier0, IL size=30, code size=84]
   7: JIT compiled Program:<<Main>$>g__TestV256|0_3() [Tier0, IL size=31, code size=84]
   8: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
   9: JIT compiled Program:<<Main>$>g__TestV512|0_4() [Tier0, IL size=31, code size=126]
  10: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=100]

SSE42=0

   1: JIT compiled System.SpanHelpers:IndexOfNullCharacter(ptr) [Instrumented Tier0, IL size=1148, code size=926]
   2: JIT compiled System.SpanHelpers:GetCharVector128SpanLength(nint,nint) [Tier0, IL size=14, code size=26]
   3: JIT compiled System.SpanHelpers:NonPackedIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int) [Instrumented Tier0, IL size=1204, code size=1761]
   4: JIT compiled System.Text.Unicode.Utf16Utility:GetPointerToFirstInvalidChar(ptr,int,byref,byref) [Instrumented Tier0, IL size=946, code size=1280]
   5: JIT compiled System.Text.Ascii:GetIndexOfFirstNonAsciiChar_Intrinsified(ptr,nuint) [Instrumented Tier0, IL size=473, code size=1390]
   6: JIT compiled System.Text.Ascii:VectorContainsNonAsciiChar(System.Runtime.Intrinsics.Vector128`1[ushort]) [Tier0, IL size=109, code size=63]
   7: JIT compiled System.Text.Unicode.Utf8Utility:TranscodeToUtf8(ptr,int,ptr,int,byref,byref) [Instrumented Tier0, IL size=1494, code size=2856]
   8: JIT compiled System.Text.Ascii:NarrowUtf16ToAscii(ptr,ptr,nuint) [Instrumented Tier0, IL size=380, code size=755]
   9: JIT compiled Program:<Main>$(System.String[]) [Tier0, IL size=31, code size=51]
  10: JIT compiled Program:<<Main>$>g__TestV|0_0() [Tier0, IL size=34, code size=87]
  11: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
  12: JIT compiled System.PackedSpanHelpers:IndexOfAnyInRange[System.SpanHelpers+DontNegate`1[short]](byref,short,short,int) [Instrumented Tier0, IL size=861, code size=1316]
  13: JIT compiled Program:<<Main>$>g__TestV64|0_1() [Tier0, IL size=30, code size=85]
  14: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
  15: JIT compiled Program:<<Main>$>g__TestV128|0_2() [Tier0, IL size=30, code size=82]
  16: JIT compiled Program:<<Main>$>g__TestV256|0_3() [Tier0, IL size=31, code size=104]
  17: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=93]
  18: JIT compiled Program:<<Main>$>g__TestV512|0_4() [Tier0, IL size=31, code size=135]
  19: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=109]

HWIntrinsics=0

   A LOT. I'm not going to list the several hundred APIs; this is expected since the "baseline" ISA is disabled.

After - .NET 11

AVX512 Capable Machine

  1: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
  2: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
  3: JIT compiled Program:<<Main>$>g__TestV512|0_4() [Tier0, IL size=31, code size=98]
  4: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=83]

AVX512=0

  1: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
  2: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
  3: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=100]

AVX2=0

  A LOT. This is also expected, since we bumped the R2R baseline to AVX2 for .NET 11.

Copilot AI review requested due to automatic review settings October 16, 2025 14:59

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR marks the special ABI primitive types (the vector types and Int128/UInt128) as having a stable layout, so that ReadyToRun (R2R) code using them is no longer discarded at runtime. This eliminates unnecessary JIT compilations for vector and Int128 operations when the hardware supports them.

  • Removes the vectorAbiIsStable parameter from VectorFieldLayoutAlgorithm constructors and always treats vector types as ABI stable
  • Updates Int128FieldLayoutAlgorithm to mark Int128/UInt128 types as having stable layout on all platforms including ARM
  • Simplifies VectorOfTFieldLayoutAlgorithm by removing conditional ABI stability logic

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File | Description
TestTypeSystemContext.cs | Removed the second parameter from the VectorFieldLayoutAlgorithm constructor call
ReadyToRunCompilerContext.cs | Removed the conditional ABI stability logic; vector types are always marked as stable
VectorFieldLayoutAlgorithm.cs | Removed the vectorAbiIsStable parameter and field; LayoutAbiStable is hardcoded to true
Int128FieldLayoutAlgorithm.cs | Changed LayoutAbiStable from false to true for Int128/UInt128 types

github-actions bot added the needs-area-label label Oct 16, 2025
tannergooding (Member, Author) commented

CC @jkotas, @EgorBo

jkotas (Member) commented Oct 16, 2025

Are these types still blocked for interop (re #120367 (comment))?

tannergooding (Member, Author) commented

jkotas (Member) commented Oct 16, 2025

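AVX512=0
1: JIT compiled System.Runtime.Intrinsics.Vector64:CopyTo[ushort](System.Runtime.Intrinsics.Vector64`1[ushort],ushort[]) [Tier0, IL size=34, code size=71]
2: JIT compiled System.Runtime.Intrinsics.Vector256:CopyTo[ushort](System.Runtime.Intrinsics.Vector256`1[ushort],ushort[]) [Tier0, IL size=34, code size=79]
3: JIT compiled System.Runtime.Intrinsics.Vector512:CopyTo[ushort](System.Runtime.Intrinsics.Vector512`1[ushort],ushort[]) [Tier0, IL size=34, code size=100]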

Why are these still getting compiled?

tannergooding (Member, Author) commented Oct 16, 2025

Why are these still getting compiled?

Nothing triggers them in the S.P.Corelib R2R image, unlike V128<byte>/V128<ushort>, and they aren't [NonVersionable] so they don't cross the corelib boundary.

We only have the R2R entries for CopyTo<__Canon> for these types, whereas V128 is used with byte/ushort from within corelib code paths itself.

tannergooding (Member, Author) commented

R2RDump on S.P.Corelib reports:

	Line 52321: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, __Canon[])
	Line 52329: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, __Canon[])
	Line 52371: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, __Canon[], int)
	Line 52379: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, __Canon[], int)
	Line 52435: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, System.Span`1<__Canon>)
	Line 52444: void System.Runtime.Intrinsics.Vector64.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector64`1<__Canon>, System.Span`1<__Canon>)
	
	Line 13155: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, __Canon[])
	Line 13163: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, __Canon[])
	Line 13205: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, __Canon[], int)
	Line 13213: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, __Canon[], int)
	Line 13269: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, System.Span`1<__Canon>)
	Line 13278: void System.Runtime.Intrinsics.Vector128.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector128`1<__Canon>, System.Span`1<__Canon>)
	Line 15982: void System.Runtime.Intrinsics.Vector128.CopyTo<ushort>(System.Runtime.Intrinsics.Vector128`1<ushort>, ushort[])
	Line 15988: void System.Runtime.Intrinsics.Vector128.CopyTo<ushort>(System.Runtime.Intrinsics.Vector128`1<ushort>, ushort[])
	Line 36641: void System.Runtime.Intrinsics.Vector128.CopyTo<byte>(System.Runtime.Intrinsics.Vector128`1<byte>, System.Span`1<byte>)
	Line 36649: void System.Runtime.Intrinsics.Vector128.CopyTo<byte>(System.Runtime.Intrinsics.Vector128`1<byte>, System.Span`1<byte>)
	
	Line 11088: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, __Canon[])
	Line 11096: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, __Canon[])
	Line 11138: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, __Canon[], int)
	Line 11146: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, __Canon[], int)
	Line 11202: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, System.Span`1<__Canon>)
	Line 11211: void System.Runtime.Intrinsics.Vector256.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector256`1<__Canon>, System.Span`1<__Canon>)

	Line 40862: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, __Canon[])
	Line 40870: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, __Canon[])
	Line 40912: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, __Canon[], int)
	Line 40920: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, __Canon[], int)
	Line 40976: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, System.Span`1<__Canon>)
	Line 40985: void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon>(System.Runtime.Intrinsics.Vector512`1<__Canon>, System.Span`1<__Canon>)

EgorBo (Member) commented Oct 16, 2025

Nothing triggers them in the S.P.Corelib

Yeah, we don't use the Vector_.Create(Span) and Vector_.CopyTo(Span) APIs in the BCL yet, since they come with bounds checks that the JIT can't eliminate today when used inside loops; we rely on unsafe APIs instead. The actual issue was driven by literally the first usage of CopyTo in all of Libraries.
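To make that trade-off concrete, here is a minimal C# sketch (not code from the BCL or from this PR) of the two styles applied to a hypothetical helper that increments every ushort in an array. The IncrementWithSpans/IncrementWithUnsafe names are illustrative only, and the sketch assumes .NET 8 or later, where Vector128.Create(ReadOnlySpan<T>), Vector128.CopyTo(Span<T>), Vector128.LoadUnsafe and Vector128.StoreUnsafe are all available. The span-based loop validates the length on every Create/CopyTo call, while the unsafe loop relies solely on its own loop condition.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

ushort[] safeValues = new ushort[20];
ushort[] unsafeValues = new ushort[20];
for (int n = 0; n < safeValues.Length; n++)
{
    safeValues[n] = (ushort)n;
    unsafeValues[n] = (ushort)n;
}

IncrementWithSpans(safeValues);
IncrementWithUnsafe(unsafeValues);

// Both variants should print the values 1 through 20, comma-separated.
Console.WriteLine(string.Join(",", safeValues));
Console.WriteLine(string.Join(",", unsafeValues));

// Span-based style (hypothetical helper): each Create/CopyTo call checks that
// the span is long enough, and the JIT may not prove those checks redundant.
static void IncrementWithSpans(ushort[] values)
{
    int count = Vector128<ushort>.Count;
    Vector128<ushort> one = Vector128.Create((ushort)1);
    int i = 0;
    for (; i <= values.Length - count; i += count)
    {
        ReadOnlySpan<ushort> source = values.AsSpan(i, count);
        Vector128<ushort> v = Vector128.Create<ushort>(source);
        (v + one).CopyTo(values.AsSpan(i, count));
    }
    for (; i < values.Length; i++)
    {
        values[i]++; // scalar tail for the leftover elements
    }
}

// Unsafe style (hypothetical helper): the loop condition is the only guard,
// so there is no per-call validation for the JIT to eliminate.
static void IncrementWithUnsafe(ushort[] values)
{
    ref ushort start = ref MemoryMarshal.GetArrayDataReference(values);
    nuint length = (nuint)values.Length;
    nuint count = (nuint)Vector128<ushort>.Count;
    Vector128<ushort> one = Vector128.Create((ushort)1);
    nuint i = 0;
    for (; i + count <= length; i += count)
    {
        Vector128<ushort> v = Vector128.LoadUnsafe(ref start, i);
        (v + one).StoreUnsafe(ref start, i);
    }
    for (; i < length; i++)
    {
        values[(int)i]++; // scalar tail for the leftover elements
    }
}
```

The unsafe variant is only correct because the loop bound guarantees each 8-element load and store stays inside the array; getting that bound wrong silently corrupts memory instead of throwing.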

tannergooding (Member, Author) commented

I'm going to log an issue for the void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon> R2R entries existing

These seem like cases we can avoid, since they will always fault and shouldn't appear in real code. It would be more useful to pick the "most common" T for the SIMD types instead, or to not R2R them at all unless they're actually used.
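As an illustration of why those entries are dead weight, here is a small C# sketch (not from the PR). Vector512<T> and its CopyTo helpers accept a reference-type T, which is why shared CopyTo<__Canon> code can exist at all, as the R2RDump output above shows, but using such an instantiation faults at runtime with NotSupportedException, whereas a supported element type like ushort works. The sketch assumes .NET 8 or later (where Vector512<T> exists); string and the array sizes are arbitrary choices.

```csharp
using System;
using System.Runtime.Intrinsics;

// ushort is a supported element type, so this copy succeeds
// (Vector512<ushort>.Count is 32 regardless of hardware support).
ushort[] numbers = new ushort[Vector512<ushort>.Count];
Vector512<ushort>.Zero.CopyTo(numbers);
Console.WriteLine($"ushort: copied {numbers.Length} elements");

// string is a reference type: this compiles, but any use of the vector throws.
bool threw = false;
try
{
    Vector512<string>.Zero.CopyTo(new string[64]);
}
catch (NotSupportedException)
{
    threw = true;
}
Console.WriteLine($"string: threw NotSupportedException = {threw}");
```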

tannergooding (Member, Author) commented

@jkotas, any other questions/comments?

jkotas (Member) commented Oct 16, 2025

I'm going to log an issue for the void System.Runtime.Intrinsics.Vector512.CopyTo<__Canon> R2R entries existing

Yes, there is more fine-tuning we should do here, but it does not need to block this PR.

jkotas (Member) left a comment

Thanks

tannergooding (Member, Author) commented

Logged #120816

tannergooding merged commit 3be0023 into dotnet:main Oct 16, 2025
97 checks passed
tannergooding deleted the fix-120367 branch October 16, 2025 20:07

Labels

needs-area-label: An area label is needed to ensure this gets routed to the appropriate area owners

Development

Successfully merging this pull request may close these issues.

Vector128.CopyTo discards R2R

3 participants