JIT: Devirtualize shared generic virtual methods #123323

Open
hez2010 wants to merge 51 commits into dotnet:main from hez2010:gvm-devirt-shared


@hez2010 hez2010 commented Jan 18, 2026

Previously in #122023, we hit an issue with GVM devirtualization when the devirtualized target is a shared generic method. GVM calls are imported with a runtime lookup that is specific to the base method. After devirtualization, the call requires the instantiation argument for the implementing method, and the existing lookup cannot be reused.
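For readers less familiar with shared generics, the following minimal sketch illustrates why the instantiation argument matters. It is not code from this PR; `WorkerBase`, `Worker`, and `InstParamDemo` are hypothetical names. On CoreCLR, instantiations of a generic method over reference types generally share one compiled body, so that body relies on a hidden instantiation argument to answer questions such as `typeof(T)`. A call site written against `WorkerBase.Describe<T>` only has a lookup for the base method, while a devirtualized call needs the argument that belongs to `Worker.Describe<T>`.

```csharp
using System;

public abstract class WorkerBase
{
    // A generic virtual method (GVM): virtual dispatch plus a method type parameter.
    public abstract string Describe<T>(T item);
}

public sealed class Worker : WorkerBase
{
    // For reference-type T, one shared body typically serves every instantiation,
    // so typeof(T) is answered from the hidden instantiation argument.
    public override string Describe<T>(T item) => typeof(T).Name;
}

public static class InstParamDemo
{
    // The call below is written against the base method; if it is devirtualized,
    // the callee becomes Worker.Describe<T>, which needs its own instantiation argument.
    public static string Call<T>(WorkerBase worker, T item) => worker.Describe(item);

    public static void Main()
    {
        WorkerBase worker = new Worker();
        Console.WriteLine(Call(worker, "text"));       // String
        Console.WriteLine(Call(worker, new object())); // Object
    }
}
```

Both calls run the same shared body in practice, yet each prints a different type name, which is only possible because the exact instantiation is passed along with the call.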

This PR unblocks devirtualization for shared generic targets by ensuring the call receives the correct instantiation parameter for the devirtualized method:

  • The multiple ad-hoc flags in dvInfo have been unified into a single instParamLookup.

  • When the target does not require a runtime lookup, we already know the exact generic context, so we pass the instantiating stub as the inst param (shared with the existing array interface devirtualization path).

  • The instantiating stub (when necessary) is now stored directly in the exactContext, so devirtualizedMethod can never be an instantiating stub. This removes the unnecessary getInstantiatedEntry roundtrip.

  • When the target requires a runtime lookup, we introduce a new DictionaryEntryKind::DevirtualizedMethodDescSlot and pass it to the instParamLookup, so that the VM later knows it needs to encode the class token from the devirtualized method instead of the original token. In this case, the devirtualized method pDevirtMD is passed as the template method.

Because of the instParamLookup change, I implemented R2R support as well.

NativeAOT still needs extra work in JIT to enable GVM devirts.

Example:

IVritualGenericInterface i = new Processor();
Test(i, "test");

VirtualGenericClass c = new Processor();
Test(c, "test");

static void Test<T>(IVritualGenericInterface ifce, T item) where T : notnull
{
    ifce.Process(item);
}

static void Test<T>(VirtualGenericClass baseClass, T item) where T : notnull
{
    baseClass.Process(item);
}

public class Processor : VirtualGenericClass, IVritualGenericInterface
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public override void Process<T>(T item)
    {
        Console.WriteLine(typeof(T));
        Console.WriteLine(item.ToString());
    }
}

Codegen diff:

  G_M27646_IG01:
-       push     rdi
        push     rsi
        push     rbx
-       sub      rsp, 32
+       sub      rsp, 40
-                                                ;; size=7 bbWeight=1 PerfScore 3.25
+                                                ;; size=6 bbWeight=1 PerfScore 2.25
 G_M27646_IG02:
-       mov      rbx, 0xD1FFAB1E      ; Program+Processor
+       mov      rbx, 0xD1FFAB1E      ; 'System.String'
        mov      rcx, rbx
-       call     CORINFO_HELP_NEWSFAST
+       call     [System.Console:WriteLine(System.Object)]
-       mov      rsi, rax
+       mov      rsi, 0xD1FFAB1E      ; 'test'
-       mov      rdi, 0xD1FFAB1E      ; 'test'
        mov      rcx, rsi
-       mov      rdx, 0xD1FFAB1E      ; Program+IVritualGenericInterface
+       call     [System.Console:WriteLine(System.String)]
-       mov      r8, 0xD1FFAB1E      ; token handle
-       call     CORINFO_HELP_VIRTUAL_FUNC_PTR
+       mov      rcx, rbx
+       call     [System.Console:WriteLine(System.Object)]
        mov      rcx, rsi
-       mov      rdx, rdi
+       call     [System.Console:WriteLine(System.String)]
-       call     rax
+       nop
-       mov      rcx, rbx
+                                                ;; size=57 bbWeight=1 PerfScore 13.75
-       call     CORINFO_HELP_NEWSFAST
+G_M27646_IG03:
-       mov      rbx, rax
+       add      rsp, 40
-       mov      rcx, rbx
+       pop      rbx
-       mov      rdx, 0xD1FFAB1E      ; Program+VirtualGenericClass
+       pop      rsi
-       mov      r8, 0xD1FFAB1E      ; token handle
+       ret
-       call     CORINFO_HELP_VIRTUAL_FUNC_PTR
+                                                ;; size=7 bbWeight=1 PerfScore 2.25
-       mov      rcx, rbx
-       mov      rdx, rdi
-       call     rax
-       nop
-                                                ;; size=115 bbWeight=1 PerfScore 14.25
-G_M27646_IG03:
-       add      rsp, 32
-       pop      rbx
-       pop      rsi
-       pop      rdi
-       ret
-                                                ;; size=8 bbWeight=1 PerfScore 2.75

Another example that involves runtime lookup:

MyBase c = new MyImpl<string[]>();
Console.WriteLine(c.Method("hello"));

abstract class MyBase
{
    abstract public T Method<T>(T item) where T : notnull;
}

class MyImpl<U> : MyBase
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public override T Method<T>(T item)
    {
        MyBase b = new MyImpl2();
        b.Method(item);
        return item;
    }
}

class MyImpl2 : MyBase
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public override T Method<T>(T item)
    {
        Console.WriteLine(item.ToString());
        return item;
    }
}

Codegen diff:

  ; Assembly listing for method Program+MyImpl`1[System.__Canon]:Method[System.__Canon](System.__Canon):System.__Canon:this (FullOpts)
 ; Emitting BLENDED_CODE for x64 + VEX on Windows
 ; FullOpts code
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; No PGO data
-; 0 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
+; 0 inlinees with PGO data; 4 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <Program+MyImpl`1[System.__Canon]>
-;  V01 TypeCtx      [V01,T00] (  5,  4.20)    long  ->  rbx         single-def
+;  V01 TypeCtx      [V01,T01] (  4,  4   )    long  ->  rdx         single-def
-;  V02 arg1         [V02,T01] (  4,  4   )     ref  ->  rsi         class-hnd single-def <System.__Canon>
+;  V02 arg1         [V02,T00] (  5,  5   )     ref  ->  rbx         class-hnd single-def <System.__Canon>
 ;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
-;  V04 tmp1         [V04,T02] (  3,  6   )     ref  ->  rdi         class-hnd exact single-def "NewObj constructor temp" <Program+MyImpl2>
+;* V04 tmp1         [V04    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Program+MyImpl2>
 ;* V05 tmp2         [V05    ] (  0,  0   )    long  ->  zero-ref    "spilling helperCall"
-;  V06 tmp3         [V06,T05] (  2,  4   )    long  ->   r8         "argument with side effect"
+;* V06 tmp3         [V06    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd single-def "Inlining Arg" <System.__Canon>
-;  V07 rat0         [V07,T04] (  3,  4   )    long  ->   r8         "runtime lookup"
+;* V07 tmp4         [V07    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SF] "stack allocated Program+MyImpl2" <Program+MyImpl2>
-;  V08 rat1         [V08,T03] (  3,  5.60)    long  ->   r8         "fgMakeTemp is creating a new local variable"
+;* V08 rat0         [V08,T04] (  0,  0   )    long  ->  zero-ref    "runtime lookup"
+;  V09 rat1         [V09,T02] (  2,  4   )    long  ->  rcx         "spilling expr"
+;* V10 rat2         [V10,T03] (  0,  0   )    long  ->  zero-ref    "fgMakeTemp is creating a new local variable"
 ;
 ; Lcl frame size = 48
 
 G_M37466_IG01:
-       push     rdi
-       push     rsi
        push     rbx
        sub      rsp, 48
        mov      qword ptr [rsp+0x28], rdx
-       mov      rbx, rdx
+       mov      rbx, r8
-       mov      rsi, r8
+                                                ;; size=13 bbWeight=1 PerfScore 2.50
-                                                ;; size=18 bbWeight=1 PerfScore 4.75
 G_M37466_IG02:
-       mov      rcx, 0xD1FFAB1E      ; Program+MyImpl2
+       mov      rcx, qword ptr [rdx+0x48]
-       call     CORINFO_HELP_NEWSFAST
+       cmp      qword ptr [rcx+0x08], 24
-       mov      rdi, rax
+       jle      SHORT G_M37466_IG03
-       mov      rcx, qword ptr [rbx+0x48]
+                                                ;; size=11 bbWeight=1 PerfScore 6.00
-       mov      r8, qword ptr [rcx+0x10]
-       test     r8, r8
-       je       SHORT G_M37466_IG05
-                                                ;; size=31 bbWeight=1 PerfScore 6.75
 G_M37466_IG03:
-       mov      rcx, rdi
+       mov      rcx, rbx
-       mov      rdx, 0xD1FFAB1E      ; Program+MyBase
+       mov      rax, qword ptr [rbx]
-       call     CORINFO_HELP_VIRTUAL_FUNC_PTR
+       mov      rax, qword ptr [rax+0x48]
-       mov      rcx, rdi
+       call     [rax+0x08]System.Object:ToString():System.String:this
-       mov      rdx, rsi
+       mov      rcx, rax
-       call     rax
+       call     [System.Console:WriteLine(System.String)]
-       mov      rax, rsi
+       mov      rax, rbx
-                                                ;; size=29 bbWeight=1 PerfScore 5.25
+                                                ;; size=25 bbWeight=1 PerfScore 10.75
 G_M37466_IG04:
        add      rsp, 48
        pop      rbx
-       pop      rsi
-       pop      rdi
        ret
-                                                ;; size=8 bbWeight=1 PerfScore 2.75
+                                                ;; size=6 bbWeight=1 PerfScore 1.75
-G_M37466_IG05:
+
-       mov      rcx, rbx
+; Total bytes of code 55, prolog size 10, PerfScore 21.00, instruction count 17, allocated bytes for code 55 (MethodHash=7bd26da5) for method Program+MyImpl`1[System.__Canon]:Method[System.__Canon](System.__Canon):System.__Canon:this (FullOpts)
-       mov      rdx, 0xD1FFAB1E      ; global ptr
-       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-       mov      r8, rax
-       jmp      SHORT G_M37466_IG03
-                                                ;; size=23 bbWeight=0.20 PerfScore 0.75
-
-; Total bytes of code 109, prolog size 12, PerfScore 20.25, instruction count 31, allocated bytes for code 109 (MethodHash=7bd26da5) for method Program+MyImpl`1[System.__Canon]:Method[System.__Canon](System.__Canon):System.__Canon:this (FullOpts)
-; ============================================================
-
-; Assembly listing for method Program+MyImpl2:Method[System.__Canon](System.__Canon):System.__Canon:this (FullOpts)
-; Emitting BLENDED_CODE for x64 + VEX on Windows
-; FullOpts code
-; optimized code
-; rsp based frame
-; partially interruptible
-; No PGO data
-; Final local variable assignments
-;
-;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def <Program+MyImpl2>
-;* V01 TypeCtx      [V01    ] (  0,  0   )    long  ->  zero-ref    single-def
-;  V02 arg1         [V02,T00] (  5,  5   )     ref  ->  rbx         ld-addr-op class-hnd single-def <System.__Canon>
-;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
-;
-; Lcl frame size = 32
-
-G_M26681_IG01:
-       push     rbx
-       sub      rsp, 32
-       mov      rbx, r8
-                                                ;; size=8 bbWeight=1 PerfScore 1.50
-G_M26681_IG02:
-       mov      rcx, rbx
-       mov      rax, qword ptr [rbx]
-       mov      rax, qword ptr [rax+0x48]
-       call     [rax+0x08]System.Object:ToString():System.String:this
-       mov      rcx, rax
-       call     [System.Console:WriteLine(System.String)]
-       mov      rax, rbx
-                                                ;; size=25 bbWeight=1 PerfScore 10.75
-G_M26681_IG03:
-       add      rsp, 32
-       pop      rbx
-       ret
-                                                ;; size=6 bbWeight=1 PerfScore 1.75
-
-; Total bytes of code 39, prolog size 5, PerfScore 14.00, instruction count 13, allocated bytes for code 39 (MethodHash=630397c6) for method Program+MyImpl2:Method[System.__Canon](System.__Canon):System.__Canon:this (FullOpts)
-; ============================================================

Contributes to #112596

Copilot AI review requested due to automatic review settings January 18, 2026 11:52
@github-actions github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 18, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Jan 18, 2026

Copilot AI left a comment


Pull request overview

This PR enables JIT devirtualization for shared generic virtual methods (GVM) that don't require runtime lookups. Previously, all shared GVMs were blocked from devirtualization due to concerns about having the right generic context. This change unblocks devirtualization when the instantiating stub doesn't need a runtime lookup, by checking for the presence of a GT_RUNTIMELOOKUP node before proceeding.

Changes:

  • Introduced needsMethodContext flag to track when a method context is needed for devirtualization
  • For shared generic methods, obtain the instantiating stub and set needsMethodContext = true
  • Unified handling of array interface and generic virtual method devirtualization paths
  • Added runtime lookup check in the JIT to bail out when a lookup is needed but context is unavailable

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/coreclr/vm/jitinterface.cpp | Added logic to detect shared generic methods and obtain instantiating stubs, unified array and GVM devirtualization handling
src/coreclr/jit/importercalls.cpp | Updated assertions to allow GVM in AOT scenarios, added runtime lookup check to prevent devirtualization when context is unavailable

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

hez2010 commented Jan 18, 2026

@MihuBot


hez2010 commented Jan 18, 2026

The failures seem to be caused by missing context during SPMI replay. Otherwise, all tests pass.

Copilot AI review requested due to automatic review settings February 1, 2026 18:06
@hez2010 hez2010 changed the title from "JIT: Devirtualize shared GVM that doesn't need a runtime lookup" to "JIT: Devirtualize shared generic virtual methods" Feb 1, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

@hez2010 hez2010 marked this pull request as draft February 1, 2026 18:23
Copilot AI review requested due to automatic review settings February 1, 2026 19:06

hez2010 commented Feb 23, 2026

> This will need some changes to adapt to #124683 (hopefully should be simplifications)

Done in ee80835 (merge) and d87e421 (the adaptation). It's a great simplification: we no longer need the roundtrip, and the awkward additional flag that identified whether the arg came from DevirtualizedMethodDescSlot is gone.

Copilot AI review requested due to automatic review settings February 23, 2026 10:32

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

Comment on lines 9029 to 9030
GenTree* instParam =
getLookupTree(lookupToken, &dvInfo.instParamLookup, GTF_ICON_METHOD_HDL, compileTimeHandle);
Member


lookupToken is unused by getLookupTree. I missed removing it in my change. Can you remove it here and clean up the uses?

Contributor Author


Done.

Copilot AI review requested due to automatic review settings February 23, 2026 13:31

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

helperArg = new MethodWithToken(methodDesc, HandleToModuleToken(ref pResolvedToken), constrainedType, unboxing: false, context: sharedMethod);
Debug.Assert(templateMethod != null);
_compilation.NodeFactory.DetectGenericCycles(MethodBeingCompiled, templateMethod);
helperArg = ComputeMethodWithToken(templateMethod, ref pResolvedToken, constrainedType: null, unboxing: false);

@hez2010 hez2010 Feb 23, 2026


@jakobbotsch This is still not sufficient to get things right.

In non-R2R we emit an ELEMENT_TYPE_INTERNAL to use as the owner type, but I don't see how I can do that in R2R.

Below is a repro for the problem (it needs to be built as composite R2R with JitAggressiveInlining=1):

public class GenericVirtualMethodTests
{
    public static void Main()
    {
        IBaseMethodCaller caller = new ClassBaseCaller(new ClassBase_GenericDerived_NoInlining<string>());
        string value = "repro";
        Equal(value, RuntimeLookupBridgeShared<string>.SameClassDifferentMethod(caller, value));
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Equal<T>(T expected, T actual, [CallerArgumentExpression(nameof(actual))] string testcase = "")
    {
        Console.WriteLine("Validating {0}...", testcase);
        if (!Equals(expected, actual))
        {
            throw new Exception($"Validation failed: expected '{expected}', got '{actual}'.");
        }
    }
}

internal static class RuntimeLookupBridgeShared<TMethod>
{
    public static TMethod SameClassDifferentMethod(IBaseMethodCaller caller, TMethod value)
    {
        return RuntimeLookupDispatcher<string>.SameClassDifferentMethod(caller, value);
    }
}

internal static class RuntimeLookupDispatcher<TContext>
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static TMethod SameClassDifferentMethod<TMethod>(IBaseMethodCaller caller, TMethod value)
    {
        RuntimeLookupVirtualInvoker invoker = new RuntimeLookupVirtualStage<TContext>();
        return invoker.SameClassDifferentMethod(caller, value);
    }
}

internal abstract class RuntimeLookupVirtualInvoker
{
    public abstract T SameClassDifferentMethod<T>(IBaseMethodCaller caller, T value);
}

internal sealed class RuntimeLookupVirtualStage<TContext> : RuntimeLookupVirtualInvoker
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public override T SameClassDifferentMethod<T>(IBaseMethodCaller caller, T value)
    {
        return SameClassDifferentMethodCore(caller, value);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static T SameClassDifferentMethodCore<T>(IBaseMethodCaller caller, T value)
    {
        RuntimeLookupVirtualInvoker invoker = RuntimeLookupTerminalFactory.CreateInvoker();
        return invoker.SameClassDifferentMethod(caller, value);
    }
}

internal static class RuntimeLookupTerminalFactory
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static RuntimeLookupVirtualInvoker CreateInvoker()
    {
        return new RuntimeLookupTerminalInvoker();
    }
}

internal sealed class RuntimeLookupTerminalInvoker : RuntimeLookupVirtualInvoker
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public override T SameClassDifferentMethod<T>(IBaseMethodCaller caller, T value)
    {
        return SameClassDifferentMethodCore(caller, value);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static T SameClassDifferentMethodCore<T>(IBaseMethodCaller caller, T value)
    {
        return caller.Invoke(value);
    }
}

internal interface IBaseMethodCaller
{
    T Invoke<T>(T value);
}

internal sealed class ClassBaseCaller : IBaseMethodCaller
{
    private readonly NonGenericBaseClass _instance;

    public ClassBaseCaller(NonGenericBaseClass instance)
    {
        _instance = instance;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T Invoke<T>(T value) => _instance.Process(value);
}


internal abstract class NonGenericBaseClass
{
    public abstract T Process<T>(T value);
}

internal sealed class ClassBase_GenericDerived_NoInlining<TDerived> : NonGenericBaseClass
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public override T Process<T>(T value) => value;
}

It performs a runtime lookup and then tries to obtain the owner type at

https://github.com/hez2010/runtime/blob/ac7c3e29b9f42662d9d026fed8d2c12516ecbeac/src/coreclr/vm/genericdict.cpp#L898

but the signature does not actually contain one, so it tries to load the wrong thing (in this case, an ELEMENT_TYPE_VAR).

I have disabled runtime lookup support in R2R in 154f832. I would like to hear what you think about this.

Copilot AI review requested due to automatic review settings February 23, 2026 17:16

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

@jakobbotsch
Member

The output of this program is wrong on coreclr with this PR:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public class Program
{
    public static void Main()
    {
        Test<string>();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Test<T>()
    {
        Derived<T> foo = new();
        foo.Test<object>();
    }

    public class Base<T>
    {
        public virtual void Test<U>()
        {
            Console.WriteLine(typeof(T).FullName);
            Console.WriteLine(typeof(U).FullName);
        }
    }

    public class Derived<T> : Base<List<T>>
    {
    }
}

I am not sure that using the token that we get from the base will work. In any case, it is not the right thing to instantiate the base class type parameters with. The signature will need some changes that apply the proper substitutions coming from the inheritance hierarchy.

@davidwrighton
Member

The VM uses a concept called Substitution to make that sort of thing work, but that's at the metadata level, and not typically directly used in this portion of the runtime. For overrides, this is typically done by starting from an actual MethodTable that is known to have a precise type, and walking up the inheritance chain until we find a MethodTable which HasSameTypeDefAs the MethodTable associated with the method we are actually planning to call.
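The inheritance walk described above can be approximated in managed code with reflection. The sketch below is only an analogy and not the VM's implementation: `FindInstantiationOf` is a hypothetical helper, and comparing generic type definitions stands in for the HasSameTypeDefAs check. It reuses the `Base<T>` / `Derived<T>` shape from the failing program to show that the base class must be instantiated over `List<T>`, not over `T`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class InheritanceWalkDemo
{
    public class Base<T> { }
    public class Derived<T> : Base<List<T>> { }

    // Starting from an exact type, walk up the base chain until reaching the type
    // whose generic definition matches the declaring type of the target method.
    public static Type FindInstantiationOf(Type exactType, Type declaringTypeDefinition)
    {
        for (Type? current = exactType; current != null; current = current.BaseType)
        {
            Type definition = current.IsGenericType ? current.GetGenericTypeDefinition() : current;
            if (definition == declaringTypeDefinition)
            {
                return current;
            }
        }

        throw new ArgumentException("The exact type does not derive from the declaring type.");
    }

    public static void Main()
    {
        Type found = FindInstantiationOf(typeof(Derived<string>), typeof(Base<>));
        Console.WriteLine(found == typeof(Base<List<string>>)); // True
    }
}
```

The walk only works because it starts from an exact, unshared type; that precondition is what the rest of this discussion is about.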

@davidwrighton
Member

Notably, to make this all work, you may need to encode two different types as part of DevirtualizedMethodDescSlot and use the combination to find the type you want.

@davidwrighton
Member

I'm also not entirely sure which scenarios you are trying to optimize here. The nicest examples I see above are entirely removing the virtual dispatch, but I can imagine scenarios where you could be using devirtualization to remove just the virtual lookup operation. This isn't commonly all that useful on regular virtual dispatch, or even in interface dispatch as the dispatch mechanisms are actually quite fast, but in GVM dispatch there is a hashtable involved, which could be removed to somewhat good effect.

@jakobbotsch
Member

> For overrides, this is typically done by starting from an actual MethodTable that is known to have a precise type, and walking up the inheritance chain until we find a MethodTable which HasSameTypeDefAs the MethodTable associated with the method we are actually planning to call.

Yeah. I'm not actually sure how we can make this work. We won't have the exact object type available either during resolveVirtualMethod or when generating the shared dictionary entry, and I don't see how we could feasibly go from the type parameters of the base method's type to the type parameters of the overriding method's type (only the opposite seems doable). The JIT does not keep track of the signature that would be required to create a runtime lookup for the exact object type, which seems to be what we need here.

The only thing I can think of is a special version of the runtime lookup helper that extracts the exact type from the object and then finds the overriding method table that way. Maybe I am missing something.

@davidwrighton
Member

I could see something like this: there is an explicit call to newobj in the method being compiled, followed by a use of a GVM on that object. In that sort of situation, you could use the token associated with the newobj to define the exact type and then work from there. Similarly, if there is a type which is sealed in a signature, you could do similar tricks, even if you couldn't see the exact type. This is why I'm trying to figure out which scenarios are expected to be optimizable with this change.
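The two source patterns mentioned above can be sketched as follows. All names (`ExactTypeSourcesDemo`, `Impl<U>`, `SealedImpl<U>`, `ViaNewObj`, `ViaSealed`) are hypothetical, and the sketch makes no claim about whether any particular runtime version actually devirtualizes these calls; it only shows where the exact type would come from in each case.

```csharp
using System;

public static class ExactTypeSourcesDemo
{
    public abstract class Base
    {
        public abstract string Name<T>();
    }

    public class Impl<U> : Base
    {
        public override string Name<T>() => $"Impl<{typeof(U).Name}>.Name<{typeof(T).Name}>";
    }

    public sealed class SealedImpl<U> : Base
    {
        public override string Name<T>() => $"SealedImpl<{typeof(U).Name}>.Name<{typeof(T).Name}>";
    }

    // Pattern 1: the allocation is visible in the method being compiled,
    // so the newobj token describes the exact type of the receiver.
    public static string ViaNewObj<U>()
    {
        Base b = new Impl<U>();
        return b.Name<int>();
    }

    // Pattern 2: the static type in the signature is sealed, so no other
    // override can be the target even though no allocation is visible.
    public static string ViaSealed<U>(SealedImpl<U> s) => s.Name<object>();

    public static void Main()
    {
        Console.WriteLine(ViaNewObj<string>());                 // Impl<String>.Name<Int32>
        Console.WriteLine(ViaSealed(new SealedImpl<string>())); // SealedImpl<String>.Name<Object>
    }
}
```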

@jakobbotsch
Member

The hope would be that most GVMs where we know the (possibly shared) exact type would be optimizable by devirtualization. We should be able to compute the exact target and hence avoid the virtual resolution and even do inlining of it, but it seems like computing the instantiation argument is very difficult when the JIT does not know the unshared exact type. (In many cases we know exact unshared types, e.g. under guarded devirtualization that would be the case.)

Perhaps this PR should focus on the exact unshared types to start out with. Obviously computing the instantiation argument for those is much simpler since no runtime lookup is required.


hez2010 commented Feb 24, 2026

Thanks for the explanation. I'm going to remove the part that tries to compute the runtime lookup and instead only focus on cases where we know the exact type in this PR.


hez2010 commented Feb 24, 2026

> I'm also not entirely sure which scenarios you are trying to optimize here. The nicest examples I see above are entirely removing the virtual dispatch, but I can imagine scenarios where you could be using devirtualization to remove just the virtual lookup operation. This isn't commonly all that useful on regular virtual dispatch, or even in interface dispatch as the dispatch mechanisms are actually quite fast, but in GVM dispatch there is a hashtable involved, which could be removed to somewhat good effect.

The scenario I was thinking about is not the lookup itself. Turning the GVM call into a direct call enables the JIT to inline the callee, even if the callee still needs a runtime lookup.
Once the JIT has inlined the callee, it can perform many aggressive optimizations such as escape analysis, so that allocations of a non-escaping this and of such parameters can be eliminated, promoted to registers, etc.
We don't have IPA today (and are unlikely to have it in the foreseeable future, because ReJIT cannot be opted out of), so many things have to rely on inlining.
A runtime lookup doesn't operate directly on this (except when the context comes from ThisObj), so leaving a runtime lookup in place doesn't matter much for these kinds of optimizations.


jakobbotsch commented Feb 24, 2026

I think it is impossible in general to represent the instantiation argument as a runtime lookup because the JIT does not guarantee that exact shared types have unique instantiations. Consider CallTest in the following:

using System;
using System.Runtime.CompilerServices;

public class Program
{
    public static void Main()
    {
        CallTest<string, object>(true);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CallTest<T, U>(bool b)
    {
        Base foo = GetBase<T, U>(b);
        foo.Test<object>();
    }

    private static Base GetBase<T, U>(bool b)
    {
        if (b)
        {
            return new Derived<T>();
        }
        else
        {
            return new Derived<U>();
        }
    }

    public class Base
    {
        public virtual void Test<U>()
        {
            Console.WriteLine(typeof(U).FullName);
        }
    }

    public class Derived<T> : Base
    {
        public override void Test<U>()
        {
            Console.WriteLine(typeof(T).FullName);
            Console.WriteLine(typeof(U).FullName);
        }
    }
}

After inlining GetBase the JIT believes it knows the exact type of foo:

    class for 'this' is Program+Derived`1[System.__Canon] [exact] (attrib 20020000)

However, it is not possible to compute the instantiation argument to pass to foo.Test<object> as a runtime lookup. The instantiation argument does not only depend on T, U of CallTest but also on the parameter b. In other words we cannot even hope to be able to create a runtime lookup for the instantiation argument in general.
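The dependence on b can be observed directly from managed code. The following sketch is a reduced variant of the program above (the `AmbiguousExactTypeDemo` wrapper is a hypothetical name, and the classes are stripped to the minimum): both branches produce an object whose shared form is ``Derived`1[System.__Canon]``, yet the exact runtime types differ, so no lookup keyed only on T and U can name the right one.

```csharp
using System;

public static class AmbiguousExactTypeDemo
{
    public class Base { }
    public class Derived<T> : Base { }

    // Same shape as GetBase in the example above: the exact type of the result
    // depends on the runtime value of b, not just on the type arguments.
    public static Base GetBase<T, U>(bool b)
    {
        if (b)
        {
            return new Derived<T>();
        }
        else
        {
            return new Derived<U>();
        }
    }

    public static void Main()
    {
        Type whenTrue = GetBase<string, object>(true).GetType();
        Type whenFalse = GetBase<string, object>(false).GetType();

        Console.WriteLine(whenTrue == typeof(Derived<string>));  // True
        Console.WriteLine(whenFalse == typeof(Derived<object>)); // True
        Console.WriteLine(whenTrue == whenFalse);                // False
    }
}
```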

A few alternatives:

  1. Do the devirtualization only when no runtime lookup is required
  2. (Less conservative) Skip the devirtualization when the target lives on a shared method table, but allow a runtime lookup for the method instantiation portion
  3. Introduce a new helper to compute the instantiation argument from the resolved method and the runtime object. It would likely need a hash table for its cache, similar to the existing CORINFO_HELP_VIRTUAL_FUNC_PTR, but maybe the resolution would be simpler.

(3) might still be beneficial because we'll be able to inline the target code and in many cases the instantiation argument completely disappears after that. But I would probably start with (1) and perhaps (2).
