optimization of indirect calls #44610

Open
@AndyAyersMS

With the advent of function pointers, we will see more indirect calls that the jit can optimize to direct calls. This requires a transformation similar to the one we do for devirtualization. There may be additional wrinkles: for example, if someone takes the address of an intrinsic method, we would want intrinsic recognition to kick in too. So in the importer, this perhaps needs to sit just upstream from impImportCall.
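
The intrinsic wrinkle can be pictured with a minimal sketch. It assumes that Math.Sqrt is recognized as an intrinsic when called directly; the class and method names are made up for illustration and the snippet needs to be compiled with unsafe code enabled.

```csharp
using System;

public static class IntrinsicViaPointer
{
    // Direct call: the jit sees the callee and can apply intrinsic recognition.
    public static double Direct(double x) => Math.Sqrt(x);

    // Same target, reached through a function pointer. Unless the indirect
    // call is first turned into a direct call, the jit only sees a calli
    // and intrinsic recognition has nothing to work with.
    public static unsafe double ViaPointer(double x)
    {
        delegate*<double, double> f = &Math.Sqrt;
        return f(x);
    }

    public static void Main()
    {
        Console.WriteLine(Direct(16.0));
        Console.WriteLine(ViaPointer(16.0));
    }
}
```

Both methods compute the same value; the interesting difference is only in the code the jit generates for each.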

Overall this should be structured in a similar way to devirtualization -- opportunistically transforming calls during importation to enable subsequent inlining, and then perhaps retrying later, after inlining, to at least remove the indirect call overhead. It would be nice to be able to make the same change in the optimizer too, but that may prove challenging, as we start to bake in many call details in morph.

In principle we could do something similar with locally created delegates but there are a number of missing pieces that prevent us from seeing through from delegate creation to invocation.
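
For comparison, the delegate version of the same pattern might look like the sketch below. The class name DelegateCase and the Run helper are hypothetical; the point is only that the delegate is created and invoked locally and never escapes, so its target is knowable in principle.

```csharp
using System;

public static class DelegateCase
{
    public static int F() => 33;

    public static int Run()
    {
        // The delegate is created here and invoked immediately below,
        // so the only possible target is F. Seeing through from creation
        // to invocation is the part described above as missing.
        Func<int> f = F;
        return f() + 67;
    }

    public static void Main() => Console.WriteLine(Run());
}
```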

using System;

class X
{
    static int F() => 33;
   
    public unsafe static int Main()
    {
        delegate*<int> f = &X.F;

        return f() + 67;
    }
}

currently produces the following code, where the call is still made indirectly through rax:

; Assembly listing for method X:Main():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 OutArgs      [V00    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V01 tmp1         [V01    ] (  0,  0   )    long  ->  zero-ref    "impImportIndirectCall"
;
; Lcl frame size = 40

G_M46779_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M46779_IG02:              ;; offset=0004H
       48B830F3FDD6FC7F0000 mov      rax, 0x7FFCD6FDF330
       FFD0                 call     rax
       83C043               add      eax, 67
                                                ;; bbWeight=1    PerfScore 3.50
G_M46779_IG03:              ;; offset=0013H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25

I'm not sure of the priority of this just yet. I am looking at some apps that use calli fairly heavily and need to better understand how many of these calls can be optimized, so I'm marking this as future for now.

This optimization will also intersect with guarded devirtualization / PGO, provided we can profile indirect targets and see biased distributions.
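
As a rough source-level analogy for what a guarded transformation of an indirect call could look like, consider the sketch below. It is not how the jit implements anything: the guard would be produced internally from profile data, the names GuardedIndirectCall, Invoke, CallF and CallG are hypothetical, and the pointer comparison is only a stand-in for whatever target check the jit would actually emit. The result is the same whichever path is taken.

```csharp
using System;

public static unsafe class GuardedIndirectCall
{
    public static int F() => 33;
    public static int G() => 44;

    public static int Invoke(delegate*<int> f)
    {
        // Assumption: profiling showed F to be the dominant target.
        delegate*<int> likely = &F;

        if (f == likely)
        {
            // Guard passed: a direct call, which is also an inline candidate.
            return F();
        }

        // Guard failed: fall back to the original indirect call.
        return f();
    }

    public static int CallF() => Invoke(&F);
    public static int CallG() => Invoke(&G);

    public static void Main()
    {
        Console.WriteLine(CallF());
        Console.WriteLine(CallG());
    }
}
```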

category:cq
theme:devirtualization
skill-level:expert
cost:medium

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

Status

PGO
