Unoptimal codegen for "obj is T" with T being struct/sealed #36649

Closed
@Sergio0694

Description

Follow up from a question in #1817 (here), cc. @EgorBo.

I think I've identified 4 scenarios where the JIT doesn't produce optimal codegen for an "object is T" or "object is T variable" expression, when T is either a struct or a sealed class.

object is T, when T is a struct
public static bool Is_Slow<T>(object obj) where T : struct
{
    return obj is T;
}
; using T = int
C.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0016
    L0005: mov rax, 0x7ff9d6bdb1e8
    L000f: cmp [rcx], rax
    L0012: je short L0016
    L0014: xor ecx, ecx
    L0016: test rcx, rcx
    L0019: setne al
    L001c: movzx eax, al
    L001f: ret

Note how the JIT emits two separate conditional branches, one per condition (null check and type check), and then tests the result again to produce the bool. This can be improved by rewriting the code manually to perform those two checks explicitly:

public static bool Is_Fast<T>(object obj) where T : struct
{
    return obj != null && obj.GetType() == typeof(T);
}
C.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0019
    L0005: mov rax, 0x7ff9d6bdb1e8
    L000f: cmp [rcx], rax
    L0012: sete al
    L0015: movzx eax, al
    L0018: ret
    L0019: xor eax, eax
    L001b: ret

Here the type check is done with just a cmp + sete pair (sete is the same instruction as setz), removing one conditional branch entirely.
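
As a quick sanity check that the manual rewrite keeps the same behavior, here is a minimal sketch that runs both variants over a few inputs. The two methods are copied from above; the sample values and the Main harness are assumptions added only for illustration, and this checks semantics only, not codegen.

```csharp
using System;

public static class C
{
    // The two variants from the issue, copied as-is.
    public static bool Is_Slow<T>(object obj) where T : struct
    {
        return obj is T;
    }

    public static bool Is_Fast<T>(object obj) where T : struct
    {
        return obj != null && obj.GetType() == typeof(T);
    }

    public static void Main()
    {
        // Hypothetical sample inputs: null, a matching boxed int,
        // and a few boxed values and references of other types.
        object[] samples = { null, 42, 42L, "text", 3.14 };

        foreach (object sample in samples)
        {
            bool slow = Is_Slow<int>(sample);
            bool fast = Is_Fast<int>(sample);
            if (slow != fast)
            {
                throw new InvalidOperationException("Variants disagree for: " + sample);
            }
        }

        Console.WriteLine("Both variants agree on all samples.");
    }
}
```

The equivalence relies on T being a value type: a boxed value can only have exactly the type T, so an exact type comparison is enough.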

object is T value, when T is a struct
public static T UnboxOrDefault_Slow<T>(object obj) where T : struct
{
    return (obj is T value) ? value : default;
}
C.UnboxOrDefault_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: mov rdx, rsi
    L000b: test rdx, rdx
    L000e: je short L0021
    L0010: mov rcx, 0x7ff9d6bdb1e8
    L001a: cmp [rdx], rcx
    L001d: je short L0021
    L001f: xor edx, edx
    L0021: test rdx, rdx
    L0024: je short L0050
    L0026: mov rdx, 0x7ff9d6bdb1e8
    L0030: cmp [rsi], rdx
    L0033: je short L0047
    L0035: mov rdx, rsi
    L0038: mov rcx, 0x7ff9d6bdb1e8
    L0042: call 0x00007ffa366e04a0
    L0047: mov eax, [rsi+8]
    L004a: add rsp, 0x20
    L004e: pop rsi
    L004f: ret
    L0050: xor eax, eax
    L0052: add rsp, 0x20
    L0056: pop rsi
    L0057: ret

In this case the JIT emits 4 conditional branches, two for the is check and two for the unbox.any opcode, as the runtime unfortunately still doesn't support/emit the no. IL prefix, which could mark the repeated type check as skippable. Here is the version with explicit checks:

public static T UnboxOrDefault_Fast<T>(object obj) where T : struct
{
    return obj != null && obj.GetType() == typeof(T) ? (T)obj : default;
}
C.UnboxOrDefault_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: test rsi, rsi
    L000b: je short L001c
    L000d: mov rax, 0x7ff9d6bdb1e8
    L0017: cmp [rsi], rax
    L001a: je short L0024
    L001c: xor eax, eax
    L001e: add rsp, 0x20
    L0022: pop rsi
    L0023: ret
    L0024: mov rdx, 0x7ff9d6bdb1e8
    L002e: cmp [rsi], rdx
    L0031: je short L0045
    L0033: mov rdx, rsi
    L0036: mov rcx, 0x7ff9d6bdb1e8
    L0040: call 0x00007ffa366e04a0
    L0045: mov eax, [rsi+8]
    L0048: add rsp, 0x20
    L004c: pop rsi
    L004d: ret

As with the previous case, this gives one fewer conditional branch and slightly smaller codegen.

object is T, when T is a sealed class
public sealed class Model
{
    public static bool Is_Slow<T>(object obj)
    {
        return obj is Model;
    }
}
Model.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0016
    L0005: mov rax, 0x7ff9ded6cdf0
    L000f: cmp [rcx], rax
    L0012: je short L0016
    L0014: xor ecx, ecx
    L0016: test rcx, rcx
    L0019: setne al
    L001c: movzx eax, al
    L001f: ret

And here is the version with manual checks, just like in the first two cases:

public sealed class Model
{
    public static bool Is_Fast<T>(object obj)
    {
        return obj != null && obj.GetType() == typeof(Model);
    }
}
Model.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0019
    L0005: mov rax, 0x7ff9ded6cdf0
    L000f: cmp [rcx], rax
    L0012: sete al
    L0015: movzx eax, al
    L0018: ret
    L0019: xor eax, eax
    L001b: ret
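
The exact-type rewrite is only equivalent to the is operator because Model is sealed. A minimal sketch of why, using hypothetical Animal and Dog types that are not part of the original report: for a non-sealed class, an instance of a subclass passes the is check but fails the exact type comparison.

```csharp
using System;

public class Animal { }                 // hypothetical, not sealed: subclasses can exist
public sealed class Dog : Animal { }    // hypothetical sealed subclass

public static class SealedDemo
{
    // Type test: true for Animal and for anything derived from it.
    public static bool IsOperator(object obj) => obj is Animal;

    // Exact type comparison: true only for instances whose type is exactly Animal.
    public static bool ExactType(object obj) =>
        obj != null && obj.GetType() == typeof(Animal);

    public static void Main()
    {
        object dog = new Dog();
        Console.WriteLine(IsOperator(dog)); // True
        Console.WriteLine(ExactType(dog));  // False
    }
}
```

With a sealed class no subclass can exist, so the two checks always agree and the single comparison is safe.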

object is T value, when T is a sealed class
public sealed class Model
{
    public static Model GetOrNull_Slow<T>(object obj)
    {
        if (obj is Model model) return model;
        return null;
    }
}
Model.GetOrNull_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: mov rax, rcx
    L0003: test rax, rax
    L0006: je short L0019
    L0008: mov rdx, 0x7ff9df11ce10
    L0012: cmp [rax], rdx
    L0015: je short L0019
    L0017: xor eax, eax
    L0019: test rax, rax
    L001c: je short L001f
    L001e: ret
    L001f: xor eax, eax
    L0021: ret

As above, there is one redundant conditional branch. Here is the version with explicit checks. Note that I'm using Unsafe.As<T>(object) here to force the JIT not to emit additional checks, as a standard (T) cast would result in worse codegen.

public sealed class Model
{
    public static Model GetOrNull_Fast<T>(object obj)
    {
        if (obj != null && obj.GetType() == typeof(Model))
        {
            return Unsafe.As<Model>(obj);
        }
        return null;
    }
}
Model.GetOrNull_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0018
    L0005: mov rax, 0x7ff9df11ce10
    L000f: cmp [rcx], rax
    L0012: jne short L0018
    L0014: mov rax, rcx
    L0017: ret
    L0018: xor eax, eax
    L001a: ret

Here we once again have one fewer conditional branch than the code produced by the is operator.

Note: in this last case, we could rewrite the first method to simply return obj as Model, which correctly optimizes the final codegen and results in even smaller code size. I figured it was still worth pointing out the missed optimization when writing the code through the is operator though, as devs might very well still use it for a variety of reasons.
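
For reference, the as-based rewrite mentioned in the note could look like the sketch below. The name GetOrNull_As is hypothetical, the unused generic parameter from the methods above is dropped for brevity, and the Main harness is only there to show the behavior. No disassembly is included because it was not part of the original measurements.

```csharp
using System;

public sealed class Model
{
    // Hypothetical variant: returns obj when it is a Model, otherwise null.
    public static Model GetOrNull_As(object obj)
    {
        return obj as Model;
    }
}

public static class Program
{
    public static void Main()
    {
        var model = new Model();

        // Same instance comes back for a Model, null for anything else.
        Console.WriteLine(ReferenceEquals(Model.GetOrNull_As(model), model)); // True
        Console.WriteLine(Model.GetOrNull_As("not a model") == null);         // True
        Console.WriteLine(Model.GetOrNull_As(null) == null);                  // True
    }
}
```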

There are mainly two potential improvements I'm seeing:

  • One less conditional branch in the "fast" version
  • Slightly smaller codegen (this might in part go away if the method is inlined though)

Configuration

Tested on sharplab.io, in Default, x64 and Roslyn master branches.
All assembly is from the Release configuration.

category:cq
theme:optimization
skill-level:expert
cost:medium
impact:small

    Labels

    area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), in-pr (There is an active PR which will close this issue when it is merged), optimization, tenet-performance (Performance related issue)
