Description
Follow-up to a question in #1817, cc @EgorBo.
I think I've identified 4 scenarios where the JIT doesn't produce optimal codegen for an object is T or object is T variable expression, when T is either a struct or a sealed class.
object is T, when T is a struct
public static bool Is_Slow<T>(object obj) where T : struct
{
return obj is T;
}
; using T = int
C.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: test rcx, rcx
L0003: je short L0016
L0005: mov rax, 0x7ff9d6bdb1e8
L000f: cmp [rcx], rax
L0012: je short L0016
L0014: xor ecx, ecx
L0016: test rcx, rcx
L0019: setne al
L001c: movzx eax, al
L001f: ret
Note how the JIT creates two separate branches, one per condition (null check and type check). This can be improved by just rewriting the code manually to perform those two checks individually:
public static bool Is_Fast<T>(object obj) where T : struct
{
return obj != null && obj.GetType() == typeof(T);
}
C.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: test rcx, rcx
L0003: je short L0019
L0005: mov rax, 0x7ff9d6bdb1e8
L000f: cmp [rcx], rax
L0012: sete al
L0015: movzx eax, al
L0018: ret
L0019: xor eax, eax
L001b: ret
Here the type check is done with just a cmp + sete pair, removing one conditional branch entirely.
object is T value, when T is a struct
public static T UnboxOrDefault_Slow<T>(object obj) where T : struct
{
return (obj is T value) ? value : default;
}
C.UnboxOrDefault_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: push rsi
L0001: sub rsp, 0x20
L0005: mov rsi, rcx
L0008: mov rdx, rsi
L000b: test rdx, rdx
L000e: je short L0021
L0010: mov rcx, 0x7ff9d6bdb1e8
L001a: cmp [rdx], rcx
L001d: je short L0021
L001f: xor edx, edx
L0021: test rdx, rdx
L0024: je short L0050
L0026: mov rdx, 0x7ff9d6bdb1e8
L0030: cmp [rsi], rdx
L0033: je short L0047
L0035: mov rdx, rsi
L0038: mov rcx, 0x7ff9d6bdb1e8
L0042: call 0x00007ffa366e04a0
L0047: mov eax, [rsi+8]
L004a: add rsp, 0x20
L004e: pop rsi
L004f: ret
L0050: xor eax, eax
L0052: add rsp, 0x20
L0056: pop rsi
L0057: ret
In this case the JIT creates 4 branches, two for the is check and two for the unbox.any opcode, as the runtime unfortunately still doesn't support/emit the no. prefix. Here is the same method with explicit checks:
public static T UnboxOrDefault_Fast<T>(object obj) where T : struct
{
return obj != null && obj.GetType() == typeof(T) ? (T)obj : default;
}
C.UnboxOrDefault_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: push rsi
L0001: sub rsp, 0x20
L0005: mov rsi, rcx
L0008: test rsi, rsi
L000b: je short L001c
L000d: mov rax, 0x7ff9d6bdb1e8
L0017: cmp [rsi], rax
L001a: je short L0024
L001c: xor eax, eax
L001e: add rsp, 0x20
L0022: pop rsi
L0023: ret
L0024: mov rdx, 0x7ff9d6bdb1e8
L002e: cmp [rsi], rdx
L0031: je short L0045
L0033: mov rdx, rsi
L0036: mov rcx, 0x7ff9d6bdb1e8
L0040: call 0x00007ffa366e04a0
L0045: mov eax, [rsi+8]
L0048: add rsp, 0x20
L004c: pop rsi
L004d: ret
As with the previous case, one less conditional branch and slightly smaller codegen.
object is T, when T is a sealed class
public sealed class Model
{
public static bool Is_Slow<T>(object obj)
{
return obj is Model;
}
}
Model.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: test rcx, rcx
L0003: je short L0016
L0005: mov rax, 0x7ff9ded6cdf0
L000f: cmp [rcx], rax
L0012: je short L0016
L0014: xor ecx, ecx
L0016: test rcx, rcx
L0019: setne al
L001c: movzx eax, al
L001f: ret
And here it is with the manual checks, just like in the first two cases:
public sealed class Model
{
public static bool Is_Fast<T>(object obj)
{
return obj != null && obj.GetType() == typeof(Model);
}
}
Model.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: test rcx, rcx
L0003: je short L0019
L0005: mov rax, 0x7ff9ded6cdf0
L000f: cmp [rcx], rax
L0012: sete al
L0015: movzx eax, al
L0018: ret
L0019: xor eax, eax
L001b: ret
object is T value, when T is a sealed class
public sealed class Model
{
public static Model GetOrNull_Slow<T>(object obj)
{
if (obj is Model model) return model;
return null;
}
}
Model.GetOrNull_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: mov rax, rcx
L0003: test rax, rax
L0006: je short L0019
L0008: mov rdx, 0x7ff9df11ce10
L0012: cmp [rax], rdx
L0015: je short L0019
L0017: xor eax, eax
L0019: test rax, rax
L001c: je short L001f
L001e: ret
L001f: xor eax, eax
L0021: ret
As above, there is one redundant conditional branch. Here is the version with explicit checks. Note that I'm using Unsafe.As<T>(object) here to force the JIT not to emit additional checks, as a standard (T) cast would result in worse codegen.
using System.Runtime.CompilerServices; // needed for Unsafe.As
public sealed class Model
{
public static Model GetOrNull_Fast<T>(object obj)
{
if (obj != null && obj.GetType() == typeof(Model))
{
return Unsafe.As<Model>(obj);
}
return null;
}
}
Model.GetOrNull_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
L0000: test rcx, rcx
L0003: je short L0018
L0005: mov rax, 0x7ff9df11ce10
L000f: cmp [rcx], rax
L0012: jne short L0018
L0014: mov rax, rcx
L0017: ret
L0018: xor eax, eax
L001a: ret
Here we once again have one less conditional branch than in the code produced for the is operator.
Note: in this last case, we could rewrite the first method to simply return obj as Model, which correctly optimizes the final codegen and results in even smaller code size. I figured it was still worth pointing out the missed optimization when writing the code through the is operator though, as devs might very well still use it for a variety of reasons.
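For reference, the as-based rewrite mentioned in the note could look like the sketch below. The method name GetOrNull_As is hypothetical (it does not appear in the original repro), and the unused generic parameter from the earlier snippets is dropped for brevity. No disassembly is shown here, since it was not captured as part of this report.

```csharp
public sealed class Model
{
    // Hypothetical variant of GetOrNull_Slow: the as operator yields
    // the instance when obj is a Model, and null for any other object
    // (including a null input), so no separate null return is needed.
    public static Model GetOrNull_As(object obj)
    {
        return obj as Model;
    }
}
```

Functionally this should match both GetOrNull_Slow and GetOrNull_Fast: since Model is sealed, an exact type match is the only way the conversion can succeed.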
There are mainly two potential improvements I'm seeing:
- One less conditional branch in the "fast" version
- Slightly smaller codegen (this might in part go away if the method is inlined though)
Configuration
Tested on sharplab.io, in Default, x64 and Roslyn master branches.
All assembly is from the Release configuration.
category:cq
theme:optimization
skill-level:expert
cost:medium
impact:small