Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[net6/Xamarin.iOS] NSObject.ConformsToProtocol is very costly and invoked very often #14065

Closed
jeromelaban opened this issue Feb 3, 2022 · 13 comments · Fixed by #15294
Closed
Labels
enhancement The issue or pull request is an enhancement performance If an issue or pull request is related to performance
Milestone

Comments

@jeromelaban
Copy link
Contributor

Steps to Reproduce

No specific step to reproduce, but it seems that UIView.AddSubview / UIView.RemoveFromSuperview with custom types inheriting from UIView seems to trigger this from time to time.

Expected Behavior

ConformsToProtocol does not appear prominently in performance traces.

Actual Behavior

image

Most of the time is spent here:

var attrs = type.GetCustomAttributes (typeof(ProtocolAttribute), false);

A lot of the time is spent invoking this method at regular times, seemingly in batch, but there's no direct stack trace leading to invocation from managed code.

Environment

VS 2022 17.1 Preview 5

Build Logs

Example Project (If Possible)

I do not have an available project yet.

@spouliot
Copy link
Contributor

spouliot commented Feb 4, 2022

I do not think this would be a specific issue to net6 - but I might have missed some changes so take this with a grain of salt.

but there's no direct stack trace leading to invocation from managed code.

That's Apple's ObjC code calling conformsToProtocol: [1] and since it happens when add/removing views there won't be an interesting managed stack trace.

[1] so there's

  • the associated native->managed->native cost for each call;
  • the number of calls can vary (and likely grows) with each version of iOS (since more protocols are added)

Caching would help some cases but it would help in every cases (at least versus its cost). I doubt it's a good thing to implement caching inside the platform assemblies.

Still you could cache the specific cases that affects you.

  • Set a breakpoint in NSObject.ConformsToProtocol to see which types are queried.
  • Next override ConformsToProtocol and cache the results (IntPtr -> bool).

Or do your own hardcoded logic to return true (or false) without running the heavy weight reflection (and cache that or not).

@jeromelaban
Copy link
Contributor Author

Thanks for the detailed update! Indeed, I do know which types are queried there, and caching should be simple enough for sure.

I wonder, is this a lookup that could be statically generated, or is the pointer passed opaque enough to be useless for caching?

@spouliot
Copy link
Contributor

spouliot commented Feb 4, 2022

I wonder, is this a lookup that could be statically generated

It is, in some cases. However if your app/config is allowing dynamic registration [1] then the querying happens (even if a single GetCustomAttributes will be called anyway).

See src/Foundation/NSObject2.cs around line 477 for details.

[1] and that could be what differ from legacy and net6 profiles in your case.

@rolfbjarne rolfbjarne added the enhancement The issue or pull request is an enhancement label Feb 4, 2022
@rolfbjarne rolfbjarne added this to the Future milestone Feb 4, 2022
@rolfbjarne
Copy link
Member

This is certainly an area where we can optimize the code, but we'd probably need a real-world sample of where it's a problem before spending time trying to fix a particular piece of code that may or may not be the actual problem.

@rolfbjarne rolfbjarne added the performance If an issue or pull request is related to performance label Feb 4, 2022
@jeromelaban
Copy link
Contributor Author

jeromelaban commented Feb 4, 2022

It's actually very curious, and likely something that started happening with iOS 15 (seemingly in relation to _isTransparentFocusItem). Also, caching ConformsToProtocol only marginally improves the overall performance.

Here's a snippet from what I see in instruments:

image

This happens with very large UIView trees.

@spouliot
Copy link
Contributor

spouliot commented Jun 7, 2022

It might not be the exact same case but inside Dope for net6.0-macos the time spent inside ConformsToProtocol (and reflection) happens when removing subviews [1]. That's 2.82 seconds out of a 2m05s benchmark run (so more than 2%).

Screen Shot 2022-06-05 at 2 59 16 PM

The static registrar is not enabled by default for net6.0-macos (which it should be for net6.0-ios applications) and that should make a cache helpful... but there might be a different way (for macOS) to handle this. I can file an issue on Uno for it - but it would best to understand how this affects iOS (with the static registrar) first, so that a fix is beneficial to both platforms.

[1] that subviews removal is somewhat (maybe not everywhere yet) optimized for macOS (using an API not available on iOS) so it could be worse on iOS (at least when the static registrar was not enabled).

@spouliot
Copy link
Contributor

With a cache around ConformsToProtocol this call becomes fast enough not to be part of the trace data.

Here's a newer screenshot of the previous data, this time focusing on RemoveFromSubviews so it's possible to compare with newer data.

Screen Shot 2022-06-09 at 10 02 23 PM

ConformsToProtocol is 3.13 seconds out of 7.85 seconds for RemoveFromSubviews

With the cache enabled...

Screen Shot 2022-06-09 at 10 05 34 PM

well it's 0s (no data) out of 4.82 seconds for RemoveFromSubviews. Note: horizontal scale is automatic, don't trust your eyes 🙄

Now the situation for macOS (numbers above) is a bit different than iOS since removing the dynamic registrar is not very common (beside not being documented) and my attempt for net6 crashed (still need to look why 👀). This makes calls to the reflection parts of ConformsToProtocol much more common.

@Therzok does this slowdown also shows up inside VSfM ?

Benchmark

I made a benchmark to compare RemoveFromSuperview with the uses of a NSMutableArray to show the extra cost.

BenchmarkDotNet=v0.12.1.1528-nightly, OS=macOS 12.4 (21F79) [Darwin 21.5.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
  [Host] : Mono 6.12.0 (2020-02/dffa5ab9224), Arm64

Job=InProcess  Toolchain=InProcessEmitToolchain  IterationCount=3
LaunchCount=1  WarmupCount=3
Method Length Mean Error StdDev
NSView_AddSubview_RemoveFromSuperview 1 42.65 us 83.92 us 4.600 us
NSMutableArray_Add_Remove 1 35.96 us 34.83 us 1.909 us
NSView_AddSubview_RemoveFromSuperview 16 588.29 us 285.21 us 15.633 us
NSMutableArray_Add_Remove 16 608.13 us 233.19 us 12.782 us
NSView_AddSubview_RemoveFromSuperview 256 11,683.28 us 4,435.93 us 243.148 us
NSMutableArray_Add_Remove 256 12,012.51 us 2,989.70 us 163.876 us
NSView_AddSubview_RemoveFromSuperview 4096 336,094.00 us 94,665.01 us 5,188.908 us
NSMutableArray_Add_Remove 4096 213,205.46 us 51,970.81 us 2,848.695 us

However it's not clear it's being asked about protocols (any or as often) as a real, displayed view would 🧂. I have not adapted the cache to NSObject2.cs (yet) to see the impact.

Benchmark's Source Code

Numbers

Xamarin.Mac Legacy

macOS 12.4 / Legacy / arm64 Build Change Reuse Grid
Xamarin.Mac Legacy (454e16a) 1574.60 440982.29 1950.91 6602.72
Xamarin.Mac Legacy (CtP cache) 1773.51 520282.21 1909.14 6619.84
Impact 1.13x 1.18x 0.98 [1] 1.00x

[1] that's within the noise of the test, sporadic GC pauses

net6.0-macos

macOS 12.4 / net6.0-macos / arm64 Build Change
net-6.0 (w/pinvoke wrappers) 1467.18 385994.96
net-6.0 (plus CtP cache) 1522.31 510197.64
Impact 1.04x 1.32x

Conclusion

IMO that's impressive enough to add it (at least inside Uno's BindableNSView code). The jury is still out (another todo 😉) if it's worth enabling for iOS. If it is I think it's worth having this cache inside the SDK.

FWIW the cache related code could be added under a

if (Runtime.DynamicRegistrationSupported) {
}

so the linker/trimmer would remove it whenever the dynamic registrar is removed, making it a no-cost for most iOS apps.

Finally it's also worth nothing that the calls to ConformsToProtocol comes from the OS itself (and can vary by platforms and/or versions). IOW it's something that the developer has no control over (unless the developer controls the subclass that receive that selector and can override the method).

spouliot added a commit to spouliot/uno that referenced this issue Jun 10, 2022
Details: xamarin/xamarin-macios#14065 (comment)

TODO
* Get numbers for iOS. Since, by default, the dynamic registrar is removed on release builds this won't have a big impact (but it should not regress performance either)
* See if Microsoft is interested in having this cache inside the SDKs
@rolfbjarne
Copy link
Member

One point to have in mind is that it's possible to add protocols to an existing class at runtime... so we might only be able to cache parts of the ConformsToProtocol logic (the part where we use reflection to check if a managed class implements a certain interface / has a certain attribute).

@spouliot
Copy link
Contributor

Good point, there's a class_addProtocol API, but AFAIK there's no matching removal API.

There's two parts that does reflection inside ConformsToProtocol

  • is_wrapper for 3rd party types (640ms in the graph above)
  • calls to IsProtocol (2.05s in the graph above)

So if we can cache only the reflection part we still win back 2/3 of the time.

Or, since the protocols conformance can't be removed, use the cache and fall back to native/selectors when it returns false. That would cache the true that came from the selector call.

@Therzok
Copy link
Contributor

Therzok commented Jun 10, 2022

@spouliot I remember the protocol checks pop up on mono/xammac, but not that often in net6 - but when using dotnet trace (on release builds) I don't see it. It's possible it's inlined?

I guess it's the frame right under RemoveFromSuperview - this is a short trace taken after playing around in the IDE for a bit,

image

@Therzok
Copy link
Contributor

Therzok commented Jun 10, 2022

Is the managed RemoveFromSuperview method called for direct bindings as well?

@spouliot
Copy link
Contributor

@Therzok thanks for looking at it 😄

It's possible it's inlined?

No, it could be removed (by the linker/trimmer) if the dynamic registrar is removed. I thought it was not for VSfM but I'm not 💯% sure about it...

Also it's possible that ConformsToProtocol might not be called from RemoveFromSuperview. It's not clear under which circumstances it is (or not) called and, even if we knew, it could change between OS versions.

Is the managed RemoveFromSuperview method called for direct bindings as well?

Yes, unless the linker/trimmer was able to remove it: like no override and only called from native code. Since we see it inside the trace then it will be called, even for direct bindings.

@Therzok
Copy link
Contributor

Therzok commented Jun 11, 2022

I need to look into why I'm missing so many symbols, these stacks should all have had symbols in them.

spouliot added a commit to spouliot/xamarin-macios that referenced this issue Jun 17, 2022
Note that the call to native code still _always_ happen (not cached) since the application could use `class_addProtocol` to add conformance to a protocol at runtime.

So the cache is limited to the .net specific reflection code that is present (only) when the dynamic registrar is included inside applications. This is the default for macOS apps, but not iOS / tvOS or MacCatalyst apps.

The linker/trimmer will remove the caching code when the dynamic registrar is removed. IOW this PR should not have any impact, performance or size, for most iOS apps (where the dynamic registrar is removed by default).

Fix xamarin#14065

Running Dope on macOS, a 2 minutes benchmark, shows the following times (in seconds and percentage) spent calling this API:

**Before**

* `RemoveFromSuperview` 7.99s (6.4%)
  * `NSObject.ConformsToProtocol` 3.26s (2.6%)

**After**

* `RemoveFromSuperview` 4.67s (3.8%)
  * `NSObject.ConformsToProtocol` 0.32s (.26%)

So a 10x improvements on `ConformsToProtocol` which helps a lot the code path calling `RemoveFromSuperview`.
rolfbjarne pushed a commit that referenced this issue Jun 22, 2022
Note that the call to native code still _always_ happen (not cached) since the application could use `class_addProtocol` to add conformance to a protocol at runtime.

So the cache is limited to the .net specific reflection code that is present (only) when the dynamic registrar is included inside applications. This is the default for macOS apps, but not iOS / tvOS or MacCatalyst apps.

The linker/trimmer will remove the caching code when the dynamic registrar is removed. IOW this PR should not have any impact, performance or size, for most iOS apps (where the dynamic registrar is removed by default).

Fix #14065

Running Dope on macOS, a 2 minutes benchmark, shows the following times (in seconds and percentage) spent calling this API:

## Before

<img width="1051" alt="Screen Shot 2022-06-15 at 9 21 22 PM" src="https://user-images.githubusercontent.com/260465/174201703-a91860a5-ec29-4e19-9de0-5158fd7aafa7.png">

* `RemoveFromSuperview` 7.99s (6.4%)
  * `NSObject.ConformsToProtocol` 3.26s (2.6%)

## After

<img width="1228" alt="Screen Shot 2022-06-15 at 9 24 42 PM" src="https://user-images.githubusercontent.com/260465/174201708-92193e77-ea8e-41bc-9672-bddaaa18a4f6.png">

* `RemoveFromSuperview` 4.67s (3.8%)
  * `NSObject.ConformsToProtocol` 0.32s (.26%)

So a 10x improvements on `ConformsToProtocol` which helps a lot the code path calling `RemoveFromSuperview`.
@ghost ghost locked as resolved and limited conversation to collaborators Jul 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement The issue or pull request is an enhancement performance If an issue or pull request is related to performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants