
WASM startup time optimization tracking issue #63809

Open
@kg

(old contents of issue migrated to comment below)

During .NET 9 development the runtime and Blazor teams made improvements to WASM startup time in multiple areas, but more work remains to be done. Some observations and areas to continue to focus on:

  • large amounts of one-off code on the startup path, much of which is cold. In AOT this is less problematic, but the code still has to be loaded from the wasm binary and compiled by the browser.
    • (interpreter) wrappers for things like cctors, synchronization, and initialization
    • method bodies for cctors and startup code
    • finding ways to statically evaluate this initialization code at build time and bake the resulting constants into the binary could pay off tremendously here; CoreCLR already has solutions for this in specific cases
    • more efficient ways to populate arrays/lists/dictionaries with smaller IL would also be very profitable. Some users (e.g. Uno) generate cctors that populate massive dictionaries, and those methods contain huge amounts of IL
  • reflection-driven functionality loading metadata
    • metadata parsing is hot in both interp and AOT
    • strcmp and binary/linear searches are hot in both interp and AOT when scanning for methods by name, etc.
    • NativeAOT solves this by baking frozen, optimized representations of metadata into the binary, which can be used cheaply with much less initialization work; we could do that too
  • runtime code generation that kicks us into interp
    • if used incorrectly, JSON/XML serialization can cause this; migrating to source generators avoids it
    • some especially naughty code may be using System.Linq.Expressions or System.Reflection.Emit (S.R.E), so we should keep an eye out for it. I've seen dependencies on both pop up, and S.R.E itself has expensive cctors
  • generic instance explosion
    • more metadata decoding/creation
    • more methods to compile in interp
    • more wasm function bodies to load and compile in AOT
    • SIMD is a major culprit here, but the general BCL and Blazor also have sources of it, e.g. static void RegisterSomething<T> () => SomeList.Add(typeof(T));
    • many intrinsics have useless generic parameters that cause an explosion of method instances, e.g. static T Unsafe.NullRef<T> () => null where T is a class. Thorough inlining of all relevant intrinsics counteracts the explosion
  • in interpreted mode, interp codegen accounts for 40-60% of total cpu time during startup
    • much of this code only runs a few times and doesn't tier up; we're not bottlenecked on optimization
    • a sizable chunk of this is in the initial IL decoding and basic block building
    • early dead code elimination (DCE) and early constant propagation (cprop) could help a lot here; CoreCLR has both and we don't
  • many mono data structures, e.g. linked lists and GHashTables, are heavy on malloc/free, which adds up death-by-a-thousand-cuts style to multiple percentage points of wasted CPU time during startup
    • this adds memory usage overhead as well
    • in some cases we allocate a data structure and then only ever store 0-2 items into it before freeing it
  • strcmp, strlen, and g_str_hash are expensive hot spots
    • we spend a silly amount of time during startup measuring, hashing, and comparing constant strings over and over, spread across various call sites
  • Blazor's template HTML is missing prefetch directives for key files
  • we currently kick off requests for every dependency at once during startup, so less-important requests can block more urgent ones and delay overall startup. Ordering these requests and deferring the low-importance ones would allow startup to begin sooner
    • right now we need icudt (the ICU data file) very early in startup; removing that requirement would allow us to defer this fairly large download until later
  • memset zeroing is still hot during startup, though we've made progress in this area. in many cases we are zeroing memory that is already known to be pre-zeroed
    • a large chunk of this is due to Emscripten and its two allocators (dlmalloc and mimalloc) not knowing how to exploit the fact that wasm sbrk returns zeroed memory
    • our new custom mmap can exploit this, but we need to make comprehensive code changes to take advantage of that
  • in Blazor, AOT'd startup is mostly dominated by just running managed code
    • this contributes to interp startup being dominated by codegen
    • historically a lot of this is initializing things like serialization, dependency injection, or routing
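To illustrate baking initialization into the binary instead of running cctors that populate large dictionaries, here is a minimal C sketch. It assumes the key/value pairs are fully known at build time, so a build step could emit them as a sorted constant table that the standard library bsearch searches directly, with no startup code at all. Entry, kBakedTable, its contents, and baked_lookup are hypothetical names for illustration; this is not how CoreCLR's existing solutions are implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build-time output: instead of a static constructor that
 * inserts each entry into a hash table at startup, the table is emitted as
 * constant data sorted by key, so no initialization code has to run. */
typedef struct {
    const char *key;
    int value;
} Entry;

static const Entry kBakedTable[] = {   /* must stay sorted by strcmp order */
    { "Button",   1 },
    { "Grid",     2 },
    { "ListView", 3 },
    { "TextBox",  4 },
};

/* bsearch comparator: the search key is a plain C string. */
static int compare_key(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Entry *)elem)->key);
}

/* Binary search over the constant table; returns -1 when the key is absent. */
static int baked_lookup(const char *key)
{
    assert(key != NULL);
    const Entry *e = bsearch(key, kBakedTable,
                             sizeof kBakedTable / sizeof kBakedTable[0],
                             sizeof kBakedTable[0], compare_key);
    return e ? e->value : -1;
}
```

A real generator would also have to handle non-string keys, and would likely prefer a precomputed hash index over strcmp-based searching, since strcmp is itself one of the hot spots listed above.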
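The early DCE and cprop point can be made concrete with a toy example. The C sketch below runs constant propagation with folding, then backward dead code elimination, over a hypothetical straight-line IR. Op, Ins, the opcode names, cprop and dce are invented for illustration and are far simpler than mono's interpreter IR, which has branches and basic blocks. The intent is that later codegen stages see fewer instructions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR: straight-line code, one destination register each. */
typedef enum { OP_CONST, OP_ARG, OP_ADD, OP_MUL, OP_RET, OP_NOP } Op;

typedef struct {
    Op op;
    int dst;   /* register written (unused by OP_RET and OP_NOP) */
    int a, b;  /* source registers; for OP_CONST, a is the immediate value */
} Ins;

#define MAX_REGS 32

/* Constant propagation with folding: track which registers hold known
 * constants and rewrite ADD/MUL of two constants into a single OP_CONST. */
static void cprop(Ins *code, int n)
{
    bool known[MAX_REGS] = { false };
    int val[MAX_REGS] = { 0 };
    for (int i = 0; i < n; i++) {
        Ins *ins = &code[i];
        assert(ins->dst >= 0 && ins->dst < MAX_REGS);
        if ((ins->op == OP_ADD || ins->op == OP_MUL) &&
            known[ins->a] && known[ins->b]) {
            int r = ins->op == OP_ADD ? val[ins->a] + val[ins->b]
                                      : val[ins->a] * val[ins->b];
            ins->op = OP_CONST;
            ins->a = r;
            ins->b = 0;
        }
        if (ins->op == OP_CONST) {
            known[ins->dst] = true;
            val[ins->dst] = ins->a;
        } else if (ins->op == OP_ARG || ins->op == OP_ADD || ins->op == OP_MUL) {
            known[ins->dst] = false;  /* value only known at run time */
        }
    }
}

/* Dead code elimination: walk backwards tracking live registers and turn
 * definitions whose result is never read into OP_NOP. Returns the count. */
static int dce(Ins *code, int n)
{
    bool live[MAX_REGS] = { false };
    int removed = 0;
    for (int i = n - 1; i >= 0; i--) {
        Ins *ins = &code[i];
        if (ins->op == OP_NOP)
            continue;
        if (ins->op == OP_RET) {
            live[ins->a] = true;      /* the returned value is always needed */
            continue;
        }
        if (!live[ins->dst]) {
            ins->op = OP_NOP;         /* result unused: drop the instruction */
            removed++;
            continue;
        }
        live[ins->dst] = false;       /* straight-line code: not live above its definition */
        if (ins->op == OP_ADD || ins->op == OP_MUL) {
            live[ins->a] = true;
            live[ins->b] = true;
        }
    }
    return removed;
}
```

For example, given r0 = 2, r1 = 3, r2 = r0 + r1, r3 = arg, r4 = r2 * r3, r5 = r0 * r0, ret r4, cprop rewrites r2 and r5 into the constants 5 and 4, and dce then turns the now-unused definitions of r0, r1 and r5 into OP_NOP.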
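For data structures that only ever hold 0-2 items before being freed, one common remedy is a list with inline storage for its first few items. In this minimal C sketch, SmallPtrList and its functions are hypothetical, not existing mono or eglib APIs. The first two pointers live inside the struct itself, so malloc is only called once a third item is added.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical small-vector: the first SMALL_INLINE items are stored in the
 * struct itself, so the common 0-2 item case never touches malloc/free. */
#define SMALL_INLINE 2

typedef struct {
    size_t count;
    size_t capacity;                  /* SMALL_INLINE while storage is inline */
    void **items;                     /* points at inline_items or a heap buffer */
    void *inline_items[SMALL_INLINE];
} SmallPtrList;

static void small_list_init(SmallPtrList *l)
{
    l->count = 0;
    l->capacity = SMALL_INLINE;
    l->items = l->inline_items;
}

/* Returns 0 on success, -1 if a heap allocation was needed and failed. */
static int small_list_add(SmallPtrList *l, void *item)
{
    assert(l->count <= l->capacity);
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity * 2;
        void **buf;
        if (l->items == l->inline_items) {
            /* First spill: move the inline items to a heap buffer. */
            buf = malloc(new_cap * sizeof *buf);
            if (!buf)
                return -1;
            memcpy(buf, l->inline_items, l->count * sizeof *buf);
        } else {
            buf = realloc(l->items, new_cap * sizeof *buf);
            if (!buf)
                return -1;
        }
        l->items = buf;
        l->capacity = new_cap;
    }
    l->items[l->count++] = item;
    return 0;
}

static int small_list_is_inline(const SmallPtrList *l)
{
    return l->items == l->inline_items;
}

/* Frees the heap buffer, if any, and leaves the list empty and reusable. */
static void small_list_free(SmallPtrList *l)
{
    if (l->items != l->inline_items)
        free(l->items);
    small_list_init(l);
}
```

Because items can point into the struct itself, a SmallPtrList must not be copied by value once it has been initialized.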
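One way to stop measuring and hashing the same constant strings over and over is to compute each string's length and hash once and carry them alongside the pointer. The C sketch below assumes such a hypothetical HashedStr record (HashedStr, hashed_make, HASHED_LITERAL and hashed_equal are illustrative names). It uses FNV-1a purely as a simple, well-known hash, takes literal lengths from sizeof at compile time instead of calling strlen, and rejects most mismatches on length or hash before falling back to memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string record: length and hash are computed once, when the
 * record is created, instead of at every comparison or table lookup. */
typedef struct {
    const char *str;
    size_t len;
    uint32_t hash;
} HashedStr;

/* 32-bit FNV-1a, used here only as a simple, well-known string hash. */
static uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

static HashedStr hashed_make(const char *s, size_t len)
{
    assert(s != NULL);
    HashedStr r = { s, len, fnv1a(s, len) };
    return r;
}

/* For string literals, sizeof yields the length at compile time (no strlen). */
#define HASHED_LITERAL(lit) hashed_make((lit), sizeof(lit) - 1)

/* Cheap checks first: most mismatches differ in length or hash, so memcmp
 * only runs for likely matches. */
static int hashed_equal(const HashedStr *a, const HashedStr *b)
{
    return a->len == b->len && a->hash == b->hash &&
           memcmp(a->str, b->str, a->len) == 0;
}
```

Records for frequently used names could be created once and stored, so every later lookup reuses the cached length and hash.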
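The request-ordering idea amounts to a scheduling policy. The actual loader runs in JavaScript, so the following C sketch only models the policy under assumed priority tiers: requests are sorted by priority with ties kept in their original order, everything except the deferred tier is started immediately, and deferred downloads wait until startup is underway. Priority, Request, plan_requests and the tier names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical priority tiers: lower value = needed sooner. */
typedef enum { PRIO_CRITICAL = 0, PRIO_NORMAL = 1, PRIO_DEFERRED = 2 } Priority;

typedef struct {
    const char *name;  /* resource to download */
    Priority prio;
    int order;         /* original position, used as a tie-breaker */
} Request;

/* Sort by priority; ties keep their original order (qsort is not stable). */
static int compare_requests(const void *lhs, const void *rhs)
{
    const Request *x = lhs, *y = rhs;
    if (x->prio != y->prio)
        return x->prio < y->prio ? -1 : 1;
    return (x->order > y->order) - (x->order < y->order);
}

/* Reorders reqs in place and returns how many should be started right away;
 * the remaining (deferred) ones would be issued once startup is underway. */
static size_t plan_requests(Request *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        reqs[i].order = (int)i;
    qsort(reqs, n, sizeof *reqs, compare_requests);
    size_t immediate = 0;
    while (immediate < n && reqs[immediate].prio != PRIO_DEFERRED)
        immediate++;
    return immediate;
}
```

Once icudt is no longer needed early in startup, a large data file like it is the kind of request that could move into the deferred tier.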
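The memset bullets hinge on knowing which memory is already zero. This minimal C sketch of a fixed-size block allocator skips zeroing for blocks carved from never-used memory and only calls memset on recycled blocks. The static arena stands in for sbrk'd memory: like freshly grown wasm memory, an object with static storage duration starts out zero-filled. block_alloc_zeroed, block_free, the arena and the bytes_zeroed counter are hypothetical stand-ins, not mono's custom mmap or Emscripten's allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64
#define ARENA_BLOCKS 64

/* Stand-in for sbrk'd memory: static storage is zero-initialized. */
static union {
    max_align_t align;                           /* keeps blocks well aligned */
    unsigned char bytes[BLOCK_SIZE * ARENA_BLOCKS];
} arena;
static size_t arena_top;      /* offset of the first never-allocated byte */
static void *free_list;       /* recycled blocks, which may be dirty */
static size_t bytes_zeroed;   /* bytes this allocator actually had to memset */

/* Returns a zeroed BLOCK_SIZE block, calling memset only when needed. */
static void *block_alloc_zeroed(void)
{
    void *p;
    if (free_list) {
        p = free_list;
        memcpy(&free_list, p, sizeof free_list);  /* pop: first bytes hold next */
        memset(p, 0, BLOCK_SIZE);                 /* recycled memory is dirty */
        bytes_zeroed += BLOCK_SIZE;
        return p;
    }
    if (arena_top == sizeof arena.bytes)
        return NULL;                              /* arena exhausted */
    p = arena.bytes + arena_top;                  /* never handed out: already zero */
    arena_top += BLOCK_SIZE;
    return p;
}

/* Pushes a block onto the free list; its contents are now considered dirty. */
static void block_free(void *p)
{
    assert(p != NULL);
    memcpy(p, &free_list, sizeof free_list);
    free_list = p;
}
```

The key design point is that the allocator, not the caller, tracks whether a block is pristine, so callers that need zeroed memory never pay for a redundant memset.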

Labels: User Story, arch-wasm, area-VM-meta-mono, tenet-performance, tracking
