memset consumes a large amount of time during startup for applications that use mmap #21620

Open
@kg

Description

During application startup, operations like mmap will perform a memset to properly zero pages before returning them to the calling application:

memset(ptr, 0, alloc_len);

If bulk memory is enabled, this is fine since that should (if the JS/wasm runtime is doing the right thing) just invoke a native implementation of the bulk fill operation. EDITED: While bulk memory memset is faster, memset is still a massive bottleneck in this scenario.

If bulk memory isn't enabled, it appears that because we're still starting up, memset and memcpy are quite likely to run in an interpreter instead of in fully jitted WASM code, as in this profile (note: the elapsed time looks worse than it is, because I profiled multiple app starts in a loop):
[image: startup profile screenshot]
Presumably if other parts of your startup take long enough, tiered compilation in your browser of choice will have completed by the point you start calling mmap, and this won't happen. For us, it consistently all happens before that point.

I know in some cases wasm will just always run in an interpreter, e.g. lockdown modes, or iOS when its jitcode memory block is full. In those cases this could be pretty impactful for the entirety of an app's runtime, but I would expect bulk memory to fix it there too.

sbc100 mentioned that bulk memory should be default soon, which might make this issue no longer relevant. Just figured I'd bring it up in case it seemed worthwhile to make a 1-2 line change to the emscripten libc to, e.g., always use emscripten_memset_js (which doesn't exist right now, I guess) in operations like mmap where it could matter.
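To make the suggestion concrete, here is a rough sketch of what a JS-backed fill could look like. Everything named sketch_* is hypothetical and not part of emscripten; it assumes wasm32 with heap addresses below 2 GiB, that HEAPU8 is in scope for EM_JS bodies, and that clang defines __wasm_bulk_memory__ when bulk memory is enabled. On non-emscripten builds it just falls back to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__EMSCRIPTEN__) && !defined(__wasm_bulk_memory__)
#include <emscripten.h>
/* Hypothetical helper (not an existing emscripten API): fill linear memory
   from JS so the work happens in the engine's native TypedArray.fill
   instead of in interpreted wasm.
   Assumption: wasm32, so ptr and len arrive as plain JS numbers. */
EM_JS(void, sketch_memset_js, (void *ptr, int value, size_t len), {
  HEAPU8.fill(value, ptr, ptr + len);
});
#endif

/* Hypothetical stand-in for the zeroing step in an mmap-like path. */
static void *sketch_zero_pages(void *ptr, size_t len) {
  assert(ptr != NULL || len == 0);
  if (len == 0) return ptr;
#if defined(__EMSCRIPTEN__) && !defined(__wasm_bulk_memory__)
  /* No bulk memory: route the fill through JS. */
  sketch_memset_js(ptr, 0, len);
#else
  /* Bulk memory or a native build: plain memset. */
  memset(ptr, 0, len);
#endif
  return ptr;
}
```

Whether the JS call is actually a win would need profiling, since crossing the wasm/JS boundary has its own cost for small fills.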

I'll also note that during startup lots of this memory is already pre-zeroed, since it came from sbrk at the bottom of the stack, and it looks like in some cases it also comes from mi_heap_malloc_zero under the covers. So in those scenarios there's no point in doing the memset at all - but flowing that information all the way up the call stack into mmap isn't an easy ask, so I'm not surprised that it's not happening.
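As an illustration of what "flowing that information up" could mean, here is a minimal sketch in which the allocator returns a flag alongside the pointer and the caller skips the memset when the memory is already known to be zero. All names (sketch_block, sketch_alloc, sketch_zero_if_needed) are hypothetical, and calloc/malloc only stand in for fresh sbrk memory versus recycled memory; this is not how the emscripten libc is structured today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: a block plus a flag saying whether the
   allocator can guarantee that its bytes are already zero. */
typedef struct {
  void *ptr;
  size_t len;
  bool known_zero;
} sketch_block;

/* Stand-in for a lower-level allocator. calloc models fresh, pre-zeroed
   memory; malloc models recycled memory with unknown contents. */
static sketch_block sketch_alloc(size_t len, bool fresh) {
  sketch_block b;
  b.ptr = fresh ? calloc(1, len) : malloc(len);
  b.len = len;
  b.known_zero = fresh && b.ptr != NULL;
  return b;
}

/* Stand-in for the zeroing step in an mmap-like path.
   Returns the number of bytes it actually had to clear. */
static size_t sketch_zero_if_needed(sketch_block *b) {
  assert(b != NULL);
  if (b->ptr == NULL || b->known_zero) {
    return 0; /* nothing to do: failed allocation or already zero */
  }
  memset(b->ptr, 0, b->len);
  b->known_zero = true;
  return b->len;
}
```

The hard part in practice is the plumbing: every layer between the allocator and mmap would have to carry the flag, which is why this is not a 1-2 line change.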

In general (in part because memset is running in the interpreter instead of native code), memset and memcpy take up a surprisingly large share of our application startup time :)
