
Conversation

mjp41 (Member) commented Jan 2, 2025

When processing a remote batch, the system processes every message that was available at the start of processing. This can lead to a long pause if a considerable number of frees have been sent to this thread.

This commit introduces a mechanism to process messages only up to a limit of 1 MiB per pass. The limit is configurable using CMake.

Choosing too small a limit can cause freeing to never catch up with the incoming messages.
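The bounded drain described above can be sketched as follows. This is a simplified illustration, not snmalloc's actual code: `Message`, `drain_with_budget`, and `REMOTE_BATCH_LIMIT` are hypothetical names, and a plain `std::deque` stands in for the real lock-free remote queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Illustrative stand-in for a remote-free message; snmalloc's real
// messages carry pointers whose sizes are looked up via the pagemap.
struct Message { size_t object_size; };

// Assumed name for the CMake-configurable budget; the PR's default is 1 MiB.
constexpr size_t REMOTE_BATCH_LIMIT = 1 << 20;

// Process queued frees until the byte budget is exhausted, leaving the
// rest for a later pass. Returns the number of messages processed.
size_t drain_with_budget(std::deque<Message>& queue)
{
  size_t bytes = 0;
  size_t processed = 0;
  while (!queue.empty() && bytes < REMOTE_BATCH_LIMIT)
  {
    bytes += queue.front().object_size; // stand-in for the actual free
    queue.pop_front();
    ++processed;
  }
  return processed; // remaining messages stay queued, bounding the pause
}
```

The key trade-off, as the description notes, is that the leftover messages must be revisited often enough that the queue drains faster than remote threads refill it.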
nwf (Contributor) commented Jan 2, 2025

This seems entirely sensible.

Might it be worth having different thresholds for the different times that we are servicing the message queue?

  • on a relatively fast path (say, a small fast free list is empty and we're grabbing the next slab) or
  • already being on a slow path (say, when we're going to the backend for a new chunk) or
  • on a really slow path (say, when going even further back to the shared pool)

I could sort of imagine making the last of those consume the entire already-pending queue, while the earlier ones stay a little more "hope we're in a steady state" choosy, but that's just an intuition that isn't backed by any kind of data. :)
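The tiered-threshold idea could look something like the sketch below. The enum, function name, and the non-default budget values are all illustrative assumptions; only the 1 MiB figure comes from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the three call sites nwf lists.
enum class Path
{
  FastNewSlab,     // small free list empty, grabbing the next slab
  SlowNewChunk,    // going to the backend for a new chunk
  SlowerSharedPool // going even further back to the shared pool
};

// Pick a drain budget based on how slow a path we are already on.
// Values other than the 1 MiB default are made up for illustration.
constexpr size_t budget_for(Path p)
{
  switch (p)
  {
    case Path::FastNewSlab:
      return 1 << 20;  // stay choosy on the comparatively fast path
    case Path::SlowNewChunk:
      return 8 << 20;  // already paying for the backend, drain more
    case Path::SlowerSharedPool:
    default:
      return SIZE_MAX; // effectively: consume the whole pending queue
  }
}
```

The design intuition is that the slower the path, the more the cost of draining is hidden by work the allocator is already doing.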

mjp41 (Member, Author) commented Jan 3, 2025

@nwf I have been wondering about something similar. If we had a remote per size class per allocator, then we could process the message queues to get a new free list, and only process other sizes if we needed to get a new slab. I think there are some great opportunities for further optimisations here.

nwf (Contributor) commented Jan 4, 2025

Oh hey, that's really clever!

Just musing and catching up to things you already know...

  • It's a middle-ground between "every allocator has one remote" of snmalloc today and "every slab is its own remote" of mimalloc.
  • The existing send code doesn't need to change to support this, if slab construction puts the appropriate RemoteAllocator* in the Pagemap. (Once all objects have been returned and the slab is recycled, no other Allocator holds a stale copy of the RemoteAllocator* from the Pagemap, barring double-free, so that point is a good, natural barrier. It'd be cute to be able to change it during a slab's lifecycle, but that's probably more work than it'd be worth.)
  • After BatchIt, it seems like there's a reasonable chance that the first message we process is big enough to be a fast free list, skipping some of the slab lifecycle machinery (at least in the no-mitigations case).

mjp41 (Member, Author) commented Jan 4, 2025

The pagemap combines the remote pointer with a size class. I think we could do this, but we'd need the remotes to be at a well-defined alignment to be able to read the size class cheaply for various mitigations.

mjp41 (Member, Author) commented Jan 6, 2025

@nwf I propose we merge this as is, and raise an issue to continue this discussion?

nwf (Contributor) commented Jan 7, 2025

Yeah, that sounds like the right plan to me. :)

mjp41 merged commit 046c5ac into microsoft:main on Jan 7, 2025.
62 checks passed