
Conversation

mjp41 (Member) commented Jan 2, 2025

When processing a remote batch, the system processes every message that was available at the start of processing. This can lead to a long pause if a considerable number of frees have been sent to this thread.

This commit introduces a mechanism to process messages only up to a limit of 1 MiB per pass. The limit is configurable using CMake.

Choosing too small a limit can cause freeing to never catch up with the incoming messages.
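The bounded drain described above can be sketched as follows. This is a simplified illustration, not snmalloc's actual code: `Message`, `drain_with_budget`, and `REMOTE_BATCH_LIMIT` are hypothetical names, and a plain `std::deque` stands in for the real lock-free remote queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Illustrative stand-in for a remote-free message; snmalloc's real
// messages carry pointers whose sizes are looked up via the pagemap.
struct Message { size_t object_size; };

// Assumed name for the CMake-configurable budget; the PR's default is 1 MiB.
constexpr size_t REMOTE_BATCH_LIMIT = 1 << 20;

// Process queued frees until the byte budget is exhausted, leaving the
// rest for a later pass. Returns the number of messages processed.
size_t drain_with_budget(std::deque<Message>& queue)
{
  size_t bytes = 0;
  size_t processed = 0;
  while (!queue.empty() && bytes < REMOTE_BATCH_LIMIT)
  {
    bytes += queue.front().object_size; // stand-in for the actual free
    queue.pop_front();
    ++processed;
  }
  return processed; // remaining messages stay queued, bounding the pause
}
```

The key trade-off, as the description notes, is that the leftover messages must be revisited often enough that the queue drains faster than remote threads refill it.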
nwf (Contributor) commented Jan 2, 2025

This seems entirely sensible.

Might it be worth having different thresholds for the different times that we are servicing the message queue?

  • on a relatively fast path (say, a small fast free list is empty and we're grabbing the next slab) or
  • already being on a slow path (say, when we're going to the backend for a new chunk) or
  • on a really slow path (say, when going even further back to the shared pool)

I could sort of imagine making the last of those consume the entire already-pending queue, while the earlier ones stay a little more "hope we're in a steady state" choosy, but that's just an intuition that isn't backed by any kind of data. :)
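The tiered-threshold idea could look something like the sketch below. The enum, function name, and the non-default budget values are all illustrative assumptions; only the 1 MiB figure comes from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the three call sites nwf lists.
enum class Path
{
  FastNewSlab,     // small free list empty, grabbing the next slab
  SlowNewChunk,    // going to the backend for a new chunk
  SlowerSharedPool // going even further back to the shared pool
};

// Pick a drain budget based on how slow a path we are already on.
// Values other than the 1 MiB default are made up for illustration.
constexpr size_t budget_for(Path p)
{
  switch (p)
  {
    case Path::FastNewSlab:
      return 1 << 20;  // stay choosy on the comparatively fast path
    case Path::SlowNewChunk:
      return 8 << 20;  // already paying for the backend, drain more
    case Path::SlowerSharedPool:
    default:
      return SIZE_MAX; // effectively: consume the whole pending queue
  }
}
```

The design intuition is that the slower the path, the more the cost of draining is hidden by work the allocator is already doing.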

mjp41 (Member, Author) commented Jan 3, 2025

@nwf I have been wondering about something similar. If we had a remote per size class per allocator, then we could process the message queues to get a new free list, and only process other sizes if we needed to get a new slab. I think there are some great opportunities for further optimisations here.

nwf (Contributor) commented Jan 4, 2025

Oh hey, that's really clever!

Just musing and catching up to things you already know...

  • It's a middle-ground between "every allocator has one remote" of snmalloc today and "every slab is its own remote" of mimalloc.
  • The existing send code doesn't need to change to support this, if slab construction puts the appropriate RemoteAllocator* in the Pagemap. (Once all objects have been returned and the slab is recycled, no other Allocator holds a stale copy of the RemoteAllocator* from the Pagemap, barring double-free, so that point is a good, natural barrier. It'd be cute to be able to change it during a slab's lifecycle, but that's probably more work than it'd be worth.)
  • After BatchIt, it seems like there's a reasonable chance that the first message we process is big enough to be a fast free list, skipping some of the slab lifecycle machinery (at least in the no-mitigations case).

mjp41 (Member, Author) commented Jan 4, 2025

The pagemap combines the remote pointer with a size class. I think we could do this, but we'd need the remotes to be at a well-defined alignment to be able to read the size class cheaply for various mitigations.

mjp41 (Member, Author) commented Jan 6, 2025

@nwf I propose we merge this as is, and raise an issue to continue this discussion?

nwf (Contributor) commented Jan 7, 2025

Yeah, that sounds like the right plan to me. :)

mjp41 merged commit 046c5ac into microsoft:main on Jan 7, 2025.
62 checks passed