Description
Right now, allocation requires atomic operations. We should use a thread-local buffer so that atomics aren't required in the common case.
This would be somewhat difficult to do since we can have multiple running instances. Would we have to use thread_local? How does the performance overhead of that compare to using atomics?
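To make the question concrete, here is a rough sketch of one possible shape, not the collector's actual code. It assumes the standard library's `thread_local!` macro and keys each thread's buffer by a hypothetical `CollectorId`, so several running instances can coexist. `SharedArena`, `LocalBuffer`, `CHUNK_SIZE`, and `alloc` are all made-up names, and allocations are modeled as plain offsets rather than real memory.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical id that tells collector instances apart.
pub type CollectorId = usize;

/// Hypothetical chunk size handed to each thread at a time.
pub const CHUNK_SIZE: usize = 4096;

/// Shared per-collector state: touched with one atomic op per chunk,
/// not one per allocation.
pub struct SharedArena {
    id: CollectorId,
    next_chunk: AtomicUsize,
}

impl SharedArena {
    pub fn new(id: CollectorId) -> Self {
        SharedArena { id, next_chunk: AtomicUsize::new(0) }
    }

    /// Slow path: claim a fresh chunk and return its starting offset.
    fn grab_chunk(&self) -> usize {
        self.next_chunk.fetch_add(CHUNK_SIZE, Ordering::Relaxed)
    }

    /// Number of chunks claimed so far (equals the number of atomic ops).
    pub fn chunks_claimed(&self) -> usize {
        self.next_chunk.load(Ordering::Relaxed) / CHUNK_SIZE
    }
}

/// The current thread's bump region for one collector.
struct LocalBuffer {
    cursor: usize,
    end: usize,
}

thread_local! {
    // One buffer per (thread, collector) pair.
    static BUFFERS: RefCell<HashMap<CollectorId, LocalBuffer>> =
        RefCell::new(HashMap::new());
}

/// Bump-allocate `size` bytes and return the offset.
/// The common case only touches thread-local state.
pub fn alloc(arena: &SharedArena, size: usize) -> usize {
    assert!(size <= CHUNK_SIZE, "large objects need a separate path");
    BUFFERS.with(|cell| {
        let mut map = cell.borrow_mut();
        let buf = map
            .entry(arena.id)
            .or_insert(LocalBuffer { cursor: 0, end: 0 });
        if buf.end - buf.cursor < size {
            // Buffer exhausted (or never filled): refill from the shared arena.
            let start = arena.grab_chunk();
            buf.cursor = start;
            buf.end = start + CHUNK_SIZE;
        }
        let offset = buf.cursor;
        buf.cursor += size;
        offset
    })
}
```

In this sketch the fast path pays for a thread-local access plus a hash lookup instead of an atomic operation. Whether that is actually cheaper is exactly the open question above and would need benchmarking.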
Maybe we could make SmallObjectCache a static variable shared between instances. However, if we do that, we'd have to differentiate between allocations from different collectors (in for_each and the linked list). This could also result in worse performance if all the different collectors end up messing with each other's stuff.
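As a rough illustration of the bookkeeping this would imply, here is a minimal sketch of a shared static cache. It is a stand-in, not the real SmallObjectCache: `SHARED_CACHE`, `Entry`, `CollectorId`, `record_alloc`, and `for_each_owned` are hypothetical names, a `Vec` replaces the linked list, and a single `Mutex` replaces whatever synchronization the real cache uses.

```rust
use std::sync::Mutex;

/// Hypothetical id that tells collector instances apart.
pub type CollectorId = usize;

/// Every entry has to record which collector owns it.
struct Entry {
    owner: CollectorId,
    size: usize,
}

/// One cache shared by all collector instances (hypothetical stand-in).
static SHARED_CACHE: Mutex<Vec<Entry>> = Mutex::new(Vec::new());

/// Record an allocation made on behalf of `owner`.
pub fn record_alloc(owner: CollectorId, size: usize) {
    SHARED_CACHE.lock().unwrap().push(Entry { owner, size });
}

/// Visit only the allocations belonging to `owner`.
/// Entries from other collectors are still walked, just skipped.
pub fn for_each_owned(owner: CollectorId, mut f: impl FnMut(usize)) {
    let cache = SHARED_CACHE.lock().unwrap();
    for entry in cache.iter().filter(|e| e.owner == owner) {
        f(entry.size);
    }
}
```

Both costs mentioned above show up here: each entry carries an extra owner tag, and every collector contends on the same lock and iterates past the others' entries.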
This was much easier when it was just a comment in a config file.......