This is an improved version of smmalloc, a fast and efficient memory allocator designed to handle many small allocations and deallocations in heavily multi-threaded scenarios. The allocator was created for use in performance-critical applications such as video games.
Using the smmalloc allocator in the .NET environment helps to minimize GC pressure when allocating buffers and avoids the need for lock-based pools in multi-threaded systems. Modern .NET features such as Span<T> work well in tandem with smmalloc and make it convenient to manage data in native memory blocks.
To build the native library, the following software is required:
For desktop platforms: CMake with GNU Make or Visual Studio.
A managed assembly can be built using any available build platform that supports C# 3.0 or higher.
// 8 buckets, 16 MB each, 128 bytes maximum allocation size
SmmallocInstance smmalloc = new SmmallocInstance(8, 16 * 1024 * 1024);
smmalloc.Dispose();
// 4 KB of thread cache for each bucket, hot warmup
smmalloc.CreateThreadCache(4 * 1024, CacheWarmupOptions.Hot);
smmalloc.DestroyThreadCache();
// 64 bytes of a memory block
IntPtr memory = smmalloc.Malloc(64);
smmalloc.Free(memory);
IntPtr[] batch = new IntPtr[32];
// Allocate a batch of memory
for (int i = 0; i < batch.Length; i++) {
    batch[i] = smmalloc.Malloc(64);
}
// Release the whole batch
smmalloc.Free(batch);
// Write data to the memory block using Marshal
byte data = 0;
for (int i = 0; i < smmalloc.Size(memory); i++) {
    Marshal.WriteByte(memory, i, data++);
}
// Write data to the memory block using Span (alternative to the Marshal version)
Span<byte> buffer;
unsafe {
    buffer = new Span<byte>((byte*)memory, smmalloc.Size(memory));
}
byte data = 0;
for (int i = 0; i < buffer.Length; i++) {
    buffer[i] = data++;
}
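// Read data from the memory block using Marshal
int sum = 0;
for (int i = 0; i < smmalloc.Size(memory); i++) {
    sum += Marshal.ReadByte(memory, i);
}
// Read data from the memory block using Span (alternative to the Marshal version)
int sum = 0;
foreach (var value in buffer) {
    sum += value;
}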
// XOR using Vector and Span
// Here xor is assumed to be a Span<byte> of the same length as buffer,
// and that length is assumed to be a multiple of Vector<byte>.Count
if (Vector.IsHardwareAccelerated) {
    Span<Vector<byte>> bufferVector = MemoryMarshal.Cast<byte, Vector<byte>>(buffer);
    Span<Vector<byte>> xorVector = MemoryMarshal.Cast<byte, Vector<byte>>(xor);
    for (int i = 0; i < bufferVector.Length; i++) {
        bufferVector[i] ^= xorVector[i];
    }
}
// Copy data using Marshal
byte[] data = new byte[64];
// Copy from native memory
Marshal.Copy(memory, data, 0, 64);
// Copy to native memory
Marshal.Copy(data, 0, memory, 64);
// Copy data using Buffer
unsafe {
    // Copy from native memory
    fixed (byte* destination = &data[0]) {
        Buffer.MemoryCopy((byte*)memory, destination, 64, 64);
    }
    // Copy to native memory
    fixed (byte* source = &data[0]) {
        Buffer.MemoryCopy(source, (byte*)memory, 64, 64);
    }
}
// Define a custom structure
struct Entity {
    public uint id;
    public byte health;
    public byte state;
}
int entitySize = Marshal.SizeOf(typeof(Entity));
int entityCount = 10;
// Allocate memory block
IntPtr memory = smmalloc.Malloc(entitySize * entityCount);
// Create Span using native memory block
Span<Entity> entities;
unsafe {
    entities = new Span<Entity>((void*)memory, entityCount);
}
// Do some stuff
// A single Random instance is reused instead of creating one per call
Random random = new Random();
uint id = 1;
for (int i = 0; i < entities.Length; i++) {
    entities[i].id = id++;
    entities[i].health = (byte)random.Next(1, 100);
    entities[i].state = (byte)random.Next(1, 255);
}
// Release memory block
smmalloc.Free(memory);
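The custom structure example multiplies the marshaled element size by the element count to get the block size. The sketch below shows that arithmetic on its own, without the allocator. `EntityLayout`, `ElementSize`, and `BlockSize` are hypothetical names introduced only for illustration, and the explicit `StructLayout` attribute is an assumption that simply spells out the default sequential layout for structs.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; it is not part of smmalloc.
public static class EntityLayout {
    // Same fields as the Entity structure in the example above.
    [StructLayout(LayoutKind.Sequential)]
    public struct Entity {
        public uint id;
        public byte health;
        public byte state;
    }

    // Marshaled size of one element, including any trailing padding
    // the runtime adds to keep the uint field aligned.
    public static int ElementSize() {
        return Marshal.SizeOf<Entity>();
    }

    // Total number of bytes needed for a block holding `count` elements.
    // The checked context turns an integer overflow into an exception.
    public static int BlockSize(int count) {
        if (count < 0) {
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        return checked(ElementSize() * count);
    }
}
```

Because of alignment padding, the element size can be larger than the sum of the field sizes (6 bytes here), so it is safer to compute the block size from `Marshal.SizeOf` than from a hand-counted value. The result of `EntityLayout.BlockSize(10)` can then be compared with the maximum allocation size of the instance before calling `Malloc`.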
Definitions of warmup options for the CreateThreadCache() function:
CacheWarmupOptions.Cold
warmup is not performed for cache elements.
CacheWarmupOptions.Warm
warmup is performed for half of the cache elements.
CacheWarmupOptions.Hot
warmup is performed for all cache elements.
A single low-level disposable class is used to work with smmalloc.
Contains a managed pointer to the smmalloc instance.
SmmallocInstance(uint bucketsCount, int bucketSize)
creates an allocator instance with a memory pool. The size of the memory blocks in each bucket increases with the number of buckets. The bucket size parameter sets the initial size of the pooled memory, in bytes.
SmmallocInstance.Dispose()
destroys the smmalloc instance and frees allocated memory.
SmmallocInstance.CreateThreadCache(int cacheSize, CacheWarmupOptions warmupOption)
creates a thread cache for fast memory allocations within a thread. The warmup option sets the degree of pre-allocation of cache elements.
SmmallocInstance.DestroyThreadCache()
destroys the thread cache. Should be called before the end of the thread's life cycle.
SmmallocInstance.Malloc(int bytesCount, int alignment)
allocates an aligned memory block. The maximum allocation size is the bucket count multiplied by 16, and the minimum allocation size is 16 bytes: an instance with two buckets serves allocations of up to 32 bytes, three buckets up to 48 bytes, four buckets up to 64 bytes, and so on. The alignment parameter is optional. Returns a pointer to the allocated memory block.
SmmallocInstance.Free(IntPtr memory)
frees a memory block. To free a batch of memory blocks, a managed array, or a pointer to pointers together with a length, can be passed instead of a pointer to a single memory block.
SmmallocInstance.Realloc(IntPtr memory, int bytesCount, int alignment)
reallocates a memory block. The alignment parameter is optional. Returns a pointer to the reallocated memory block.
SmmallocInstance.Size(IntPtr memory)
gets the usable size of a memory block. Returns the size in bytes.
SmmallocInstance.Bucket(IntPtr memory)
gets the bucket index of a memory block. Returns the placement index.
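The sizing rule given for Malloc (16 bytes per bucket) can be restated as plain arithmetic. The sketch below is not part of the smmalloc API: `BucketMath` and its members are hypothetical names, and the mapping from a request size to a zero-based bucket index is an assumption made for illustration, not a statement of what `Bucket()` returns.

```csharp
using System;

// Hypothetical helper for illustration only; it restates the
// 16-bytes-per-bucket rule from the Malloc description as arithmetic.
public static class BucketMath {
    // Bytes added to the maximum allocation size by each bucket.
    public const int Step = 16;

    // Largest request that a pool with the given bucket count can serve.
    public static int MaxAllocationSize(int bucketsCount) {
        if (bucketsCount <= 0) {
            throw new ArgumentOutOfRangeException(nameof(bucketsCount));
        }
        return bucketsCount * Step;
    }

    // Assumed mapping from a request size to a zero-based bucket index:
    // 1..16 bytes -> 0, 17..32 bytes -> 1, and so on.
    public static int AssumedBucketIndex(int bytesCount) {
        if (bytesCount <= 0) {
            throw new ArgumentOutOfRangeException(nameof(bytesCount));
        }
        return (bytesCount - 1) / Step;
    }

    // True when a request fits within the pooled buckets.
    public static bool FitsInPool(int bytesCount, int bucketsCount) {
        return bytesCount > 0 && bytesCount <= MaxAllocationSize(bucketsCount);
    }
}
```

Under these assumptions, `BucketMath.MaxAllocationSize(8)` gives 128, which matches the comment in the instance creation example, and `BucketMath.FitsInPool(129, 8)` is false. A check of this kind is one way to pick a bucket count that covers the largest block an application expects to request.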