Wasted memory in early malloc() #270
BTW, another bad thing about the !smp_allocator case in std_malloc() is that it doesn't respect the requested alignment: it always returns addresses 8 bytes into a page.
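To make that alignment problem concrete, here is a minimal, self-contained sketch - not OSv's actual code; the names and the 8-byte header are assumptions - of an early path that dedicates a whole page to each request and keeps a small size header at the start of the page, so the returned pointer always lands 8 bytes in regardless of the alignment the caller asked for:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

constexpr size_t page_size = 4096;

struct early_header {        // assumed 8-byte bookkeeping header at the page start
    uint64_t size;
};

// Stand-in for the early, one-page-per-request path: take a whole page,
// record the size, and return the address right after the header.
static void* early_malloc(size_t size, size_t /*alignment - ignored*/)
{
    char* page = static_cast<char*>(std::aligned_alloc(page_size, page_size));
    auto* hdr = reinterpret_cast<early_header*>(page);
    hdr->size = size;
    return page + sizeof(early_header);   // always (page + 8)
}

int main()
{
    void* p = early_malloc(24, 64);       // caller asked for 64-byte alignment
    std::printf("returned %p, 64-byte aligned: %s\n", p,
                reinterpret_cast<uintptr_t>(p) % 64 == 0 ? "yes" : "no");  // prints "no"
}
```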
I have recently spent some time analyzing how memory is allocated and managed in OSv. I added a whole bunch of metrics and had a chance to look at this particular issue. Unless I mismeasured, OSv is making on average 1800 (one thousand eight hundred) early malloc allocations, which amounts to 7.2 MB. What is worse, the total memory requested (the size argument passed to std_malloc) for all these single-page allocations is ONLY around 120,000 bytes (~117 KB), so the effectiveness ratio is roughly 1.5%. I think the math in the original description from 4 years ago is wrong, because 819 allocations is roughly 3.2 MB.

Here is the build command and run command:

Here are the changes I made to mempool.cc to capture the number of pages allocated and bytes requested:
```
@@ -1549,11 +1584,28 @@ static inline void* std_malloc(size_t size, size_t alignment)
     if (size <= memory::pool::max_object_size && alignment <= size && smp_allocator) {
         size = std::max(size, memory::pool::min_object_size);
         unsigned n = ilog2_roundup(size);
-        ret = memory::malloc_pools[n].alloc();
+        memory::malloc_pool &_pool = memory::malloc_pools[n];
+        memory::malloc_memory_pool_bytes_allocated.fetch_add(_pool.get_size());
+        memory::malloc_memory_pool_bytes_requested.fetch_add(size);
+        ret = _pool.alloc();
         ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
                                  ret);
         trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
     } else if (size <= mmu::page_size && alignment <= mmu::page_size) {
+        if(smp_allocator) {
+            memory::malloc_smp_full_pages_allocated.fetch_add(1);
+            memory::malloc_smp_full_pages_bytes_requested.fetch_add(size);
+        }
+        else {
+            memory::malloc_non_smp_full_pages_allocated.fetch_add(1);
+            memory::malloc_non_smp_full_pages_bytes_requested.fetch_add(size);
+        }
+
         ret = mmu::translate_mem_area(mmu::mem_area::main, mmu::mem_area::page,
                                       memory::alloc_page());
         trace_memory_malloc_page(ret, size, mmu::page_size, alignment);
```
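The counters used in the diff are presumably defined elsewhere in mempool.cc as atomics; something along these lines (hypothetical placement and types, shown only so the diff reads on its own):

```cpp
#include <atomic>

namespace memory {
std::atomic<unsigned long> malloc_memory_pool_bytes_allocated(0);
std::atomic<unsigned long> malloc_memory_pool_bytes_requested(0);
std::atomic<unsigned long> malloc_smp_full_pages_allocated(0);
std::atomic<unsigned long> malloc_smp_full_pages_bytes_requested(0);
std::atomic<unsigned long> malloc_non_smp_full_pages_allocated(0);
std::atomic<unsigned long> malloc_non_smp_full_pages_bytes_requested(0);
}
```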
One thing you should confirm before being too happy about the possibility of saving 7 MB of memory is that most of this memory really remains allocated, and isn't just temporary memory which is immediately freed after being allocated. You can easily check that by also instrumenting the free() code to reduce the counter when it is called in those early times.

I'm curious where these 1800 early allocations come from, and whether we could easily reduce their number significantly. Perhaps we could reuse the code of the leak detector (see include/osv/alloctracker.hh and scripts/loader.py's show_leak) to show a summary of where in the code (in the form of stack backtraces) the early allocations come from.

Another option is to use a tracepoint instead of your ad-hoc counting code; for tracepoints we already have a way to see a list of their occurrences, including backtraces.
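For reference, a hedged sketch of the tracepoint option, modeled on the existing memory tracepoints that already appear in std_malloc() (trace_memory_malloc_page and friends); the new tracepoint name is made up and the exact macro signature should be checked against include/osv/trace.hh:

```cpp
#include <osv/trace.hh>

// Hypothetical tracepoint for the early (!smp_allocator) whole-page allocations.
TRACEPOINT(trace_memory_early_malloc_page, "buf=%p, requested=%d", void*, size_t);

// ...then, inside the "size <= mmu::page_size" branch of std_malloc(), next to
// the existing trace_memory_malloc_page() call:
//
//     if (!smp_allocator) {
//         trace_memory_early_malloc_page(ret, size);
//     }
//
// With backtrace collection enabled for this tracepoint, the existing trace
// tooling can list every early allocation together with its call site,
// replacing the ad-hoc counters.
```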
That is exactly what I tried to verify. Indeed, some of this early allocated memory is freed by the end of premain, but most of it stays, based on the independent places where I instrumented the code:
```
OSv v0.51.0-43-gd7acf645
-> arch_setup_free_memory: free memory is 9806 in pages (39224 KB) starting at 0x0000000000b82cf4 (11787 KB)
1 CPUs detected
-> premain end: free memory is 8121 in pages, used 6740 KB
Firmware vendor: SeaBIOS
...
Hello from C code
---------> In non-SMP mode allocated 1810 pages in order to allocate 120060 bytes
---------> In SMP mode allocated 200 pages in order to allocate 360251 bytes
---------> In memory pools allocated 347904 bytes for requested 311031 bytes
-----> free_page_ranges: In malloc_large requested 972 pages and 3981312 bytes (3888 KB)
-----> free_page_ranges: L2 pool allocated 1248 pages and 5111808 bytes (4992 KB)
-> Free memory is 6366 in pages and 26075136 in bytes, used 7020 KB since
end of premain
```
See where it says 'allocated 1810 pages' - that is what I count in std_malloc(). Then in another place I use memory::stats::free - "premain end: free memory is 8121 in pages, used 6740 KB" - which indicates that 9806 - 8121 = 1685 pages did stay allocated at the end of premain (1685 pages * 4 KB = 6740 KB, matching the 'used' figure).
But who knows, I may have miscalculated something.
I added another metric to count early deallocations (in early_free_page):
The math seems to be off, though. It suggests that 1810 - 229 = 1581 pages should be left, whereas the difference reported based on memory::stats - 9806 - 8121 = 1685 - is higher. Is it possible that an early-allocated page gets deallocated later, in SMP mode? I did play with gdb's osv leak, but I did not cross-examine against it.
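A minimal sketch of that deallocation-side metric (the new counter name is made up; only malloc_non_smp_full_pages_allocated comes from the earlier patch), assuming the early free path is the right place to hook:

```cpp
#include <atomic>

namespace memory {
// counter added in the instrumentation diff earlier in this thread
extern std::atomic<unsigned long> malloc_non_smp_full_pages_allocated;
// hypothetical new counter for the deallocation side
std::atomic<unsigned long> early_full_pages_freed(0);
}

// Added at the top of the early free path (early_free_page), so that after
// premain the pages still held are approximately
//     malloc_non_smp_full_pages_allocated - early_full_pages_freed
// (1810 - 229 = 1581 in the run above, vs. 1685 derived from memory::stats):
//
//     memory::early_full_pages_freed.fetch_add(1);
```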
I found at least one culprit - the drivers/acpi.cc:acpi::early_init() function, which by my count made 600 malloc calls. Each call resulted in a full page being allocated - 2.4 MB in total. The total data requested, however, was only around 36K.
I used osv leak to track the problem, but these particular stack traces never showed up:
When I was debugging, over many iterations (at least 10) I would see the identical stack trace as above, and I do not understand why the same malloc from the same place is being called so many times - as if it keeps trying to initialize the same semaphore object over and over again. On the other hand, I saw a loop in AcpiUtMutexInitialize creating 8 mutex objects. I do not think these allocated objects get deallocated in this code, so I wonder why osv leak does not capture it.
A couple of updates to this issue:
and
Committed a patch from @wkozaczuk that implements a simplistic but effective allocator for early boot. It saves much more than 0.7 MB - in my tests (and in @wkozaczuk's) a whopping 6 MB are saved.
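For readers who don't want to dig up the commit, here is a rough sketch of the general idea behind such a simplistic early-boot allocator - not the committed patch itself, and all names are made up: carve small early allocations sequentially out of the current page, taking a fresh page only when the current one is full, instead of dedicating a whole page to every request.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace early_sketch {

constexpr size_t page_size = 4096;

// stand-in for memory::alloc_page(): returns a fresh page-aligned page
static void* alloc_page()
{
    return std::aligned_alloc(page_size, page_size);
}

static char*  current = nullptr;     // page currently being carved up
static size_t offset  = page_size;   // forces taking a page on first use

// Bump-pointer allocation: honors the requested alignment (assumed to be a
// power of two), assumes requests are smaller than a page, and never frees
// individual objects.
static void* early_alloc(size_t size, size_t alignment)
{
    size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
    if (current == nullptr || aligned + size > page_size) {
        current = static_cast<char*>(alloc_page());
        aligned = 0;                 // page start satisfies any alignment <= page_size
    }
    offset = aligned + size;
    return current + aligned;
}

} // namespace early_sketch
```

A real implementation also has to decide what free() should do with these objects once the full allocator takes over; the sketch simply never frees them.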
Our malloc() is wasteful in many places, but here is one I just noticed now.
std_malloc() during early initialization (!smp_allocator) allocates a whole page for every small allocation. Printing out what actually happens in an example run, I saw 819 of these early allocations, most of them asking for 24-72 bytes, but we allocate 4096 bytes for each of them. This is wasting about 0.7 MB of memory. Not huge, but these wastages add up :(
It would be nice to make the early allocator less wasteful, or alternatively to see if we can use the dynamic allocator less during initialization.