-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Update emmalloc #10145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update emmalloc #10145
Conversation
595e357
to
56e31e3
Compare
Restored |
@@ -6,7 +6,6 @@ | |||
var LibraryPThread = { | |||
$PThread__postset: 'if (!ENVIRONMENT_IS_PTHREAD) PThread.initMainThreadBlock(); else PThread.initWorker();', | |||
$PThread__deps: ['$PROCINFO', '_register_pthread_ptr', | |||
'emscripten_main_thread_process_queued_calls', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out this change along with src/deps_info.json? Seems unrelated but maybe I'm wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only started failing in this branch because of the deps structure changing.
// allocated with one of the emmalloc memory allocation functions (malloc, memalign, ...). | ||
// If called with size == 0, the pointer ptr is freed, and a null pointer is returned. If | ||
// called with null ptr, a new pointer is allocated. | ||
void *realloc(void *ptr, size_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to re-declare all the POSIX names here? Would it be better to use the ones from stdlib.h and only declare the specific emmalloc symbols in this header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more documentative to re-declare them here. No harm in doing so, in fact, if there ever were a change in signature of one of the names, then this would loudly complain, and it'd trigger to change the signature of the corresponding emmalloc_
name as well.
system/include/emscripten/heap.h
Outdated
// Returns 1 on success, 0 otherwise. In order to call this function, | ||
// the page must have been compiled with -s ALLOW_MEMORY_GROWTH=1 linker | ||
// flag. | ||
int emscripten_realloc_buffer(size_t requested_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear about heap growth vs growing the underlying wasm memory. Are they always the same thing?
Does emscripten_resize_heap
above also require ALLOW_MEMORY_GROWTH
? I'm a little confused a bout the use of the term "realloc" which normally implies the buffer could be moved. Maybe there could be a better name for this function? Is it basically like emscripten_resize_heap
bug taking an absolute size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function doesn't seem very well named perhaps we can avoid exposing it and keep it as an implementation detail. I see its only used in test code anyway. Could the test be modified to not depend on the this currently internal function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to be able to use emscripten_realloc_buffer
in public code to resize the heap. In JS code we can use that to actually shrink the heap and release memory back to the browser.
emscripten_resize_heap
only grows the heap with geometric/linear overgrowth, suitable for sbrk()-like allocators. emscripten_realloc_buffer
is suitable for someone who is implementing their own malloc()
that does not use sbrk()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current emscripten_resize_heap seems to take an absolute number of bytes not an increment. The fact that it can't shrink is a current implementation detail isn't it? With wasm IIUC there it is never possible to shrink memory I doubt that will ever change.
You are renaming the parameter to requested_growth_bytes
but I don't see the implementation changing.
I think the terms heap
and buffer
and realloc
are all a little confusing. IIUC what is really being resized here is the webassembly memory which hold static data, stack and heap.
Maybe we can come up with better names for these and some documentation. Do we really need two different memory resize functions? If we do shouldn't they be called something like:
emscripten_memory_grow
emscripten_memory_resize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are renaming the parameter to
requested_growth_bytes
but I don't see the implementation changing.
Thanks, good catch, when doing a documentation pass I misremembered the function argument behavior. Updated.
Do we really need two different memory resize functions?
Yes - while the automatic overgrowing is useful for some applications, for others it gets in the way if they don't want to do sbrk() or similar. I like the names emscripten_memory_grow()
and emscripten_memory_resize
, let's go with those. @kripken wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, on another thought, I don't want to grow this PR and start renaming functions here, that should be for later. Note that the implementation here does not add any new heap resize/grow functionality, but just exposes the function that has already existed for a long time.
@@ -0,0 +1,24 @@ | |||
mergeInto(LibraryManager.library, { | |||
emmalloc_unclaimed_heap_memory__deps: ['emscripten_get_sbrk_ptr'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also looks like it just used in tests. Is it worth added this JS library just for sake of a test case? Maybe there is some way to avoid it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to expose that function to public API for users of emmalloc, hence testing that it works in that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't that function internal-only right now? So the fact that it has a bad name is currently not an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this should be split out into a separate PR btw. Although I'm still not convinced about the need to shrink memory (since its not possible with wasm).
…alloc memory statistics.
…etting optimized out in -O2 and higher
…sm backend sizes by 90 bytes, but improves wasm backend wasm sizes by 100 bytes, so that's ok.
I'm seeing local failures and some CI failures in Locally:
CI: https://app.circleci.com/jobs/github/emscripten-core/emscripten/227658 Its strange because the CI for this change seems to have passed .. |
@@ -35,6 +35,8 @@ | |||
} | |||
|
|||
void *sbrk(intptr_t increment) { | |||
// Enforce preserving a minimal 4-byte alignment for sbrk. | |||
increment = (increment + 3) & ~3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the use case of sbrk(0)
being used to find the current break point, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no wait, it's still zero then, nm...
@@ -146,7 +146,7 @@ void realloc() { | |||
stage("realloc0"); | |||
emmalloc_blank_slate_from_orbit(); | |||
for (int i = 0; i < 2; i++) { | |||
char* ptr = (char*)malloc(10); | |||
char* ptr = (char*)malloc(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this changed?
Changing assert(size_t(third) == size_t(first) + ((100 + ALLOCATION_UNIT - 1)&(-ALLOCATION_UNIT)) + ALLOCATION_UNIT); // allocation units are multiples of ALLOCATION_UNIT
|
Seems like diff --git a/system/lib/emmalloc.cpp b/system/lib/emmalloc.cpp
index 417ae3f33..27d35dd39 100644
--- a/system/lib/emmalloc.cpp
+++ b/system/lib/emmalloc.cpp
@@ -40,6 +40,7 @@
* malloc.
*/
+#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <memory.h>
@@ -61,7 +62,7 @@ extern "C"
// Configuration: specifies the minimum alignment that malloc()ed memory outputs. Allocation requests with smaller alignment
// than this will yield an allocation with this much alignment.
-#define MALLOC_ALIGNMENT 8
+#define MALLOC_ALIGNMENT __alignof__(max_align_t)
// Configuration: If EMMALLOC_USE_64BIT_OPS is specified, emmalloc uses 64 buckets for free memory regions instead of just 32.
// When building to target asm.js/wasm2js, 64-bit ops are disabled, but in Wasm builds, 64-bit ops are enabled. (this is
diff --git a/tests/core/test_emmalloc.cpp b/tests/core/test_emmalloc.cpp
index 4f86f2f25..9730bb1a2 100644
--- a/tests/core/test_emmalloc.cpp
+++ b/tests/core/test_emmalloc.cpp
@@ -38,7 +38,7 @@ void stage(const char* name) {
}, name);
}
-const size_t ALLOCATION_UNIT = 8;
+#define ALLOCATION_UNIT SMALLEST_ALLOCATION_SIZE
void basics() {
stage("basics");
@@ -58,7 +58,7 @@ void basics() {
assert(size_t(third) == size_t(first) + ((100 + ALLOCATION_UNIT - 1)&(-ALLOCATION_UNIT)) + ALLOCATION_UNIT); // allocation units are multiples of ALLOCATION_UNIT
stage("allocate 10 more");
void* four = malloc(10);
- assert(size_t(four) == size_t(third) + (2*ALLOCATION_UNIT) + ALLOCATION_UNIT); // payload (10 = 2 allocation units) and metadata
+ assert(size_t(four) == size_t(ALIGN_UP((char*)(third) + (2*ALLOCATION_UNIT) + REGION_HEADER_SIZE, MALLOC_ALIGNMENT))); // payload (10 = 2 allocation units) and metadata
stage("free the first");
free(second);
stage("several temp alloc/frees"); However, the |
This is currently breaking the auto-roller: Can we revert and reland after investigation? |
#10183 should resolve the builder issues. I am not sure why, but the dynamic top sizes end up being slightly different for different systems. Perhaps static data section sizes end up being different on different runs(?) |
Thanks for the quick fix! The non-determinism there is pretty strange. Did you ever see it locally, or is it just between different machines? Is it worth opening a bug and trying to track it down? |
I only saw it on the CI - my first suspect would be \r\n vs \n line endings, so --output-eol might help diagnose there. Though not completely sure. It might be worth a bug entry. Btw, do you know what would need to be done to make emmalloc compatible with ASan or LSan? That would be quite interesting I think. |
@Akaricchi : dropped that whole part of the test, it was testing invariants that were specific to previous emmalloc version and no longer hold with the updated emmalloc, so that code was good to go away. |
* Update emmalloc * Use 64bit ops in emmalloc. * Add emmalloc.h header * Ensure test_setjmp_noleak tests against dlmalloc * Fix emmalloc test and 64-bit ctz * Update dlfcn test and aligned_alloc linkage * Adjust last resort lookup for 64-bit buckets build * Remove support for non-sbrk emmalloc use. * Add emmalloc_usable_size(). * Revise docs * Add validation functions and expand emmalloc api. * Add use of EMMALLOC_DEBUG and EMMALLOC_DEBUG_LOG. Add testing for emmalloc memory statistics. * Remove old emmalloc code. * Restore EMMALLOC_USE_64BIT_OPS * Be diligent at looking at all available memory if sbrk() fails. * Add emmalloc_unclaimed_heap_memory(). * Implement malloc_trim() and promote emscripten_realloc_buffer() to a public API. * Micro-optimize compute_free_list_bucket(). * Fix bad order of 32-bit and 64-bit code when looking for free memory buckets * Add missing include. * flake * Fix tests * Stop using LLVM attribute alias(xxx), as it does not link properly to JS code * Remove incorrect JS->C dependency in PThread. * Use emmalloc_trim() * Update emmalloc_trim() test * Update tests. * Restore dlmalloc as default allocator. * Fix use of 32-bit emmalloc in wasm backend wasm2js mode. * Fix emmalloc build * Fix test_emmalloc back to the original form to avoid malloc() calls getting optimized out in -O2 and higher * CI fix * Micro-optimize size * Move emmalloc tests * Fix upstream wasm test * Drop malloc impl if building with USES_DYNAMIC_ALLOC=0 * Make emscripten_realloc_buffer internal again * Update test_emmalloc_trim * Update test_minimal_runtime_code_size. Emmalloc regresses fastcomp wasm backend sizes by 90 bytes, but improves wasm backend wasm sizes by 100 bytes, so that's ok. * Update minimal runtime code size tests * Run tests only in debug * Work around bug emscripten-core#10173 * Update test
This PR updates our emmalloc implementation. The basic idea is the same as original dlmalloc/emmalloc, i.e. memory is grouped into adjacent regions where each region is traversable back and forth, and free regions go into arrays of linked lists for quick lookup.
What is different compared to original emmalloc:
Comparisons:
dlmalloc:
Node.js: mean: 3.427 (+-0.045) secs median: 3.437 range: 3.362-3.487 (noise: 1.304%) (5 runs)
size: 37022, compressed: 3543
old emmalloc:
Node.js: mean: 4.414 (+-0.143) secs median: 4.348 range: 4.295-4.689 (noise: 3.237%) (5 runs)
size: 31394, compressed: 3560
new emmalloc:
Node.js: mean: 3.376 (+-0.032) secs median: 3.391 range: 3.338-3.419 (noise: 0.939%) (5 runs)
size: 31180, compressed: 3602
mt emmalloc:
Node.js: mean: 3.728 (+-0.049) secs median: 3.728 range: 3.642-3.792 (noise: 1.324%) (5 runs)
size: 62711, compressed: 9664
mt dlmalloc:
Node.js: mean: 3.994 (+-0.067) secs median: 3.966 range: 3.915-4.106 (noise: 1.665%) (5 runs)
size: 70338, compressed: 9602
The multithreaded benchmark.test_havlak still runs in a single thread, so the above stresses a noncontended case.
new emmalloc:
checksum: 316dd259
allocations: 4200847
mean alloc size: 31.51
max allocated: 699744
allocations #max 22209.92
sbrk chng: 969720
sbrk chng/allocs: 43.66
overhead: 12.16
sbrk top now: 0x5ede68
dlmalloc:
checksum: 32788815
allocations: 4200847
mean alloc size: 31.51
max allocated: 699744
allocations #max 22209.92
sbrk chng: 970720
sbrk chng/allocs: 43.71
overhead: 12.20
sbrk top now: 0x5ee000
and considerably less overhead than old emmalloc (12.16 vs 15.23):
old emmalloc:
checksum: 362013f1
allocations: 4200847
mean alloc size: 31.51
max allocated: 699744
allocations #max 22209.92
sbrk chng: 1037968
sbrk chng/allocs: 46.73
overhead: 15.23
sbrk top now: 0x5fe540
In summary, it is faster, smaller and has less overhead than dlmalloc and old emmalloc, when tested against benchmark.test_havlak and tests/malloc_bench.cpp.
There is https://github.com/daanx/mimalloc-bench that contains some benchmarks. Looks like most of them would require a bit of porting, but cfrac built quite easily, and
cfrac: dlmalloc is +17% faster than new emmalloc, but new emmalloc is +16.2% faster than old emmalloc.
In this PR src/settings.js is changed to default
MALLOC
setting from'dlmalloc'
to'emmalloc'
to have the bots run the whole suite through it, to verify that the new emmalloc is a feature complete replacement to dlmalloc. But to land should change back to default to dlmalloc instead.