my work on "sys_heap: a new/simpler/faster memory allocator" #19602
Conversation
All checks passed. checkpatch (informational only, not a failure)
```c
		*block_p = heap_chunkid_to_blockid_block(c);
		return 0;
	}
	return -ENOMEM;
```
As there are no retries now, I think it would be good to get rid of the for loop in `k_mem_pool_alloc` (the one that does retries on `z_sys_mem_pool_block_alloc`) and also fix the comment.
On Fri, 4 Oct 2019, Pawel Dunaj wrote:

> As there are no retries now, I think it would be good to get rid of the for loop in `k_mem_pool_alloc` (the one that does retries on `z_sys_mem_pool_block_alloc`) and also fix the comment.

Good point. See PR #19622.
On the whole I love almost all of this. It probably needs some tuning of the tests before merge to make sure we're disabling as few as possible.
And if we decide that we're sure we want to make this the core API going forward, we should probably mark the mempool stuff deprecated.
```diff
@@ -1,3 +1,4 @@
 tests:
   kernel.memory_pool:
     tags: kernel mem_pool
+    filter: not CONFIG_SYS_HEAP_MEMPOOL_COMPAT
```
Surely there's some test case in this that can pass with the new code. Probably best to identify the specific set of test cases that fail instead of disabling whole tests.
```diff
@@ -95,6 +95,7 @@ void test_sys_mem_pool_alloc_align4(void)
  */
 void test_sys_mem_pool_min_block_size(void)
 {
+#ifndef CONFIG_SYS_HEAP_MEMPOOL_COMPAT
```
There is a ztest_test_skip() utility that would probably be better applied here than blocks of preprocessor conditionals. Also we should include a comment explaining why this particular test case isn't applicable, e.g.:
```c
if (IS_ENABLED(CONFIG_SYS_HEAP_MEMPOOL_COMPAT)) {
	/* Doesn't work with compat API because of {alignment/whatever...} */
	ztest_test_skip();
	return;
}
```
Please be advised that we need to show full code coverage for the C and header files related to this mechanism, by running `sanitycheck -p qemu_x86 --coverage -T tests/lib/heap`. I haven't run this to see what the numbers currently are, just making sure everyone is aware.
The existing mem_pool implementation has been an endless source of frustration. It's had alignment bugs and racy behavior. It's never been particularly fast. It's outrageously complicated to configure statically. And while its fragmentation resistance and overhead on small blocks are good, its space efficiency has always been very poor due to the four-way buddy scheme.

This patch introduces sys_heap. It's a more or less conventional segregated-fit allocator with power-of-two buckets. It doesn't expose its level structure to the user at all, simply taking an arbitrarily aligned pointer to memory. It stores all metadata inside the heap region. It allocates and frees by simple pointer and not block ID. Static initialization is trivial, and runtime initialization is only a few cycles to format and add one block to a list header.

It has excellent space efficiency. Chunks can be split arbitrarily in 8-byte units. Overhead is only four bytes per allocated chunk (eight bytes for heaps >256kB or on 64-bit systems), plus a log2-sized array of 2-word bucket headers. There are no coarse alignment restrictions on blocks; they can be split and merged (in units of 8 bytes) arbitrarily.

It has good fragmentation resistance. Freed blocks are always immediately merged with adjacent free blocks. Allocations are attempted from a sample of the smallest bucket that might fit, falling back rapidly to the smallest block guaranteed to fit. Split memory remaining in the chunk is always returned immediately to the heap for other allocation.

It has excellent performance with firmly bounded runtime. All operations are constant time (though there is a search of the smallest bucket that has a compile-time-configurable upper bound; setting this to extreme values results in an effectively linear search of the list), objectively fast (about a hundred instructions), and amenable to locked operation. No more need for fragile lock relaxation trickery.

It also contains an extensive validation and stress test framework, something that was sorely lacking in the previous implementation.

Note that sys_heap is not a compatible API with sys_mem_pool and k_mem_pool. Wrappers for those (now-)legacy APIs appear later in this patch series.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
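For readers unfamiliar with the new interface, here is a minimal usage sketch based on the description above. The header path and signatures are as they appear in this PR and may move later in the tree:

```c
#include <sys/sys_heap.h>

static struct sys_heap heap;
static char heap_mem[2048];   /* arbitrarily aligned backing memory */

void heap_demo(void)
{
	/* Runtime init is cheap: format the region, add one block */
	sys_heap_init(&heap, heap_mem, sizeof(heap_mem));

	/* alloc/free work on plain pointers, not block IDs */
	void *p = sys_heap_alloc(&heap, 100);

	if (p != NULL) {
		/* ... use the 100-byte block ... */
		sys_heap_free(&heap, p);
	}
}
```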
Use the white box validation and test rig added as part of the sys_heap work. Add a layer that puts hashed cookies into the blocks to detect corruption, check the validity state after every operation, and enumerate a few different usage patterns:

+ Small heap, "real world" allocation where the heap is about half full and most allocations succeed.

+ Small heap, "fragmentation runaway" scenario where most allocations start failing, but the heap must remain consistent.

+ Big heap. We can't test this with the same exhaustive coverage (many re/allocations for every byte of storage) for performance reasons, but we do what we can.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
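To illustrate the hashed-cookie idea, here is a hypothetical sketch; every name in it is invented for illustration and is not the test's actual code. Each byte of an allocated block is stamped with a value derived from the block's address, so any corruption is detected when the block is verified before being freed:

```c
#include <stdint.h>
#include <stddef.h>

/* cheap mix of pointer bits and byte offset */
static uint8_t cookie_for(void *p, size_t i)
{
	return (uint8_t)(((uintptr_t)p >> 3) ^ i ^ 0xa5);
}

static void stamp_block(void *p, size_t sz)
{
	for (size_t i = 0; i < sz; i++) {
		((uint8_t *)p)[i] = cookie_for(p, i);
	}
}

static int check_block(void *p, size_t sz)
{
	for (size_t i = 0; i < sz; i++) {
		if (((uint8_t *)p)[i] != cookie_for(p, i)) {
			return -1; /* corruption detected */
		}
	}
	return 0;
}
```

The rig described above additionally validates the whole heap structure after every operation (sys_heap_validate() in this series).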
The DT_SRAM_SIZE and CONFIG_SRAM_SIZE symbols are expressed in KB and not in bytes. Without this consideration, BIG_HEAP_SZ may end up being equal to 32, making scratchmem[] appear only 32 bytes after the beginning of heapmem[] in memory, while test_small_heap() uses the latter assuming it is at least 2048 bytes. Fun stuff ensues.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
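A hedged sketch of the kind of fix this implies; macro names other than DT_SRAM_SIZE and CONFIG_SRAM_SIZE are illustrative, not taken from the patch, and MIN() is as in Zephyr's sys/util.h:

```c
/* The SRAM size symbols are in kilobytes, so scale them to bytes
 * before sizing the big heap.
 */
#ifdef DT_SRAM_SIZE
#define SRAM_SIZE_BYTES (DT_SRAM_SIZE * 1024)
#else
#define SRAM_SIZE_BYTES (CONFIG_SRAM_SIZE * 1024)
#endif

/* cap the heap so that heapmem[] and scratchmem[] both fit in SRAM */
#define BIG_HEAP_SZ MIN(SRAM_SIZE_BYTES / 2, 256 * 1024)
```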
The em_starterkit_em7d_normal platform defines neither DT_SRAM_SIZE nor CONFIG_SRAM_SIZE, while its linker script indicates only 64KB is available. And platforms with 16KB or less simply have too little RAM to accommodate this test.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
First, some renames to make accessors more explicit:

    size()      --> chunk_size()
    used()      --> chunk_used()
    free_prev() --> prev_free_chunk()
    free_next() --> next_free_chunk()

Then, the return type of chunk_size() is changed from chunkid_t to size_t, and chunk_used() from chunkid_t to bool.

The left_size() accessor is used only once and can be easily substituted by left_chunk(), so it is removed.

And in free_list_add() the variable b is renamed to bi so as to be consistent with usage in sys_heap_alloc().

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Let's provide accessors for getting and setting every field, to make the chunk header layout abstracted away from the main code. Those are:

    SIZE_AND_USED: chunk_used(), chunk_size(), set_chunk_used() and set_chunk_size().
    LEFT_SIZE:     left_chunk() and set_left_chunk_size().
    FREE_PREV:     prev_free_chunk() and set_prev_free_chunk().
    FREE_NEXT:     next_free_chunk() and set_next_free_chunk().

To be consistent, the former chunk_set_used() is now set_chunk_used().

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
By storing the used flag in the LSB, it is no longer necessary to have a size_mask variable to locate that flag. This produces smaller and faster code.

Replace the validation check in chunk_set() to base it on the storage type.

Also clarify the semantics of set_chunk_size(), which clears the used flag bit unconditionally; this simplifies the code further.

The idea of moving the used flag bit into the LEFT_SIZE field was raised. It turns out that this isn't as beneficial as it may seem, because the used bit is set only once, i.e. when the memory is handed off to a user, and the size field becomes frozen at that point. Modifications on the leftward chunk may still occur, and extra instructions to preserve that bit would be necessary if it were moved there.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
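A toy rendition of the LSB encoding, for illustration only; the real code reads raw header words out of the chunk buffer rather than going through a struct, and the helper names follow the renaming commits above:

```c
#include <stdbool.h>
#include <stddef.h>

/* size is stored shifted left by one with the used flag in bit 0,
 * so no size_mask is needed to separate the two
 */
struct toy_hdr {
	size_t size_and_used;
};

static inline bool chunk_used(const struct toy_hdr *h)
{
	return (h->size_and_used & 1U) != 0U;
}

static inline size_t chunk_size(const struct toy_hdr *h)
{
	return h->size_and_used >> 1;
}

static inline void set_chunk_used(struct toy_hdr *h, bool used)
{
	h->size_and_used = (h->size_and_used & ~(size_t)1) | (used ? 1U : 0U);
}

/* per the commit message, setting the size clears the used bit
 * unconditionally
 */
static inline void set_chunk_size(struct toy_hdr *h, size_t size)
{
	h->size_and_used = size << 1;
}
```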
It is possible to remove a few fields from struct z_heap, removing some runtime indirections by doing so:

- The buf pointer is actually the same as the struct z_heap pointer itself. So let's simply create chunk_buf() that performs a type conversion. That type is also chunk_unit_t now rather than u64_t, so it can be defined based on CHUNK_UNIT.

- Replace the struct z_heap_bucket pointer by a zero-sized array at the end of struct z_heap.

- Make chunk #0 into an actual chunk with its own header. This allows for removing the chunk0 field and streamlining the code. This way h->chunk0 becomes right_chunk(h, 0). This sets the table for further simplifications to come.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
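Roughly, the resulting shape looks like the sketch below; the field names and widths are my reading of the patch, not verbatim:

```c
#include <stdint.h>

typedef uint32_t chunkid_t;
typedef struct { char bytes[8]; } chunk_unit_t; /* one CHUNK_UNIT */

struct z_heap_bucket {
	chunkid_t next;
	size_t list_size; /* removed by a later commit in this series */
};

struct z_heap {
	uint64_t chunk0_hdr[2];         /* chunk #0's own header lives here */
	uint32_t len;
	uint32_t avail_buckets;
	struct z_heap_bucket buckets[]; /* flexible array replaces the pointer */
};

/* the former h->buf pointer equalled the heap pointer itself,
 * so a cast is all that's needed
 */
static inline chunk_unit_t *chunk_buf(struct z_heap *h)
{
	return (chunk_unit_t *)h;
}
```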
With this we can remove magic constants, especially those used with big_heap().

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
We already have chunk #0 containing our struct z_heap and marked as used. We can add a partial chunk at the very end that is also marked as used. By doing so, there is no longer a need for checking heap boundaries at run time when merging/splitting chunks, meaning fewer conditionals in the code's hot path.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
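A toy model of the sentinel trick follows; it is not the real heap code (which locates neighbours via size arithmetic, not array indexing), but it shows why the bounds tests disappear:

```c
#include <stdbool.h>
#include <stddef.h>

/* chunk descriptors sit in an array whose first and last entries are
 * permanent "used" sentinels; coalescing a freed chunk then needs no
 * bounds tests, because a used sentinel always stops the merge before
 * it can run off either end
 */
struct toy_chunk {
	bool used;
	size_t size;
};

static void toy_coalesce(struct toy_chunk *c, size_t i)
{
	if (!c[i + 1].used) {  /* safe: the last entry is a used sentinel */
		c[i].size += c[i + 1].size;
	}
	if (!c[i - 1].used) {  /* safe: c[0] is a used sentinel */
		c[i - 1].size += c[i].size;
	}
}
```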
Avoid redundancy and bucket_idx() usage when possible.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Make the LEFT_SIZE field first and the SIZE_AND_USED field last (for an allocated chunk) so they sit right next to the allocated memory. The current chunk's SIZE_AND_USED field points to the next (right) chunk, and from there the LEFT_SIZE field should point back to the current chunk. Many trivial memory overflows should trip that test.

One way to make this test more robust could involve xor'ing the values within respective accessor pairs. But at least the fact that the size value is shifted by one bit already prevents fooling the test with a same-byte corruption.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
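A self-contained toy model of that round-trip check, with simplified header layout and helper names (not the actual sources):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t chunkid_t;

/* each 8-byte header carries LEFT_SIZE first and SIZE_AND_USED last,
 * so both fields sit adjacent to the user memory they guard
 */
struct toy_hdr {
	uint32_t left_size;     /* size of the chunk to our left, in units */
	uint32_t size_and_used; /* our size << 1 | used flag, in units */
};

static struct toy_hdr *toy_hdr(uint8_t *buf, chunkid_t c)
{
	return (struct toy_hdr *)(buf + c * 8); /* CHUNK_UNIT == 8 */
}

static void toy_check(uint8_t *buf, chunkid_t c)
{
	chunkid_t size = toy_hdr(buf, c)->size_and_used >> 1;
	chunkid_t right = c + size;

	/* an overflow off the end of c's memory clobbers the right
	 * chunk's header first, so this round trip catches it
	 */
	assert(toy_hdr(buf, right)->left_size == size);
}
```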
This struct is taking up most of the heap's constant footprint overhead. We can easily get rid of the list_size member, as it is mostly used to determine if the list is empty, and that can be determined through other means.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
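One plausible reading of "other means", offered as an assumption rather than a quote of the patch: the heap already maintains a bitmask of non-empty buckets (avail_buckets in this series), so a per-bucket counter is redundant for the emptiness test:

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_heap {
	uint32_t avail_buckets; /* bit i set => free list i is non-empty */
};

static inline bool bucket_empty(const struct toy_heap *h, int bidx)
{
	return (h->avail_buckets & (1U << bidx)) == 0U;
}
```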
This is to allow sharing with the compat code.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
This is a drop-in replacement for the mempool interface. Obviously this is not meant to be merged as is, if ever. This is nice for performance and behavior comparison tests using actual application code. However, there would be benefits to migrating users to the native sys_heap API eventually.

Some caveats:

- Block sizing and alignment is completely ignored. Only the number of maximum-size blocks is guaranteed.

- Yet the static allocation appears to be biggish.

- Every allocation request incurs a space overhead. The original mempool with power-of-2 sizing does not have this overhead besides its out-of-line free block bitmap. Things move in favor of the sys_heap allocator only when using sizes that don't divide by 4.

- Tests that expect some allocation failure patterns are likely to fail (i.e. successfully allocate) with this. The reverse is also true. Hence a couple of tests are disabled.

- The compat kconfig option defaults to y for ease of testing.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
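To make the caveats concrete, here is a hypothetical sketch of what such a shim could look like; the struct and function names are invented, and this is not the actual compat code:

```c
#include <stddef.h>
#include <sys/sys_heap.h>

/* the pool is just a sys_heap sized for the requested number of
 * maximum-size blocks; finer block sizing and alignment from the
 * mempool definition are ignored, as noted above
 */
struct compat_pool {
	struct sys_heap heap;
	size_t max_sz;
};

static void *compat_alloc(struct compat_pool *p, size_t size)
{
	if (size > p->max_sz) {
		return NULL;
	}
	/* every request pays the sys_heap per-chunk header overhead,
	 * which the original power-of-2 mempool did not
	 */
	return sys_heap_alloc(&p->heap, size);
}
```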
Superseded by #24249
Not really for merging yet.
This is based on the initial work from @andyross found in PR #17628.
I'm using this PR as a placeholder for the current state of my work so
others could look at it and ... maybe get inspired.
Comments also welcome.