@agansari agansari commented Apr 2, 2019

No description provided.

@agansari agansari merged this pull request into agansari:lpc55xx Apr 2, 2019
agansari pushed a commit that referenced this pull request Jul 8, 2019
Currently, the free block bitmap is roughly 4 times larger than it
needs to be, wasting memory.

Let's assume maxsz = 128, minsz = 8 and n_max = 40.

Z_MPOOL_LVLS(128, 8) returns 3. The block size for level #0 is 128,
the block size for level #1 is 128/4 = 32, and the block size for
level #2 is 32/4 = 8. Hence levels 0, 1, and 2 for a total of 3 levels.
So far so good.
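
The level count can be reproduced with a small loop. This is only a sketch: mpool_levels() is a hypothetical helper, not a Zephyr API, and the real Z_MPOOL_LVLS() is a compile-time macro that may apply further constraints not modelled here.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Zephyr): counts the block sizes
 * obtained by starting at maxsz and dividing by 4 at each level, for
 * as long as the size is still at least minsz.
 */
int mpool_levels(size_t maxsz, size_t minsz)
{
	int levels = 0;

	for (size_t sz = maxsz; sz >= minsz && sz > 0; sz /= 4) {
		levels++;
	}
	return levels;
}
```

With maxsz = 128 and minsz = 8 the loop visits 128, 32 and 8, so it counts 3 levels, numbered 0 to 2.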

Now let's look at Z_MPOOL_LBIT_WORDS(). We get:

Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 0) = ((40 << 0) + 31) / 32 = 2
Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 1) = ((40 << 2) + 31) / 32 = 5
Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 2) = ((40 << 4) + 31) / 32 = 20

None of those are < 2 so Z_MPOOL_LBIT_WORDS() takes the results from
Z_MPOOL_LBIT_WORDS_UNCLAMPED().
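
The same arithmetic written as a plain function may make the growth per level easier to see. The function name is made up for illustration, and only the formula quoted above is assumed.

```c
#include <stdint.h>

/* Hypothetical function mirroring the quoted arithmetic: level l holds
 * n_max * 4^l blocks, one bit per block, rounded up to whole 32-bit
 * words.
 */
uint32_t lbit_words_unclamped(uint32_t n_max, unsigned int level)
{
	return ((n_max << (2 * level)) + 31) / 32;
}
```

Each additional level multiplies the number of bits by 4, which is why an extra level dominates the total.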

Finally, let's look at _MPOOL_BITS_SIZE(). It sums all possible levels
with Z_MPOOL_LBIT_BYTES() which is:

  #define Z_MPOOL_LBIT_BYTES(maxsz, minsz, l, n_max)    \
        (Z_MPOOL_LVLS((maxsz), (minsz)) >= (l) ?        \
         4 * Z_MPOOL_LBIT_WORDS((n_max), l) : 0)

Or given what we already have:

Z_MPOOL_LBIT_BYTES(128, 8, 0, 40) = (3 >= 0) ? 4 * 2  : 0 = 8
Z_MPOOL_LBIT_BYTES(128, 8, 1, 40) = (3 >= 1) ? 4 * 5  : 0 = 20
Z_MPOOL_LBIT_BYTES(128, 8, 2, 40) = (3 >= 2) ? 4 * 20 : 0 = 80
Z_MPOOL_LBIT_BYTES(128, 8, 3, 40) = (3 >= 3) ? 4 * ??

Wait... we're missing this one:

Z_MPOOL_LBIT_WORDS_UNCLAMPED(40, 3) = ((40 << 6) + 31) / 32 = 80

then:

Z_MPOOL_LBIT_BYTES(128, 8, 3, 40) = (3 >= 3) ? 4 * 80 : 0 = 320

Further levels yield (3 >= 4), (3 >= 5), etc. so they're all false and
produce 0.

So this means that we're statically allocating 428 bytes to the bitmap
when clearly only the first 3 Z_MPOOL_LBIT_BYTES() results, for the
3 levels that we actually have, should be summed, i.e. only
108 bytes.

Here the code logic confuses level numbers with the number of
levels, hence the extra allocation, which happens to be exponential.
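
The two totals can be checked with a short sketch. It assumes the per-level formula quoted earlier and models the difference as an inclusive versus a strict comparison between the level count and the level index. bitmap_bytes() and MAX_LEVEL_SLOTS are illustrative names, and the strict comparison is only one way to express the intended sum, not necessarily how the actual patch is written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative upper bound on the level indexes that get summed. */
#define MAX_LEVEL_SLOTS 16

/* Hypothetical helper: sums the per-level bitmap bytes for a pool with
 * n_levels levels. With inclusive == true a level index l is counted
 * when n_levels >= l (the behaviour described above); otherwise it is
 * counted only when n_levels > l.
 */
uint32_t bitmap_bytes(uint32_t n_levels, uint32_t n_max, bool inclusive)
{
	uint32_t total = 0;

	for (uint32_t l = 0; l < MAX_LEVEL_SLOTS; l++) {
		bool counted = inclusive ? (n_levels >= l) : (n_levels > l);

		if (counted) {
			uint32_t words = ((n_max << (2 * l)) + 31) / 32;

			total += 4 * words;
		}
	}
	return total;
}
```

For 3 levels and n_max = 40, the inclusive comparison gives 8 + 20 + 80 + 320 = 428 bytes, while the strict one gives 8 + 20 + 80 = 108 bytes.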

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
agansari pushed a commit that referenced this pull request Apr 27, 2020
Fix two issues:

1. The script assumes the default CMake generator build tool
   platform is installed. On Linux at least, that's Make instead
   of Ninja, but Make might not be installed since Zephyr recommends
   Ninja. On Windows, that might be VS Code or nmake.

   Calling `cmake -P pristine` instead of `cmake --build <path>
   --target pristine` has the benefit of removing the dependency on a
   build command, and hence the default generator is not relevant.

2. It also assumes run_cmake() returns control, and therefore pristine
   can be run.

   However, if the cmake command fails hard (say, due to issue #1
   before this patch), run_cmake() throws an exception instead.

   Fix that by trying to run the pristine target in a finally block
   instead, and adding some manual cleanup steps in case the build
   system is in a bad state and pristine fails too.

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
Signed-off-by: Torsten Rasmussen <torsten.rasmussen@nordicsemi.no>
agansari pushed a commit that referenced this pull request Apr 27, 2020
Implement deep sleep mode #1 using the shutdown state on the
CC13x2/CC26x2.

Signed-off-by: Vincent Wan <vincent.wan@linaro.org>
agansari pushed a commit that referenced this pull request Jun 22, 2020
This makes the gatt metrics also available for
gatt write-without-rsp-cb so it now prints the rate of each write:

uart:~$ gatt write-without-response-cb 1e ff 10 10
Write #1: 16 bytes (0 bps)
Write #2: 32 bytes (3445948416 bps)
Write #3: 48 bytes (2596929536 bps)
Write #4: 64 bytes (6400 bps)
Write #5: 80 bytes (8533 bps)
Write #6: 96 bytes (10666 bps)
Write #7: 112 bytes (8533 bps)
Write #8: 128 bytes (9955 bps)
Write #9: 144 bytes (11377 bps)
Write #10: 160 bytes (7680 bps)
Write #11: 176 bytes (8533 bps)
Write #12: 192 bytes (9386 bps)
Write Complete (err 0)
Write #13: 208 bytes (8533 bps)
Write #14: 224 bytes (9244 bps)
Write #15: 240 bytes (9955 bps)
Write #16: 256 bytes (8000 bps)

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
agansari pushed a commit that referenced this pull request Nov 13, 2020
The _ldiv5() is an optimized divide-by-5 function that is smaller and
faster than the generic libgcc implementation.

Yet it can be made even smaller and faster with this replacement
implementation based on a reciprocal multiplication plus some tricks.

For example, here's the assembly from the original code on ARM:

_ldiv5:
        ldr     r3, [r0]
        movw    ip, #52429
        ldr     r1, [r0, #4]
        movt    ip, 52428
        adds    r3, r3, #2
        push    {r4, r5, r6, r7, lr}
        mov     lr, #0
        adc     r1, r1, lr
        adds    r2, lr, lr
        umull   r7, r6, ip, r1
        lsr     r6, r6, #2
        adc     r7, r6, r6
        adds    r2, r2, r2
        adc     r7, r7, r7
        adds    r2, r2, lr
        adc     r7, r7, r6
        subs    r3, r3, r2
        sbc     r7, r1, r7
        lsr     r2, r3, #3
        orr     r2, r2, r7, lsl #29
        umull   r2, r1, ip, r2
        lsr     r2, r1, #2
        lsr     r7, r1, #31
        lsl     r1, r2, #3
        adds    r4, lr, r1
        adc     r5, r6, r7
        adds    r2, r1, r1
        adds    r2, r2, r2
        adds    r2, r2, r1
        subs    r2, r3, r2
        umull   r3, r2, ip, r2
        lsr     r2, r2, #2
        adds    r4, r4, r2
        adc     r5, r5, #0
        strd    r4, [r0]
        pop     {r4, r5, r6, r7, pc}

And here's the resulting assembly with this commit applied:

_ldiv5:
        push    {r4, r5, r6, r7}
        movw    r4, #13107
        ldr     r6, [r0]
        movt    r4, 13107
        ldr     r1, [r0, #4]
        mov     r3, #0
        umull   r6, r7, r6, r4
        add     r2, r4, r4, lsl #1
        umull   r4, r5, r1, r4
        adds    r1, r6, r2
        adc     r2, r7, r2
        adds    ip, r6, r4
        adc     r1, r7, r5
        adds    r2, ip, r2
        adc     r2, r1, r3
        adds    r2, r4, r2
        adc     r3, r5, r3
        strd    r2, [r0]
        pop     {r4, r5, r6, r7}
        bx      lr

So we're down to 20 instructions from 36 initially, with only 2 umull
instructions instead of 3, and a slightly smaller stack footprint.
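
For readers unfamiliar with reciprocal multiplication, here is a minimal 32-bit sketch of the general idea. It is not the _ldiv5() code from this commit, which works on a 64-bit value in place and rounds rather than truncates. div5_u32() is a hypothetical name; the constant 0xCCCCCCCD is the one loaded by the movw/movt pair in the original listing above.

```c
#include <stdint.h>

/* Hypothetical helper: truncating divide-by-5 for 32-bit values.
 * 0xCCCCCCCD is ceil(2^34 / 5), so multiplying by it and shifting the
 * 64-bit product right by 34 bits yields x / 5 for every uint32_t x.
 */
uint32_t div5_u32(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 34);
}
```

The multiply replaces a call into the generic division routine, which is why this style of code tends to be both smaller and faster on cores without a hardware divider for the operand width in question.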

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>