Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary 64-bit calculations result in 700+ bytes increase in firmware size #163

Open
yakov-bakhmatov opened this issue Sep 4, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request hal HAL-LL driver-related issue or pull-request. rcc RCC-related issue or pull-request.

Comments

@yakov-bakhmatov
Copy link

Function uint32_t HAL_RCC_GetSysClockFreq(void) calculates expression PLL_VCO = (HSE_VALUE or HSI_VALUE / PLLM) * PLLN using 64-bit multiplication and division.

/* PLL_VCO = (HSE_VALUE or HSI_VALUE / PLLM) * PLLN
SYSCLK = PLL_VCO / PLLP */
pllm = RCC->PLLCFGR & RCC_PLLCFGR_PLLM;
if(__HAL_RCC_GET_PLL_OSCSOURCE() != RCC_PLLSOURCE_HSI)
{
/* HSE used as PLL clock source */
pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
}
else
{
/* HSI used as PLL clock source */
pllvco = (uint32_t) ((((uint64_t) HSI_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
}
pllp = ((((RCC->PLLCFGR & RCC_PLLCFGR_PLLP) >> RCC_PLLCFGR_PLLP_Pos) + 1U) *2U);
sysclockfreq = pllvco/pllp;

This forces the compiler (in particular gcc) to link to an external __aeabi_uldivmod function that performs a 64-bit division.

But to calculate the expression a * b / c, where a, b and c are uint32_t and the result is also 32 bits, it is possible without expanding to 64 bits.

Let a = m * c + n, b = p * c + q. Then

a * b / c = (m * c + n) * (p * c + q) / c =
  (m * p * c * c + m * q * c + n * p * c + n * q) / c =
  m * p * c + m * q + n * p + n * q / c

Define the auxiliary function:

static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t m = a / c;
    uint32_t n = a % c;
    uint32_t p = b / c;
    uint32_t q = b % c;
    return m * p * c + m * q + n * p + n * q / c;
}

Expressions in lines 911, 916 are converted to the following

-        pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSE_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);
-        pllvco = (uint32_t) ((((uint64_t) HSI_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSI_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);

How this change affects the size of the binary.

For example, create an empty Makefile project in CubeMX for MCU STM32F407 and compile it by arm-gnu-toolchain-12.2.

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   3668	     20	   1572	   5260	   148c	build/stm32f407-empty.elf

Using the muldiv function:

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   2892	     20	   1572	   4484	   1184	build/stm32f407-empty.elf

The difference in binary size is 776 bytes.

@ALABSTM ALABSTM added enhancement New feature or request hal HAL-LL driver-related issue or pull-request. labels Sep 7, 2023
@ALABSTM ALABSTM added the rcc RCC-related issue or pull-request. label May 27, 2024
@TOUNSTM
Copy link
Contributor

TOUNSTM commented Jun 21, 2024

@bmcdonnell-fb
Copy link

For example, create an empty Makefile project in CubeMX for MCU STM32F407 and compile it by arm-gnu-toolchain-12.2.

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   3668	     20	   1572	   5260	   148c	build/stm32f407-empty.elf

Using the muldiv function:

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   2892	     20	   1572	   4484	   1184	build/stm32f407-empty.elf

The difference in binary size is 776 bytes.

What if you do some uint64_t calculation somewhere else in your application?

@bmcdonnell-fb
Copy link

Code space aside, @yakov-bakhmatov's implementation is faster.

@bmcdonnell-fb
Copy link

BTW, IMO it's easier to see how it works w/ more descriptive variable names, e.g.

static uint32_t muldiv_u32(uint32_t mul1, uint32_t mul2, uint32_t denom)
{
    uint32_t quot1 = mul1 / denom;
    uint32_t rem1  = mul1 % denom;
    uint32_t quot2 = mul2 / denom;
    uint32_t rem2  = mul2 % denom;
    return ((quot1 * quot2 * denom) + (quot1 * rem2) + (rem1 * quot2) + (rem1 * rem2 / denom));
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request hal HAL-LL driver-related issue or pull-request. rcc RCC-related issue or pull-request.
Projects
Development

No branches or pull requests

4 participants