Unnecessary 64-bit calculations result in 700+ bytes increase in firmware size #163

yakov-bakhmatov · 2023-09-04T12:53:15Z

The function uint32_t HAL_RCC_GetSysClockFreq(void) calculates the expression PLL_VCO = (HSE_VALUE or HSI_VALUE / PLLM) * PLLN using 64-bit multiplication and division.

STM32CubeF4/Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_rcc.c

Lines 905 to 920 in d5af563

    
/* PLL_VCO = (HSE_VALUE or HSI_VALUE / PLLM) * PLLN
   SYSCLK = PLL_VCO / PLLP */
pllm = RCC->PLLCFGR & RCC_PLLCFGR_PLLM;
if(__HAL_RCC_GET_PLL_OSCSOURCE() != RCC_PLLSOURCE_HSI)
{
  /* HSE used as PLL clock source */
  pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
}
else
{
  /* HSI used as PLL clock source */
  pllvco = (uint32_t) ((((uint64_t) HSI_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
}
pllp = ((((RCC->PLLCFGR & RCC_PLLCFGR_PLLP) >> RCC_PLLCFGR_PLLP_Pos) + 1U) *2U);

sysclockfreq = pllvco/pllp;

This forces the toolchain (gcc in particular) to link in the external __aeabi_uldivmod helper, which performs 64-bit division.

However, the expression a * b / c, where a, b and c are uint32_t and the result also fits in 32 bits, can be calculated without widening to 64 bits.

Let a = m * c + n and b = p * c + q, where m = a / c, n = a % c, p = b / c and q = b % c (integer division). Then

a * b / c = (m * c + n) * (p * c + q) / c =
  (m * p * c * c + m * q * c + n * p * c + n * q) / c =
  m * p * c + m * q + n * p + n * q / c

Define the auxiliary function:

static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t m = a / c;
    uint32_t n = a % c;
    uint32_t p = b / c;
    uint32_t q = b % c;
    return m * p * c + m * q + n * p + n * q / c;
}
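One way to gain confidence in the identity is to compare muldiv against the 64-bit expression on a host machine. The sketch below is only an illustration: muldiv_ref64 and count_muldiv_mismatches are hypothetical helper names, and the source frequencies and the PLLM/PLLN ranges are assumed sample values, not an exhaustive list of valid configurations.

```c
#include <stdint.h>

/* muldiv as proposed in this issue. */
static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = a / c;
    uint32_t n = a % c;
    uint32_t p = b / c;
    uint32_t q = b % c;
    return m * p * c + m * q + n * p + n * q / c;
}

/* 64-bit reference with the same shape as the current HAL expression
   (hypothetical helper, used for comparison only). */
static uint64_t muldiv_ref64(uint32_t a, uint32_t b, uint32_t c)
{
    return ((uint64_t)a * (uint64_t)b) / (uint64_t)c;
}

/* Counts inputs for which muldiv disagrees with the 64-bit reference.
   The frequencies and divider ranges are assumed sample values. */
static unsigned count_muldiv_mismatches(void)
{
    static const uint32_t src_hz[] = { 8000000u, 16000000u, 25000000u };
    unsigned mismatches = 0u;

    for (unsigned i = 0u; i < sizeof src_hz / sizeof src_hz[0]; ++i) {
        for (uint32_t pllm = 2u; pllm <= 63u; ++pllm) {
            for (uint32_t plln = 50u; plln <= 432u; ++plln) {
                uint64_t expected = muldiv_ref64(src_hz[i], plln, pllm);
                if (expected > UINT32_MAX) {
                    continue; /* result does not fit in 32 bits: outside the claim */
                }
                if (muldiv(src_hz[i], plln, pllm) != (uint32_t)expected) {
                    ++mismatches;
                }
            }
        }
    }
    return mismatches;
}
```

Calling count_muldiv_mismatches() from a host-side test and checking that it returns 0 covers only the sampled inputs; it is a sanity check, not a proof.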

The expressions on lines 911 and 916 then become:

-        pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSE_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);

-        pllvco = (uint32_t) ((((uint64_t) HSI_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSI_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);
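To show the whole calculation in one place, here is a host-side sketch that decodes a PLLCFGR value and derives SYSCLK with muldiv. The function sysclk_from_pllcfgr and the local PLLx_* macros are hypothetical names for this illustration and are not part of the HAL; the field positions (PLLM in bits 5:0, PLLN in bits 14:6, PLLP in bits 17:16) follow the STM32F4 register layout.

```c
#include <stdint.h>

/* Local stand-ins for the RCC_PLLCFGR field definitions (hypothetical names). */
#define PLLM_MASK 0x3Fu                  /* bits 5:0   */
#define PLLN_POS  6u
#define PLLN_MASK (0x1FFu << PLLN_POS)   /* bits 14:6  */
#define PLLP_POS  16u
#define PLLP_MASK (0x3u << PLLP_POS)     /* bits 17:16 */

/* muldiv as proposed in this issue. */
static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = a / c;
    uint32_t n = a % c;
    uint32_t p = b / c;
    uint32_t q = b % c;
    return m * p * c + m * q + n * p + n * q / c;
}

/* Derives SYSCLK in Hz from a raw PLLCFGR value and the PLL source frequency.
   Assumes PLLM is non-zero, as it must be in any valid configuration. */
static uint32_t sysclk_from_pllcfgr(uint32_t pllcfgr, uint32_t src_hz)
{
    uint32_t pllm = pllcfgr & PLLM_MASK;
    uint32_t plln = (pllcfgr & PLLN_MASK) >> PLLN_POS;
    /* PLLP is encoded as 0..3, meaning a divider of 2, 4, 6 or 8. */
    uint32_t pllp = (((pllcfgr & PLLP_MASK) >> PLLP_POS) + 1u) * 2u;

    uint32_t pllvco = muldiv(src_hz, plln, pllm);
    return pllvco / pllp;
}
```

For example, an 8 MHz source with PLLM = 8, PLLN = 336 and PLLP encoded as 0 (divide by 2) gives a 336 MHz VCO and a 168 MHz SYSCLK.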

Here is how this change affects the size of the binary.

For example, create an empty Makefile project in CubeMX for the STM32F407 MCU and compile it with arm-gnu-toolchain-12.2.

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   3668	     20	   1572	   5260	   148c	build/stm32f407-empty.elf

Using the muldiv function:

arm-none-eabi-size build/stm32f407-empty.elf

   text	   data	    bss	    dec	    hex	filename
   2892	     20	   1572	   4484	   1184	build/stm32f407-empty.elf

The difference in binary size is 776 bytes.


TOUNSTM · 2024-06-21T16:45:34Z

See also STMicroelectronics/stm32l0xx-hal-driver#12

bmcdonnell-fb · 2024-06-21T17:16:37Z

> For example, create an empty Makefile project in CubeMX for MCU STM32F407 and compile it by arm-gnu-toolchain-12.2.
> arm-none-eabi-size build/stm32f407-empty.elf
>
>    text	   data	    bss	    dec	    hex	filename
>    3668	     20	   1572	   5260	   148c	build/stm32f407-empty.elf
> Using the muldiv function:
> arm-none-eabi-size build/stm32f407-empty.elf
>
>    text	   data	    bss	    dec	    hex	filename
>    2892	     20	   1572	   4484	   1184	build/stm32f407-empty.elf
> The difference in binary size is 776 bytes.

What if you do some uint64_t calculation somewhere else in your application?

bmcdonnell-fb · 2024-10-09T20:21:37Z

Code space aside, @yakov-bakhmatov's implementation is faster.

bmcdonnell-fb · 2024-10-09T20:32:25Z

BTW, IMO it's easier to see how it works w/ more descriptive variable names, e.g.

static uint32_t muldiv_u32(uint32_t mul1, uint32_t mul2, uint32_t denom)
{
    uint32_t quot1 = mul1 / denom;
    uint32_t rem1  = mul1 % denom;
    uint32_t quot2 = mul2 / denom;
    uint32_t rem2  = mul2 % denom;
    return ((quot1 * quot2 * denom) + (quot1 * rem2) + (rem1 * quot2) + (rem1 * rem2 / denom));
}
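Both versions above divide by zero if denom is 0, and the rem1 * rem2 product could wrap for a very large denom. A guarded variant might look like the sketch below; muldiv_u32_guarded is a hypothetical name, and the 65536 limit is my own assumption chosen so that both remainders stay below 2^16 and their product fits in 32 bits. For PLLM, a 6-bit field, that limit is never reached. The caller is still responsible for the final result fitting in 32 bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes mul1 * mul2 / denom without 64-bit arithmetic.
   Returns false (leaving *result untouched) when denom is 0, when denom is
   large enough that rem1 * rem2 could overflow 32 bits, or when result is NULL.
   Does NOT detect a final result that exceeds 32 bits. */
static bool muldiv_u32_guarded(uint32_t mul1, uint32_t mul2, uint32_t denom,
                               uint32_t *result)
{
    if (result == NULL || denom == 0u || denom > 65536u) {
        return false;
    }

    uint32_t quot1 = mul1 / denom;
    uint32_t rem1  = mul1 % denom;
    uint32_t quot2 = mul2 / denom;
    uint32_t rem2  = mul2 % denom;

    /* rem1, rem2 <= 65535, so rem1 * rem2 fits in 32 bits. */
    *result = (quot1 * quot2 * denom) + (quot1 * rem2) + (rem1 * quot2)
            + (rem1 * rem2 / denom);
    return true;
}
```

Whether the extra checks are worth the code size in a function like HAL_RCC_GetSysClockFreq is a judgment call; the unguarded version matches the behavior of the current code for a zero divider (both divide by zero).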

ALABSTM assigned TOUNSTM Sep 7, 2023

ALABSTM added the enhancement (New feature or request) and hal (HAL-LL driver-related issue or pull-request) labels Sep 7, 2023

ALABSTM added the rcc (RCC-related issue or pull-request) label May 27, 2024
