mp_mul_2d Produces Out-of-Range *tmpc Values on ARMCC #8968

### Version

4.1.0

### Description

Summary

When wolfSSL is compiled with ARM Compiler 5 (armcc) for Cortex-M4, mp_mul_2d sees out-of-range *tmpc values (greater than MP_MASK = 0xFFFFFFF) at the "before shift" log point, and mp_int->used ends up wrong (e.g., key->n.used reaches 396 instead of ~73 for a 256-byte input). The same code works correctly with GCC, where *tmpc stays within 28 bits.

Environment

wolfSSL Version: 4.1.0 (as stated under Version above)
Compiler: ARM Compiler 5 (armcc) vs. GCC (arm-none-eabi-gcc)
Platform: Cortex-M4 (FPv4-SP, interwork)
Configuration:
MP_MASK = 0xFFFFFFF
DIGIT_BIT = 28
sizeof(mp_digit) = 4 (32-bit unsigned)
Compiler flags: --cpu=Cortex-M4.fp --fpu=FPv4-SP --apcs=/interwork -O0 --c99 --no_inline --strict --no_autoinline --no_unaligned_access --force_new_delete --bss_threshold=0
Defines: WOLFSSL_USER_SETTINGS, WOLFSSL_NEXGO_MC661, NO_STRICT_ALIASING, NO_WARN_ASSIGN_IN_CONDITION



Steps to Reproduce

Compile wolfSSL with armcc using the provided build script (see below).
Run wc_RsaPublicKeyDecodeRaw with a 256-byte RSA modulus and 3-byte exponent.
Log *tmpc values in mp_mul_2d (via xgd_printf).
Compare with GCC-compiled output.

Build Script (relevant portion):
OPT_FLAGS = -O0
CFLAGS = --cpu=Cortex-M4.fp --fpu=FPv4-SP --apcs=/interwork -O0 --diag_suppress=1,1296,188,111,68,177,223,1293 --c99 --split_sections -DWOLFSSL_USER_SETTINGS -DWOLFSSL_NEXGO_MC661 -DNO_STRICT_ALIASING -DNO_WARN_ASSIGN_IN_CONDITION --no_inline --strict --no_autoinline --no_unaligned_access --force_new_delete --bss_threshold=0 --gnu


Expected Behavior

*tmpc values in mp_mul_2d (before and after shift) should be within 28 bits (<= 0xFFFFFFF).
key->n.used should be about 73 to 74 for a 256-byte modulus (256 * 8 / 28 ≈ 73.1, so 74 digits once the top bits are in use).
Behavior should match GCC, where *tmpc is always within 28 bits.

Actual Behavior

armcc:
Before shift (in mp_mul_2d): *tmpc exceeds 28 bits (e.g., 0xEFCDAB89 at log line 622).
After shift: *tmpc is within 28 bits again (e.g., 0x0DAB8900).
key->n.used grows abnormally (e.g., to 396).


gcc: All *tmpc values stay within 28 bits, and key->n.used is ~73.

Log Excerpt (armcc):
Line 622: liuhaowei, mp_mul_2d: before shift, *tmpc = 0xEFCDAB89
Line 623: liuhaowei, mp_mul_2d: rr = 0x000000FC
Line 624: liuhaowei, mp_mul_2d: after shift, *tmpc = 0x0DAB8900
Line 625: liuhaowei, mp_mul_2d: before shift, *tmpc = 0x98BADCFE

Log Excerpt (gcc):
51: liuhaowei, mp_mul_2d: before shift, *tmpc = 0x0D0EA530
52: liuhaowei, mp_mul_2d: rr = 0x000000D0
53: liuhaowei, mp_mul_2d: after shift, *tmpc = 0x0EA53000

Root Cause Analysis

mp_mul_2d reads each digit through tmpc (pointing into c->dp) before shifting. With armcc, c->dp already contains out-of-range values (e.g., 0xEFCDAB89) at that point, likely due to uninitialized memory in mp_init or mp_grow.
After the shift, *tmpc is back in range because of ((*tmpc << d) | r) & MP_MASK, which indicates that the masking operation itself works.
Possible causes:
Uninitialized dp array: c->dp may not be zero-initialized in mp_init or mp_grow when built with armcc, leaving garbage in the top four bits of each 32-bit digit (bits 28 to 31, counting from 0).
Incorrect used management: mp_read_unsigned_bin may increment c->used excessively, causing mp_mul_2d to process uninitialized dp elements.
Compiler behavior: armcc may handle mp_digit (32-bit unsigned) differently, possibly promoting to a larger type or failing to clear high bits.



Attempted Fixes

Disabled optimizations: Set -O0, --no_inline, --strict, --no_autoinline, --no_unaligned_access, --force_new_delete, --bss_threshold=0. No change in behavior.
Pre-initialized mp_int: Added a custom strict_init_mp_int helper to zero dp and used before calling mp_read_unsigned_bin. The issue persists, presumably because of what mp_grow or mp_read_unsigned_bin does internally.
Verified masking: Confirmed MP_MASK is applied correctly in after shift, but before shift values are already corrupted.

Questions for wolfSSL Team

Is there a known issue with mp_init or mp_grow failing to zero-initialize dp on armcc?
Could armcc's handling of mp_digit (e.g., type promotion or memory alignment) cause high-bit pollution in c->dp?
Are there specific armcc compiler flags or patches to ensure mp_digit values stay within 28 bits?
Can you recommend a workaround to enforce dp initialization or restrict used growth in mp_read_unsigned_bin without modifying the library?
