mp_mul_2d Produces Out-of-Range *tmpc Values on ARMCC #8968

@liuhaowei-littlebird

Version

4.1.0

Description

Summary
When wolfSSL is compiled with ARM Compiler 5 (armcc) for Cortex-M4, mp_mul_2d sees out-of-range *tmpc values (exceeding MP_MASK = 0xFFFFFFF) before the shift step. This leads to incorrect mp_int->used values (e.g., key->n.used reaches 396 instead of ~73 for a 256-byte input). The same code works correctly with GCC, where *tmpc stays within 28 bits.
Environment

wolfSSL Version: 4.1.0
Compiler: ARM Compiler 5 (armcc) vs. GCC (arm-none-eabi-gcc)
Platform: Cortex-M4 (FPv4-SP, interwork)
Configuration:
MP_MASK = 0xFFFFFFF
DIGIT_BIT = 28
sizeof(mp_digit) = 4 (32-bit unsigned)
Compiler flags: --cpu=Cortex-M4.fp --fpu=FPv4-SP --apcs=/interwork -O0 --c99 --no_inline --strict --no_autoinline --no_unaligned_access --force_new_delete --bss_threshold=0
Defines: WOLFSSL_USER_SETTINGS, WOLFSSL_NEXGO_MC661, NO_STRICT_ALIASING, NO_WARN_ASSIGN_IN_CONDITION
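
To make the range condition concrete, here is a minimal sketch of how the reported configuration fits together. The typedef and macros are local stand-ins built from the values listed above, not the definitions in the wolfSSL headers, and digit_in_range is a hypothetical helper name.

```c
#include <stdint.h>

/* Local stand-ins mirroring the reported configuration (assumption:
 * these are not copied from the wolfSSL headers). */
typedef uint32_t mp_digit;                 /* sizeof(mp_digit) == 4 */

#define DIGIT_BIT 28
#define MP_MASK   ((((mp_digit)1) << DIGIT_BIT) - (mp_digit)1)  /* 0x0FFFFFFF */

/* Hypothetical helper: returns 1 if the digit uses only the low
 * DIGIT_BIT bits, 0 if any of the top four bits is set. */
static int digit_in_range(mp_digit d)
{
    return (d & (mp_digit)~MP_MASK) == 0;
}
```

With these definitions, a logged value such as 0xEFCDAB89 fails the check because its top nibble is non-zero, while 0x0D0EA530 passes.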

Steps to Reproduce

Compile wolfSSL with armcc using the provided build script (see below).
Run wc_RsaPublicKeyDecodeRaw with a 256-byte RSA modulus and 3-byte exponent.
Log *tmpc values in mp_mul_2d (via xgd_printf).
Compare with GCC-compiled output.

Build Script (relevant portion):
OPT_FLAGS = -O0
CFLAGS = --cpu=Cortex-M4.fp --fpu=FPv4-SP --apcs=/interwork -O0 --diag_suppress=1,1296,188,111,68,177,223,1293 --c99 --split_sections -DWOLFSSL_USER_SETTINGS -DWOLFSSL_NEXGO_MC661 -DNO_STRICT_ALIASING -DNO_WARN_ASSIGN_IN_CONDITION --no_inline --strict --no_autoinline --no_unaligned_access --force_new_delete --bss_threshold=0 --gnu

Expected Behavior

*tmpc values in mp_mul_2d (before and after shift) should be within 28 bits (<= 0xFFFFFFF).
key->n.used should be ~73 for a 256-byte modulus (256 * 8 / 28 ≈ 73).
Behavior should match GCC, where *tmpc is always within 28 bits.
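
The expected digit count can be sketched as a ceiling division. The function name digits_needed is hypothetical. Rounding up gives an upper bound of 74 digits for a 256-byte value (2048 / 28 is about 73.1), which is consistent with the ~73 estimate above and far below the observed 396.

```c
#include <stddef.h>

#define DIGIT_BIT 28   /* value reported in the configuration */

/* Hypothetical helper: upper bound on the number of DIGIT_BIT-wide
 * digits needed to hold an nbytes-long big-endian integer. */
static size_t digits_needed(size_t nbytes)
{
    size_t bits = nbytes * 8u;
    return (bits + DIGIT_BIT - 1u) / DIGIT_BIT;   /* ceiling division */
}
```

For example, digits_needed(256) evaluates to 74 and digits_needed(3) to 1 (a 3-byte exponent fits in a single digit).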

Actual Behavior

armcc:
before shift in mp_mul_2d: *tmpc exceeds 28 bits (e.g., 0xEFCDAB89 at log line 622).
after shift: *tmpc is correctly masked to 28 bits (e.g., 0x0DAB8900).
key->n.used grows abnormally (e.g., 396).

gcc: All *tmpc values stay within 28 bits, and key->n.used is ~73.

Log Excerpt (armcc):
Line 622: liuhaowei, mp_mul_2d: before shift, *tmpc = 0xEFCDAB89
Line 623: liuhaowei, mp_mul_2d: rr = 0x000000FC
Line 624: liuhaowei, mp_mul_2d: after shift, *tmpc = 0x0DAB8900
Line 625: liuhaowei, mp_mul_2d: before shift, *tmpc = 0x98BADCFE

Log Excerpt (gcc):
51: liuhaowei, mp_mul_2d: before shift, *tmpc = 0x0D0EA530
52: liuhaowei, mp_mul_2d: rr = 0x000000D0
53: liuhaowei, mp_mul_2d: after shift, *tmpc = 0x0EA53000
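
The logged triples can be checked by hand with a single shift-with-carry step. The sketch below assumes a shift count of d = 8 and an incoming carry of 0. Neither value is logged; both are inferred because they reproduce the rr and after-shift values in the two excerpts. The carry extraction follows the usual libtommath-style form, and shift_digit is a hypothetical name rather than a wolfSSL function.

```c
#include <stdint.h>

typedef uint32_t mp_digit;     /* local stand-in, 32-bit unsigned */

#define DIGIT_BIT 28
#define MP_MASK   0x0FFFFFFFu

/* One digit of a left shift by d bits (0 < d < DIGIT_BIT).
 * carry_in holds the bits shifted out of the previous digit;
 * *carry_out receives the bits shifted out of this one. */
static mp_digit shift_digit(mp_digit in, int d, mp_digit carry_in,
                            mp_digit *carry_out)
{
    mp_digit mask  = (((mp_digit)1) << d) - 1u;  /* low d bits */
    int      shift = DIGIT_BIT - d;

    *carry_out = (in >> shift) & mask;           /* logged as rr */
    return ((in << d) | carry_in) & MP_MASK;     /* logged as "after shift" */
}
```

Under those assumptions, an input of 0xEFCDAB89 yields rr = 0xFC and a result of 0x0DAB8900, and an input of 0x0D0EA530 yields rr = 0xD0 and 0x0EA53000, matching both logs. The step itself therefore behaves identically under both compilers; the difference is the digit value that is already in memory before the shift.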

Root Cause Analysis

mp_mul_2d reads *tmpc from c->dp[x] before the shift. In the armcc build, c->dp already contains out-of-range values (e.g., 0xEFCDAB89), likely due to uninitialized memory in mp_init or mp_grow.
After the shift, *tmpc is correctly masked by ((*tmpc << d) | r) & MP_MASK, indicating that the masking operation works.
Possible causes:
Uninitialized dp array: armcc may not zero-initialize c->dp in mp_init or mp_grow, leading to garbage data in high bits (29-32).
Incorrect used management: mp_read_unsigned_bin may increment c->used excessively, causing mp_mul_2d to process uninitialized dp elements.
Compiler behavior: armcc may handle mp_digit (32-bit unsigned) differently, possibly promoting to a larger type or failing to clear high bits.
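
One way to narrow down these candidates is to scan the digit array right after each suspect call and report the first digit that has bits set above MP_MASK. The sketch below uses a local stand-in struct (demo_mp_int) that only mirrors the used/alloc/dp fields discussed here; it is an assumption, not the wolfSSL mp_int definition, and first_bad_digit is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mp_digit;     /* local stand-in, 32-bit unsigned */

#define MP_MASK 0x0FFFFFFFu

/* Hypothetical stand-in for the big-integer struct; field names follow
 * the ones referenced in this report (used, dp) plus an alloc count. */
typedef struct {
    int       used;    /* digits currently in use */
    int       alloc;   /* digits allocated in dp  */
    int       sign;
    mp_digit *dp;
} demo_mp_int;

/* Returns the index of the first in-use digit with bits above MP_MASK,
 * or -1 if every digit in [0, used) is within range. */
static int first_bad_digit(const demo_mp_int *a)
{
    int i;

    if (a == NULL || a->dp == NULL) {
        return -1;
    }
    for (i = 0; i < a->used && i < a->alloc; i++) {
        if ((a->dp[i] & ~(mp_digit)MP_MASK) != 0) {
            return i;
        }
    }
    return -1;
}
```

Calling such a check after initialization, after growth, and after each byte is read in would show which step first leaves an out-of-range digit, or whether used has already grown past the digits that were actually written.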

Attempted Fixes

Disabled optimizations: Set -O0, --no_inline, --strict, --no_autoinline, --no_unaligned_access, --force_new_delete, --bss_threshold=0. No change in behavior.
Preprocessed mp_int: Added strict_init_mp_int to zero dp and used before mp_read_unsigned_bin. Issue persists due to internal mp_grow or mp_read_unsigned_bin behavior.
Verified masking: Confirmed MP_MASK is applied correctly in after shift, but before shift values are already corrupted.

Questions for wolfSSL Team

Is there a known issue with mp_init or mp_grow failing to zero-initialize dp on armcc?
Could armcc's handling of mp_digit (e.g., type promotion or memory alignment) cause high-bit pollution in c->dp?
Are there specific armcc compiler flags or patches to ensure mp_digit values stay within 28 bits?
Can you recommend a workaround to enforce dp initialization or restrict used growth in mp_read_unsigned_bin without modifying the library?
