Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restrict MIPS32 MULADDC_* to MIPS I, II. #275

Open
RonEld opened this issue Sep 22, 2019 · 1 comment
Open

restrict MIPS32 MULADDC_* to MIPS I, II. #275

RonEld opened this issue Sep 22, 2019 · 1 comment

Comments

@RonEld
Copy link
Contributor

RonEld commented Sep 22, 2019

Description

Reported by Email:
"TL;DR: MIPS32-specific mbed-crypto/include/mbedtls/bn_mul.h:MULADDC_* code path is slower than the pure C fallback and should be disabled.

All the tables below compare "RSA-2048 : NNN public/s" values as displayed by "programs/test/benchmark rsa".

1200MHz P5600 (gcc: -mips32r2 -mtune=24kf/74kf/p5600, clang: -mcpu=mips32r5/p5600):
clang-8 -O2: 887 (747)
clang-8 -Os: 898 (748)
clang-8 -Oz: 876 (735)
gcc-9.2 -O2: 898,803,800 (750,752,752)
gcc-9.2 -Os: 856,859,865 (730,729,744)
gcc-7.3 -O2: 904,806,801 (772,770,781)
gcc-7.3 -Os: 855,875,884 (775,774,774)
gcc-6.3 -O2: 896,807,820 (773,767,776)
gcc-6.3 -Os: 847,868,883 (752,756,755)
gcc-4.9 -O2: 897,805,797 (776,773,775)
gcc-4.9 -Os: 860,859,865 (768,774,773)

(clang mips32r5/p5600 results are merged as the generated code is identical)

Note gcc has some problem at -O2 and -mtune=74kf/p5600 on this CPU, however pure C is still faster than the asm macro.
Results should be valid for 74K*/1074K* (a bit simpler but similar pipeline).

580MHz 24KEc: (gcc: -mips32r2 -mtune=24kc/4kc/74kc):
gcc-9.2 -O2: 199,201,200 (168,167,167)
gcc-9.2 -Os: 207,192,194 (167,166,166)
gcc-7.3 -O2: 200,194,202 (171,171,171)
gcc-7.3 -Os: 207,194,194 (170,169,169)
gcc-6.3 -O2: 197,194,203 (171,170,170)
gcc-6.3 -Os: 204,192,193 (169,169,169)
gcc-4.9 -O2: 197,203,202 (171,171,170)
gcc-4.9 -Os: 190,190,190 (169,169,169)

In-order single-issue core with short load delay and relatively low-latency multiply, no surprises.
Results should be valid for 4K*/34K*/1004K*/M51xx etc as they all have identical or quite similar pipeline.

Finally, 1004MHz XBurst (gcc/clang: built-in NDK defaults):
clang-5 -O2: 264 (242)
clang-5 -Os: 264 (242)
clang-5 -Oz: 258 (238)
gcc-4.9 -O2: 329 (244)
gcc-4.9 -Os: 309 (241)

In-order MIPS32R1 core not from MTI/IMG/WaveComp, old compilers yet pure C is better than asm.

The C code advantage holds (with roughly the same ratio) for the rest of MPI-employing tests too.

It also makes the produced binaries consistently smaller (kinda ~400 bytes).

This patch limits the original asm code to MIPS I and II ISAs (instead of disabling it completely) as its history is unclear from github logs and there might be some very old MIPS CPUs which run it faster than C fallback.
Another option is some obscure non-gcc/non-clang compiler for MIPS MCUs which generates slow code from C; this should be handled explicitly IMHO without compromising the common case.

MIPS III, IV are explicitly omitted as all modern compilers still support these ISAs well and O32 ABI is not the natural choice there in any case.

gcc-9.2 and clang-8 are from alpine linux edge (musl-based), benchmark binaries are always dynamically linked;

gcc-7.3/6.3/4.9 are from codescape-2018.11-01/2018.09-03/2016.05-08, benchmark binaries are statically linked against glibc;

XBurst compilers are gcc-4.9 and clang-5 from Android NDK r16b (the last that supports MIPS).

include/mbedtls/bn_mul.h | 6 +++++- 
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/mbedtls/bn_mul.h b/include/mbedtls/bn_mul.h 
index 163869ae7..a15cd8058 100644 
--- a/include/mbedtls/bn_mul.h 
+++ b/include/mbedtls/bn_mul.h 
@@ -750,7 +750,11 @@ 
); 
#endif /* Alpha */ 

-#if defined(__mips__) && !defined(__mips64) 
+/* 
+ * This code is slower and bigger than C on P5600, 4K*, 24K* family and XBurst 
+ * with both clang and gcc (even older versions -- down to llvm-5 and gcc-4.9). 
+ */ 
+#if defined(__mips__) && __mips < 3 

#define MULADDC_INIT \ 
asm( \ 

2.23.0.windows.1
"

Issue request type

[ ] Question
[ ] Enhancement
[x ] Bug
@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/IOTCRYPT-930

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants