Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/fmt: Small optimizations #7366

Merged
merged 1 commit into from
Sep 6, 2017

Conversation

jnohlgard
Copy link
Member

This PR saves a few bytes on Cortex-M0 and reduces the number of calls to the long division helper functions (__aeabi_uidiv) in fmt_dec_u64, by using multiplication instead of division.

Verified by manual inspection of compiler generated assembly code.

@jnohlgard jnohlgard added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer labels Jul 15, 2017
@jnohlgard jnohlgard added this to the Release 2017.10 milestone Jul 15, 2017
@jnohlgard jnohlgard requested a review from kaspar030 July 15, 2017 10:24
@miri64
Copy link
Member

miri64 commented Jul 15, 2017

I actually can't confirm that:

text	data	bss	dec	BOARD/BINDIRBASE

12	0	0	12	airfy-beacon
13996	136	2640	16772	master
14008	136	2640	16784	opt-fmt

20	0	0	20	arduino-due
14196	200	2712	17108	master
14216	200	2712	17128	opt-fmt

12	0	0	12	arduino-mkr1000
14312	136	2764	17212	master
14324	136	2764	17224	opt-fmt

12	0	0	12	arduino-zero
14380	136	2768	17284	master
14392	136	2768	17296	opt-fmt

12	0	0	12	b-l072z-lrwan1
14408	136	2772	17316	master
14420	136	2772	17328	opt-fmt

12	0	0	12	calliope-mini
13904	136	2636	16676	master
13916	136	2636	16688	opt-fmt

20	0	0	20	cc2538dk
14412	136	2884	17432	master
14432	136	2884	17452	opt-fmt

20	0	0	20	cc2650stk
13560	136	2884	16580	master
13580	136	2884	16600	opt-fmt

16	0	0	16	ek-lm4f120xl
14708	136	3028	17872	master
14724	136	3028	17888	opt-fmt

16	0	0	16	f4vi1
14300	136	2756	17192	master
14316	136	2756	17208	opt-fmt

20	0	0	20	fox
14260	136	2768	17164	master
14280	136	2768	17184	opt-fmt

16	0	0	16	frdm-k64f
15736	140	3176	19052	master
15752	140	3176	19068	opt-fmt

20	0	0	20	iotlab-a8-m3
14276	136	2768	17180	master
14296	136	2768	17200	opt-fmt

20	0	0	20	iotlab-m3
14276	136	2768	17180	master
14296	136	2768	17200	opt-fmt

20	0	0	20	limifrog-v1
14368	136	2772	17276	master
14388	136	2772	17296	opt-fmt

20	0	0	20	maple-mini
14300	136	2780	17216	master
14320	136	2780	17236	opt-fmt

20	0	0	20	mbed_lpc1768
13816	136	2636	16588	master
13836	136	2636	16608	opt-fmt

12	0	0	12	microbit
13904	136	2636	16676	master
13916	136	2636	16688	opt-fmt

-8	0	0	-8	msba2
20600	132	98172	118904	master
20592	132	98172	118896	opt-fmt

16	0	0	16	msbiot
14660	136	2776	17572	master
14676	136	2776	17588	opt-fmt

28	0	0	28	mulle
21744	272	3684	25700	master
21772	272	3684	25728	opt-fmt

-32	0	0	-32	native
34127	384	47708	82219	master
34095	384	47708	82187	opt-fmt

12	0	0	12	nrf51dongle
13936	136	2636	16708	master
13948	136	2636	16720	opt-fmt

16	0	0	16	nrf52840dk
14068	136	2640	16844	master
14084	136	2640	16860	opt-fmt

16	0	0	16	nrf52dk
14036	136	2640	16812	master
14052	136	2640	16828	opt-fmt

12	0	0	12	nrf6310
14028	136	2644	16808	master
14040	136	2644	16820	opt-fmt

12	0	0	12	nucleo-f030
14076	136	2764	16976	master
14088	136	2764	16988	opt-fmt

12	0	0	12	nucleo-f070
14108	136	2772	17016	master
14120	136	2772	17028	opt-fmt

12	0	0	12	nucleo-f072
14336	136	2776	17248	master
14348	136	2776	17260	opt-fmt

12	0	0	12	nucleo-f091
14336	136	2776	17248	master
14348	136	2776	17260	opt-fmt

20	0	0	20	nucleo-f103
14276	136	2780	17192	master
14296	136	2780	17212	opt-fmt

16	0	0	16	nucleo-f302
14612	136	2776	17524	master
14628	136	2776	17540	opt-fmt

16	0	0	16	nucleo-f303
14660	136	2780	17576	master
14676	136	2780	17592	opt-fmt

16	0	0	16	nucleo-f334
14508	136	2760	17404	master
14524	136	2760	17420	opt-fmt

16	0	0	16	nucleo-f401
14656	136	2776	17568	master
14672	136	2776	17584	opt-fmt

16	0	0	16	nucleo-f410
14528	136	2768	17432	master
14544	136	2768	17448	opt-fmt

16	0	0	16	nucleo-f411
14612	136	2776	17524	master
14628	136	2776	17540	opt-fmt

16	0	0	16	nucleo-f446
14692	136	2784	17612	master
14708	136	2784	17628	opt-fmt

12	0	0	12	nucleo-l053
14372	136	2768	17276	master
14384	136	2768	17288	opt-fmt

12	0	0	12	nucleo-l073
14404	136	2776	17316	master
14416	136	2776	17328	opt-fmt

20	0	0	20	nucleo-l1
14324	136	2776	17236	master
14344	136	2776	17256	opt-fmt

16	0	0	16	nucleo-l476
14644	136	2776	17556	master
14660	136	2776	17572	opt-fmt

20	0	0	20	nucleo144-f207
14540	136	2780	17456	master
14560	136	2780	17476	opt-fmt

16	0	0	16	nucleo144-f303
14660	136	2776	17572	master
14676	136	2776	17588	opt-fmt

16	0	0	16	nucleo144-f412
14660	136	2776	17572	master
14676	136	2776	17588	opt-fmt

16	0	0	16	nucleo144-f413
14740	136	2776	17652	master
14756	136	2776	17668	opt-fmt

16	0	0	16	nucleo144-f429
14660	136	2776	17572	master
14676	136	2776	17588	opt-fmt

16	0	0	16	nucleo144-f446
14660	136	2776	17572	master
14676	136	2776	17588	opt-fmt

32	0	0	32	nucleo144-f746
14500	136	2772	17408	master
14532	136	2772	17440	opt-fmt

32	0	0	32	nucleo144-f767
14548	136	2772	17456	master
14580	136	2772	17488	opt-fmt

12	0	0	12	nucleo32-f031
14228	136	2760	17124	master
14240	136	2760	17136	opt-fmt

12	0	0	12	nucleo32-f042
14284	136	2768	17188	master
14296	136	2768	17200	opt-fmt

16	0	0	16	nucleo32-f303
14544	136	2768	17448	master
14560	136	2768	17464	opt-fmt

12	0	0	12	nucleo32-l031
14312	136	2760	17208	master
14324	136	2760	17220	opt-fmt

16	0	0	16	nucleo32-l432
14608	136	2768	17512	master
14624	136	2768	17528	opt-fmt

20	0	0	20	nz32-sc151
14428	136	2784	17348	master
14448	136	2784	17368	opt-fmt

20	0	0	20	opencm904
14100	136	2772	17008	master
14120	136	2772	17028	opt-fmt

20	0	0	20	openmote-cc2538
14412	136	2884	17432	master
14432	136	2884	17452	opt-fmt

16	0	0	16	pba-d-01-kw2x
15860	140	3188	19188	master
15876	140	3188	19204	opt-fmt

12	0	0	12	pca10000
13932	136	2636	16704	master
13944	136	2636	16716	opt-fmt

12	0	0	12	pca10005
14008	136	2644	16788	master
14020	136	2644	16800	opt-fmt

20	0	0	20	remote-pa
15088	136	2884	18108	master
15108	136	2884	18128	opt-fmt

20	0	0	20	remote-reva
15080	136	2884	18100	master
15100	136	2884	18120	opt-fmt

20	0	0	20	remote-revb
15088	136	2884	18108	master
15108	136	2884	18128	opt-fmt

12	0	0	12	samd21-xpro
14476	136	2784	17396	master
14488	136	2784	17408	opt-fmt

12	0	0	12	saml21-xpro
14212	136	2760	17108	master
14224	136	2760	17120	opt-fmt

12	0	0	12	samr21-xpro
14408	136	2772	17316	master
14420	136	2772	17328	opt-fmt

20	0	0	20	seeeduino_arch-pro
13816	136	2636	16588	master
13836	136	2636	16608	opt-fmt

16	0	0	16	slwstk6220a
13800	136	2756	16692	master
13816	136	2756	16708	opt-fmt

12	0	0	12	sodaq-autonomo
14432	136	2784	17352	master
14444	136	2784	17364	opt-fmt

20	0	0	20	spark-core
14280	136	2760	17176	master
14300	136	2760	17196	opt-fmt

12	0	0	12	stm32f0discovery
14368	136	2772	17276	master
14380	136	2772	17288	opt-fmt

16	0	0	16	stm32f3discovery
14756	136	2780	17672	master
14772	136	2780	17688	opt-fmt

16	0	0	16	stm32f4discovery
14688	136	2772	17596	master
14704	136	2772	17612	opt-fmt

20	0	0	20	udoo
14196	200	2712	17108	master
14216	200	2712	17128	opt-fmt

12	0	0	12	weio
14132	136	2700	16968	master
14144	136	2700	16980	opt-fmt

12	0	0	12	yunjia-nrf51822
14004	136	2644	16784	master
14016	136	2644	16796	opt-fmt

I used the following compilers:

[mlenders@mk RIOT]<3 arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[mlenders@mk RIOT]<3 avr-gcc --version
avr-gcc (GCC) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[mlenders@mk RIOT]<3 msp430-gcc --version
msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[mlenders@mk RIOT]<3 gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(did not test on MIPS)

@jnohlgard
Copy link
Member Author

@miri64 Which test application are those numbers taken from?
I will do some analysis later. Did you at least get fewer calls to the long division helper?

@miri64
Copy link
Member

miri64 commented Jul 15, 2017

@gebart: tests/unittests, just compiled with make tests-fmt. I removed the offending lines to get around #7033 (which is okay, since fmt doesn't use C++ ;-)). Regarding long division helper: how do I find this out?

@miri64
Copy link
Member

miri64 commented Jul 15, 2017

(With "the offending lines" I meant just the "ifneq (,$(filter tests-cpp_%, $(UNIT_TESTS)))" block ;-))

sys/fmt/fmt.c Outdated
@@ -183,12 +183,12 @@ size_t fmt_u16_dec(char *out, uint16_t val)

size_t fmt_s32_dec(char *out, int32_t val)
{
int negative = (val < 0);
size_t negative = (val < 0 ? 1 : 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not unsigned, as used in other places? Even (unsigned) char would do, but will likely not be more optimal in size, anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why ternary here - to be more explicit? If so I'd rather write it as `(val < 0) ? 1 :0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, to be more readable, the parentheses should be around val<0

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

sys/fmt/fmt.c Outdated
@@ -183,12 +183,12 @@ size_t fmt_u16_dec(char *out, uint16_t val)

size_t fmt_s32_dec(char *out, int32_t val)
{
int negative = (val < 0);
size_t negative = (val < 0 ? 1 : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, to be more readable, the parentheses should be around val<0

sys/fmt/fmt.c Outdated
@@ -262,11 +262,11 @@ size_t fmt_float(char *out, float f, unsigned precision)
{
assert (precision <= 7);

unsigned negative = (f < 0);
unsigned negative = ((f < 0) ? 1 : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the outer parens can go. Anyhow, remove them here or add above, please!

@jnohlgard jnohlgard force-pushed the pr/fmt-optimizations branch from 6386700 to f6d6250 Compare August 15, 2017 19:39
Copy link
Member

@smlng smlng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this looks good and ready for merge in general.
Side note: the pwr function is (currently) unused and could be removed?

What about the observed increase (12-20B) in binary sizes by @miri64?

@smlng smlng added the CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable label Aug 21, 2017
@kaspar030
Copy link
Contributor

just compiled with make tests-fmt

did you compile with "RIOT_VERSION_OVERRIDE=test" in both master and this PR's branch?

@kaspar030
Copy link
Contributor

I see 16byte size reduction when compiling "unittests/tests-fmt" for samr21-xpro and mulle, using "gcc version 7.1.0 (Arch Repository)".

sys/fmt/fmt.c Outdated
@@ -194,12 +194,12 @@ size_t fmt_u16_dec(char *out, uint16_t val)

size_t fmt_s32_dec(char *out, int32_t val)
{
int negative = (val < 0);
unsigned negative = (val < 0) ? 1 : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the side-by-side diff, does appending ? 1 : 0 really increase readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second look, maybe not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO (val < 0) is perfectly readable and to the point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gebart what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gebart what do you think?

Just saw that you already replied. IMO if it is unclear which is better to read, let's lower the amount of characters that our poor brains need to parse.

@smlng
Copy link
Member

smlng commented Aug 21, 2017

I see a size reduction, too for arduino-uno of 26B for text and 8B for data. But on native macOS I get an error for the now unused pwr function,

@kaspar030
Copy link
Contributor

I see a size reduction

What compiler version?

@smlng
Copy link
Member

smlng commented Aug 21, 2017

avr-gcc (GCC) 7.1.0

@kaspar030 kaspar030 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Aug 21, 2017
@kaspar030
Copy link
Contributor

avr-gcc (GCC) 7.1.0

Let's see what the CI compilers say.

@smlng
Copy link
Member

smlng commented Aug 21, 2017

also: need to remove (or comment out) unused pwr, or not?

@kaspar030
Copy link
Contributor

need to remove (or comment out) unused pwr, or not?

yes!

@kaspar030
Copy link
Contributor

Hm, in many cases, this PR increases codesize:

https://gist.github.com/kaspar030/50d9852f9aaec1c5de72fcac23b102d9

This compares sizes.json from the PR's CI build with a manually built sizes.json from the PR's base branch at build time.

I assume that any application using the division helper elsewhere gains weight.
Also, it seems like again, due to it's harvard architecture, avr8 cannot put tenmap in ROM, which shows in this PR's size comparison as now more functions use tenmap...

@smlng
Copy link
Member

smlng commented Aug 22, 2017

I made some tests, and removed _tenmap by replacing it like:

-    uint32_t e = _tenmap[fp_digits];
+    uint32_t e = 1;
+    for (unsigned i = 0; i < fp_digits; ++i) {
+        e *= 10;
+    }

this saves 10++ bytes here and there, but will likely consume some CPU cycle. So which is to optimize: binary size or cpu cycles? More cpu cycles may mean more energy consumption, right?

@smlng
Copy link
Member

smlng commented Aug 28, 2017

how do we proceed with this one? IMHO if binary size is critical don't use FMT, on the other hand with #7499 added on top of this it reduces overall size of the FMT module again.

@smlng
Copy link
Member

smlng commented Sep 4, 2017

@kaspar030 how do we move this (and related PRs #7499, and #7263) forward?

@kaspar030
Copy link
Contributor

IMHO if binary size is critical don't use FMT

Well, fmt is supposed to be a lightest-weight alternative to *printf...

on the other hand with #7499 added on top of this it reduces overall size of the FMT module again.

Let's merge this. It definately looks cleaner.

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ternary operator question still open.

sys/fmt/fmt.c Outdated
@@ -194,12 +194,12 @@ size_t fmt_u16_dec(char *out, uint16_t val)

size_t fmt_s32_dec(char *out, int32_t val)
{
int negative = (val < 0);
unsigned negative = (val < 0) ? 1 : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gebart what do you think?

@smlng
Copy link
Member

smlng commented Sep 4, 2017

IMHO if binary size is critical don't use FMT

Well, fmt is supposed to be a lightest-weight alternative to *printf...

sure, what I meant was that if you really care about size (or have a production release) you might discard output completely, hence no fmt_ or printf.

@jnohlgard
Copy link
Member Author

I agree with @smlng, the form (x < 0) is just as readable. I will remove the ternary ifs

@kaspar030
Copy link
Contributor

I will remove the ternary ifs

Please squash directly!

@smlng
Copy link
Member

smlng commented Sep 6, 2017

@gebart can you make the remaining small changes/fixes, so we can move #7499 forward, too?!

@jnohlgard jnohlgard force-pushed the pr/fmt-optimizations branch from 1ef08e2 to 7a20521 Compare September 6, 2017 19:28
@jnohlgard jnohlgard force-pushed the pr/fmt-optimizations branch from 7a20521 to 1e43bd4 Compare September 6, 2017 19:29
@jnohlgard
Copy link
Member Author

@smlng fixed, squashed

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Sep 6, 2017
@kaspar030 kaspar030 merged commit d61760c into RIOT-OS:master Sep 6, 2017
@jnohlgard jnohlgard deleted the pr/fmt-optimizations branch September 20, 2017 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants