Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code size #456

Closed
Dirbaio opened this issue Apr 21, 2021 · 14 comments
Closed

Code size #456

Dirbaio opened this issue Apr 21, 2021 · 14 comments
Assignees
Labels
breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version difficulty: medium Somewhat difficult to solve priority: medium Medium priority for the Knurling team status: needs PR Issue just needs a Pull Request implementing the changes type: enhancement Enhancement or feature request
Milestone

Comments

@Dirbaio
Copy link
Contributor

Dirbaio commented Apr 21, 2021

I have put together a code size testbench to measure code size. It has 2 test cases, spam logs lots of stuff (based on the qemu test) and smoltcp runs smoltcp's loopback example from latest git master (which has defmt support).

spam  |  smoltcp |  what
===================================
18428 |  60680   |  main branch
  624 |  49780   |  no defmt

spam has 156 log calls and is 18428 bytes (.text size). Commenting the run_test() call in main gives a .text size of 944 bytes. Therefore, defmt's code size is:

Base size (RTT stuff, etc): 944 - 624 = 320 bytes
Log call size: (18428-944)/155 = 112.8 bytes/call

112.8 bytes/call sounds about right. As seen here, a simple info!("defmt test {=u32}", x); is already 74 bytes of code, and the spam code does a mix of simple and complex log calls.

And 112.8 bytes avg, or 74 bytes to print a u32 is A LOT.

For comparison, let's check good old C printf:

void foo(int x) {
    printf("defmt test %d", x);
}

compiled with arm-none-eabi-gcc -c printf.c -Os -mthumb:

   2:	0001      	movs	r1, r0
   4:	4802      	ldr	r0, [pc, #8]	; (10 <foo+0x10>)
   6:	f7ff fffe 	bl	0 <printf>

This is just 8 bytes of code, or 8+14 = 22 bytes if we count the string.

Defmt is 3.5 times larger than printf when counting the string. And this is a generous comparison: arguably the apples-to-apples comparison is not counting the string, since the string pointer is equivalent to defmt's interned string IDs. In which case defmt is 10 times larger.

I'm aware defmt does much more than printf. It's to be expected a log call printing complex nested structs and enums generates bigger code. However, simple log calls (with no arguments or just a handful of integers) are the vast majority in real-world projects, and code size for these really hurts.

@Urhengulas
Copy link
Member

@Dirbaio: Thank you for putting in the effort to create this code size benchmark!

Do we know which parts of the code contribute the most bloat?

@jonas-schievink jonas-schievink added difficulty: medium Somewhat difficult to solve good first issue Good for newcomers priority: medium Medium priority for the Knurling team status: needs PR Issue just needs a Pull Request implementing the changes type: enhancement Enhancement or feature request labels Apr 26, 2021
@Dirbaio
Copy link
Contributor Author

Dirbaio commented Apr 26, 2021

I've been experimenting with this. Code in my codesize4 branch, testbench updated.

  • Replace Write trait with a write fn in Logger. #258 is an easy win.
  • Removing the bool and needs_tag optimizations is a big win since it allows the compiler to optimize the Formatter to essentially nothing, which avoids all the work of stack-allocating and initializing it at every log call.
  • Removing leb/zigzag encodings has almost no effect, perhaps unsurprisingly because the compiler wasn't inlining them anyway.
spam         |  smoltcp      |  what
code   tx'd  |  code   tx'd  |
===============================================
18428  2806  |  60680  1098  |  main branch
15460  2806  |  58232  1098  |  remove dyn Write (PR #258)
15764  2806  |  57904  1098  |  remove bool return in acquire
13076  2817  |  57800  1098  |  remove bool compression
13060  3181  |  57728  1458  |  remove leb64/leb128/zigzag. Make tags always 16bit
11828  3729  |  57872  1468  |  remove needs_tag optimization
11428  3729  |  57240  1468  |  make Formatter zero-sized

This is at the cost of increasing the wire size, but:

  • The needs_tag increase can probably be gained back with this without increasing code size.
  • The leb128/zigzag increase can be gained back with a "zero-removing" compression such as this. This will probably give even smaller wire size than before, since uXX/iXX isn't being leb-encoded and they usually are small too.

Something I want to try next is to "pack" together multile write calls. For example when printing 4 u32s, instead of doing 4 write calls, put them in a 16-byte buffer then do a single 16-byte write call. I have a WIP version of this but it doesn't improve codesize that much. I'm still investing why.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Apr 26, 2021

Updated table with size measurements from my company's firmware:

spam         |  smoltcp      | company's fw |  what
code   tx'd  |  code   tx'd  | code         |
==============================================================
18428  2806  |  60680  1098  | 136336       |  main branch
15460  2806  |  58232  1098  | 127152       |  remove dyn Write (PR #258)
15764  2806  |  57904  1098  | 125832       |  remove bool return in acquire
13076  2817  |  57800  1098  | 120096       |  remove bool compression
13060  3181  |  57728  1458  | 120024       |  remove leb64/leb128/zigzag. Make tags always 16bit
11828  3729  |  57872  1468  | 118280       |  remove needs_tag optimization
11428  3729  |  57240  1468  | 117064       |  make Formatter zero-sized
11428  3729  |  57312  1468  | 117168       |  Remove methods from Formatter
  624        |  49780        | 98716        |  no defmt

2x smaller! :)

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Apr 26, 2021

I've done a quick test with rzCOBS compression+framing. Results on spam test:

   With rzCOBS: 2290 bytes, 367308 cycles
Without rzCOBS: 3729 bytes, 337152 cycles

Wire size is better than upstream defmt, while keeping all the code size gains. Code size increase is only ~200 bytes.

This is without needs_tag. Wire size could be even better if this is implemented.

And it provides framing, so probe-run is able to recover from a corrupt stream!

@Dirbaio
Copy link
Contributor Author

Dirbaio commented May 2, 2021

@Urhengulas @jonas-schievink what are your thoughts on this? Is this something you would want to address?

#258 is an easy start, and I'm willing to clean/rework my other patches to PR them if you confirm this is something you'd like to review and merge.

@jonas-schievink
Copy link
Contributor

Personally I think this is a good change, inclusing plugging in rzCOBS, but we don't currently plan to make breaking changes to defmt, which complicates things.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented May 2, 2021

In what timeframe would you consider OK making the next breaking release?

@Urhengulas
Copy link
Member

In what timeframe would you consider OK making the next breaking release?

We currently want to avoid it, since it will split the ecosystem. We are discussing how to make breaking releases more feasible, mainly looking at #426.

Another option we discussed is maintaining a next branch for some time. This branch would include all the changes from main and breaking changes, which enables us to start the effort, but delay the breaking change for longer. What would you think about this workaround?

@Dirbaio
Copy link
Contributor Author

Dirbaio commented May 6, 2021

Using defmt from git was extremely painful in the pre-0.1 times.

  • It forces users to have to git-clone, patch and build probe-run.
  • For git builds defmt requires firmware and decoder versions to match exactly (Separate "crate version" from "wire format version" #287), so this means constantly rebuilding probe-run every time you upgrade defmt on your firmware.
  • You can't depend on git crates for crates published on crates.io. This means using this next branch in library crates (eg smoltcp, embassy) is unfeasible. Published crates would be stuck on 0.2, which in turns prevents end users from using next in their firmware if they depend on any such published crate. To get around this users would have to patch every single crate on their tree that depends on defmt 0.2!

@Urhengulas Urhengulas added breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version and removed good first issue Good for newcomers labels May 7, 2021
@japaric
Copy link
Member

japaric commented May 10, 2021

Wire size is better than upstream defmt, while keeping all the code size gains.

@Dirbaio do you have data on how much wire size is reduced for any of the 3 examples you posted in the last few tables?

@MathiasKoch
Copy link
Contributor

MathiasKoch commented May 10, 2021

Wire size is better than upstream defmt, while keeping all the code size gains.

@Dirbaio do you have data on how much wire size is reduced for any of the 3 examples you posted in the last few tables?

I think that would be the tx'd column?

And the

With rzCOBS: 2290 bytes
Without rzCOBS: 3729 bytes

@jonas-schievink
Copy link
Contributor

Hi @Dirbaio! Now that #492 has been fully implemented, how are we doing in terms of code size? I've seen #525, but that doesn't seem necessary to consider this issue fixed.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Aug 5, 2021

Yeah, #525 should reduce the code size further but in its current state it didn't. In my original experiments branch it did reduce it quite a lot, so I still have to check what's up with that.

Feel free to close this issue if you want. There'll always be some code size reduction possible, this'd stay open forever otherwise :P

@jonas-schievink
Copy link
Contributor

Okay, closing then!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version difficulty: medium Somewhat difficult to solve priority: medium Medium priority for the Knurling team status: needs PR Issue just needs a Pull Request implementing the changes type: enhancement Enhancement or feature request
Projects
None yet
Development

No branches or pull requests

5 participants