Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate how to save flash space #1947

Closed
prusnak opened this issue Nov 26, 2021 · 13 comments
Closed

Investigate how to save flash space #1947

prusnak opened this issue Nov 26, 2021 · 13 comments
Assignees
Labels
code Code improvements core Trezor Core firmware. Runs on Trezor Model T and T2B1. flash reduction Decrease required size of flash low hanging fruit Simple, quick task.

Comments

@prusnak
Copy link
Member

prusnak commented Nov 26, 2021

We are slowly running out of Flash space. Let's collect some ideas how to save it:

  • compressing the embedded bootloader (see below)
  • using just single sha256 implementation

The greatest offenders (generated by bloaty -d sections,symbols build/firmware/firmware.elf)

   7.6%   749Ki  42.7%   749Ki    .flash
    52.0%   389Ki  52.0%   389Ki    [3352 Others]
    23.7%   177Ki  23.7%   177Ki    [section .flash]
     4.8%  36.2Ki   4.8%  36.2Ki    nist256p1
     4.8%  36.2Ki   4.8%  36.2Ki    secp256k1
     3.2%  24.0Ki   3.2%  24.0Ki    ge25519_niels_base_multiples
     1.9%  13.9Ki   1.9%  13.9Ki    blake2b_compress
     1.1%  8.00Ki   1.1%  8.00Ki    wordlist
     0.9%  7.03Ki   0.9%  7.03Ki    img_loader
     0.9%  6.94Ki   0.9%  6.94Ki    secp256k1_sha256_write
     0.8%  5.93Ki   0.8%  5.93Ki    mp_qstr_const_pool
     0.6%  4.57Ki   0.6%  4.57Ki    ripemd160_process
     0.5%  4.10Ki   0.5%  4.10Ki    mp_execute_bytecode
     0.5%  4.00Ki   0.5%  4.00Ki    slip39_wordlist
     0.5%  4.00Ki   0.5%  4.00Ki    t_fl
     0.5%  4.00Ki   0.5%  4.00Ki    t_fn
     0.5%  4.00Ki   0.5%  4.00Ki    t_il
     0.5%  4.00Ki   0.5%  4.00Ki    t_im
     0.5%  4.00Ki   0.5%  4.00Ki    t_in
     0.5%  4.00Ki   0.5%  4.00Ki    words_button_seq
     0.5%  3.75Ki   0.5%  3.75Ki    ge25519_niels_sliding_multiples
     0.5%  3.68Ki   0.5%  3.68Ki    blake2s_compress

   8.4%   824Ki  46.9%   824Ki    .flash2
    50.8%   418Ki  50.8%   418Ki    [13139 Others]
    26.5%   218Ki  26.5%   218Ki    [section .flash2]
     7.8%  64.0Ki   7.8%  64.0Ki    secp256k1_ecmult_static_context
     4.4%  36.1Ki   4.4%  36.1Ki    fun_data_apps_ethereum_tokens__lt_module_gt__token_by_chain_address
     3.2%  26.1Ki   3.2%  26.1Ki    mp_qstr_frozen_const_pool
     1.2%  9.90Ki   1.2%  9.90Ki    mp_frozen_mpy_names
     1.0%  8.62Ki   1.0%  8.62Ki    fun_data_apps_common_coininfo__lt_module_gt__by_name
     1.0%  7.88Ki   1.0%  7.88Ki    const_table_data_apps_ethereum_tokens__lt_module_gt__token_by_chain_address
     0.9%  7.29Ki   0.9%  7.29Ki    fun_data_all_modules__lt_module_gt_
     0.8%  6.24Ki   0.8%  6.24Ki    fun_data_apps_ethereum_networks__lt_module_gt___networks_iterator
     0.5%  4.00Ki   0.5%  4.00Ki    secp256k1_pre_g
     0.5%  4.00Ki   0.5%  4.00Ki    secp256k1_pre_g_128
     0.3%  2.32Ki   0.3%  2.32Ki    fun_data_apps_monero_xmr_bulletproof__lt_module_gt__BulletProofBuilder_verify_batch
     0.2%  1.42Ki   0.2%  1.42Ki    fun_data_trezor_enums_MessageType__lt_module_gt_
     0.2%  1.35Ki   0.2%  1.35Ki    fun_data_apps_cardano_sign_tx__lt_module_gt_
     0.2%  1.34Ki   0.2%  1.34Ki    mp_frozen_mpy_content
     0.2%  1.34Ki   0.2%  1.34Ki    fun_data_apps_webauthn_knownapps__lt_module_gt__by_rp_id_hash
     0.2%  1.30Ki   0.2%  1.30Ki    fun_data_trezor_ui_layouts_tt___init____lt_module_gt_
     0.1%  1.15Ki   0.1%  1.15Ki    fun_data_apps_monero_xmr_mlsag__lt_module_gt___generate_clsag
     0.1%  1.07Ki   0.1%  1.07Ki    fun_data_apps_monero_signing_step_09_sign_input__lt_module_gt__sign_input
     0.1%  1.07Ki   0.1%  1.07Ki    fun_data_apps_webauthn_fido2__lt_module_gt_
@prusnak prusnak added the code Code improvements label Nov 26, 2021
@prusnak
Copy link
Member Author

prusnak commented Nov 26, 2021

We can save some space by compressing the embedded bootloader (using the same method as we do for TOIF):

>>> from trezorlib.toif import _compress
>>> d = open("bootloader.bin","rb").read()
>>> c = _compress(d)
>>> len(d)
94208
>>> len(c)
61366

This will save us around 33KB of Flash space, but increases the complexity of critical part of the firmware - updating the bootloader.

@prusnak prusnak added the core Trezor Core firmware. Runs on Trezor Model T and T2B1. label Nov 26, 2021
@hiviah
Copy link
Contributor

hiviah commented Dec 6, 2021

I experimented with secp256k1_ecmult_static_context since it seems it would be a good candidate given its size and would fit CCMRAM exactly when uncompressed/generated on start. It turns out it doesn't compress well, but generating does not seem that slow, estimated ~85 ms. It's also hassle, but wanted to explore another option.

@grdddj
Copy link
Contributor

grdddj commented Mar 30, 2022

As agreed with @matejcik and @mmilata, I am adding my small research into the flash space problem.

Main goal is to determine how much space does the Rust code take, as we started to see build/firmware/firmware.elf section `.flash' will not fit in region 'FLASH' issues when starting to implement core/storage in Rust.

So far I was analyzing only emulator ... build/unix/trezor-emu-core, I will look into the actual firmware ... build/firmware/firmware.elf as well.

Without very much knowledge about it, I found the nm command, which shows statistics about our emulator file ... nm --radix=dec --size-sort build/unix/trezor-emu-core. We can then get for example all Rust functions by adding | grep trezor_lib

Example output showing the biggest Rust functions:

>>> nm --radix=dec --size-sort -r build/unix/trezor-emu-core | grep trezor_lib | head
0000000000001056 t _ZN10trezor_lib8protobuf6encode7Encoder12encode_field17h0c2e867b86c32001E
0000000000001056 t _ZN10trezor_lib8protobuf6decode7Decoder19message_from_stream17h4bb4c11a91701f11E
0000000000000972 t _ZN10trezor_lib8protobuf6encode7Encoder12encode_field17hd93211b0080da8c9E
0000000000000877 t _ZN10trezor_lib8protobuf6decode7Decoder12decode_field17h9001d46bd02924c3E
0000000000000789 t _ZN10trezor_lib13storagedevice15recovery_shares33storagerecoveryshares_fetch_group17h398d0865cc93c90dE
0000000000000771 t _ZN10trezor_lib13storagedevice14storage_device27storagedevice_get_device_id17h32ffd1bd4f49a0a5E
0000000000000762 t _ZN10trezor_lib13storagedevice14storage_device33storagedevice_set_mnemonic_secret17he01f740c7cf66eefE
0000000000000652 t _ZN10trezor_lib8protobuf3obj12msg_obj_attr17h33d14d544c65bc45E
0000000000000547 t _ZN10trezor_lib8protobuf6decode7Decoder20decode_defaults_into17h4a0faec11a9c7b1fE
0000000000000486 t _ZN10trezor_lib8protobuf6encode7Encoder14encode_message17h3fe64bdc4994692cE

To get user-friendly results out of it, I made a small script parsing the Rust function information, comparing the results etc.: https://gist.github.com/grdddj/b2cc8846e3f76608a8f6d948b6f1b0f9 (it needs some manual changes before running, to setup binary locations and the build command)

Example output from this script, looking at all Rust functions:

>>> python get_function_size_from_binary.py --no-compare
...
486: protobuf :: encode :: Encoder :: encode_message (1)
547: protobuf :: decode :: Decoder :: decode_defaults_into
652: protobuf :: obj :: msg_obj_attr
762: storagedevice :: storage_device :: storagedevice_set_mnemonic_secret
771: storagedevice :: storage_device :: storagedevice_get_device_id
789: storagedevice :: recovery_shares :: storagerecoveryshares_fetch_group
877: protobuf :: decode :: Decoder :: decode_field
972: protobuf :: encode :: Encoder :: encode_field
1056: protobuf :: decode :: Decoder :: message_from_stream
1056: protobuf :: encode :: Encoder :: encode_field (1)
27448: total
234: total_amount

This shows that there is around 27 kb of Rust-related code (14 kb of that is the new storagedevice module). These results however do not include UI redesign functions, I will try to include them later.

Running the script for everything (not just Rust) gives:

...
8192: slip39_wordlist
10178: secp256k1_sha256_write
13456: mp_qstr_const_pool
14668: mp_execute_bytecode
16392: wordlist
24576: ge25519_niels_base_multiples
37084: nist256p1
37084: secp256k1
65536: secp256k1_ecmult_static_context
849185: total
4463: total_amount

which means there is almost 850 kb of code, the biggest occupiers being the cryptography things.

Conclusion:
Rust currently is a very small part - 27 kb vs 850 kb

@grdddj
Copy link
Contributor

grdddj commented Mar 31, 2022

Running nm on the current master emulator (built with UI2=1) found out the Rust UI itself has also around 27 kb:

(could be replicated by passing --mode rust_ui to the above script)

...
536: _ZN10trezor_lib2ui9component4text10paragraphs19Paragraphs$LT$T$GT$13change_offset17had74e3db161430f3E
541: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_event17hbee63d60a5accfe6E
567: _ZN112_$LT$trezor_lib..ui..model_tt..component..swipe..Swipe$u20$as$u20$trezor_lib..ui..component..base..Component$GT$5event17hdcc1f376b17c79f6E
615: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_place17h1cfdb8671da4a2f6E
789: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_event17hee566170cee1a9a7E
904: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_paint17h4de8d47d3d7eca79E
941: _ZN10trezor_lib2ui9component4text6layout10TextLayout11layout_text17he0b38fe77ad4ed96E
950: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_event17hb733cb801aba63d4E
1154: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_event17hfe386ae64e4c874fE
1234: _ZN10trezor_lib2ui9component4text9formatted26FormattedText$LT$F$C$T$GT$14layout_content17he18e49b6355d36caE
1526: _ZN109_$LT$trezor_lib..ui..component..base..Child$LT$T$GT$$u20$as$u20$trezor_lib..ui..layout..obj..ObjComponent$GT$9obj_event17h580e53b1b2d37313E
27500: total
167: total_amount

@grdddj
Copy link
Contributor

grdddj commented Mar 31, 2022

UI2=1 make build_firmware failed for me on master with region `FLASH' overflowed by 28168 bytes - which would approximately correspond to the size of rust-UI in the emulator (above).

For the same space-issue I cannot make build_firmware for the new storage/device functionality.

From these unsuccessful builds only firmware.map is generated, but it is not parseable for function sizes.

@grdddj
Copy link
Contributor

grdddj commented Apr 1, 2022

I continued with analyzing build/firmware/frozen_mpy.o, which has information about the size of micropython functions, and it is possible to get the size of each application in scr/apps (and also each function in those applications).

Helper script for analysis is at https://gist.github.com/grdddj/5aeb38f17713f038639dde1c667dbd21

Notable results and examples:

>>> python get_apps_size_from_binary.py --statistics
...
11_876: apps/stellar
11_987: trezor/crypto
14_849: apps/nem
20_187: apps/management
27_881: apps/webauthn
29_693: apps/common
35_966: apps/cardano
59_706: apps/bitcoin
59_985: trezor/ui
63_002: apps/monero
101_749: apps/ethereum
516_077: total
>>> python get_apps_size_from_binary.py --lines
...
1092: core/src/apps/webauthn/fido2.py
1098: core/src/apps/monero/signing/step_09_sign_input.py:28  ... sign_input (1)
1175: core/src/apps/monero/xmr/mlsag.py:342 ... _generate_clsag
1371: core/src/apps/webauthn/knownapps.py:20  ... by_rp_id_hash
1379: core/src/trezor/ui/layouts/tt/__init__.py
1392: mp_frozen_mpy_content
1493: core/src/trezor/enums/MessageType.py
1691: core/src/apps/cardano/sign_tx.py
2378: core/src/apps/monero/xmr/bulletproof.py:1509 ... BulletProofBuilder.verify_batch (1) (1)
7176: core/src/apps/ethereum/networks.py:44  ... _networks_iterator
7561: core/src/all_modules.py
8092: core/src/apps/ethereum/tokens.py:14  ... token_by_chain_address - CONST_TABLE_DATA (SPECIAL)
8361: core/src/apps/common/coininfo.py:92  ... by_name
10241: mp_frozen_mpy_names
27184: mp_qstr_frozen_const_pool
37077: core/src/apps/ethereum/tokens.py:14  ... token_by_chain_address
555386: total
13369: total_amount
>>> python get_apps_size_from_binary.py --app ethereum --lines
...
304: core/src/apps/ethereum/sign_tx_eip1559.py
305: core/src/apps/ethereum/sign_tx.py
368: core/src/apps/ethereum/sign_typed_data.py:443 ... validate_field_type
406: core/src/apps/ethereum/layout.py
427: core/src/apps/ethereum/sign_typed_data.py
428: core/src/apps/ethereum/networks.py:44  ... _networks_iterator - CONST_TABLE_DATA (SPECIAL)
451: core/src/apps/ethereum/sign_tx_eip1559.py:55  ... sign_tx_eip1559
455: core/src/apps/ethereum/sign_tx.py:35  ... sign_tx
650: core/src/apps/ethereum/sign_typed_data.py:227 ... TypedDataEnvelope.get_and_encode_data
7176: core/src/apps/ethereum/networks.py:44  ... _networks_iterator
8092: core/src/apps/ethereum/tokens.py:14  ... token_by_chain_address - CONST_TABLE_DATA (SPECIAL)
37077: core/src/apps/ethereum/tokens.py:14  ... token_by_chain_address
101749: total
2493: total_amount
>>> python get_apps_size_from_binary.py --file-name ethereum/sign_typed_data.py --lines
...
81:  core/src/apps/ethereum/sign_typed_data.py:392 ... write_leftpad32
119: core/src/apps/ethereum/sign_typed_data.py:204 ... TypedDataEnvelope.find_typed_dependencies
121: core/src/apps/ethereum/sign_typed_data.py:183 ... TypedDataEnvelope.encode_type (1)
125: core/src/apps/ethereum/sign_typed_data.py:41  ... sign_typed_data
126: core/src/apps/ethereum/sign_typed_data.py:516 ... get_name_and_version_for_domain
148: core/src/apps/ethereum/sign_typed_data.py:141 ... TypedDataEnvelope._collect_types
173: core/src/apps/ethereum/sign_typed_data.py:350 ... encode_field
230: core/src/apps/ethereum/sign_typed_data.py:414 ... validate_value
234: core/src/apps/ethereum/sign_typed_data.py:61  ... generate_typed_data_hash
368: core/src/apps/ethereum/sign_typed_data.py:443 ... validate_field_type
427: core/src/apps/ethereum/sign_typed_data.py
650: core/src/apps/ethereum/sign_typed_data.py:227 ... TypedDataEnvelope.get_and_encode_data
4428: total
94:  total_amount
>>> python get_apps_size_from_binary.py --grep trezor_ui --lines
...
415: core/src/trezor/ui/components/tt/scroll.py:352 ... paginate_paragraphs
428: core/src/trezor/ui/layouts/tt/reset.py:26  ... show_share_words
431: core/src/trezor/ui/components/tt/passphrase.py
447: core/src/trezor/ui/layouts/tt/__init__.py:794 ... confirm_properties (1)
458: core/src/trezor/ui/layouts/tt/__init__.py:617 ... confirm_blob (1) (1)
533: core/src/trezor/ui/components/tt/scroll.py
537: core/src/trezor/ui/__init__.py
657: core/src/trezor/ui/style.py
662: core/src/trezor/ui/components/common/text.py:180 ... render_text
809: core/src/trezor/ui/components/tt/reset.py:42  ... Slip39NumInput.on_render
1379: core/src/trezor/ui/layouts/tt/__init__.py
59985: total
1556: total_amount

There could be a lot of conclusions from this, I would note the size of trezor/ui, almost 60 kb, which could compensate for the Rust UI

@mmilata
Copy link
Member

mmilata commented Apr 12, 2022

We can save 12KiB by using different implementation of blake2b. The reference implementation we use does some manual inlining that can be turned into loop, slowing the function down somewhat. This one doesn't use such inlining.

@invd
Copy link
Contributor

invd commented Apr 12, 2022

@mmilata

Two observations:
I think the referenced potential alternative blake2b implementation uses malloc(), see line 107. The Trezor One and trezor-crypto code is usually designed to work only on the stack, although the practical requirements for the Trezor T may be different.

The Clang compiler has significant problems to optimize the blake2b/blake2s code if compiler sanitizers are present, up to the point of crashing during compilation, see here. GCC, which is used for the firmware builds does not have these apparent issues, and the issues in clang only show up in the presence of sanitizers, but it's possible that the compilers produces sub-standard results in other ways even during "normal" builds. It may be helpful to experiment with blake2b.o: OPTFLAGS += -O0 and other file-specific Makefile settings first to see if significant reductions in size can be achieved for the existing code by steering the compiler optimizations differently. The main focus of the observed issues is blake2b_compress(), although this doesn't have to be relevant for the size topic.

@matejcik
Copy link
Contributor

it appears that a lot of debug dump functionality from Rust is getting into the builds. we might want to keep an eye on that.

things that seemed to help locally:

  • instead of #[derive(Debug)], use #[cfg_attr(debug_assertions, derive(Debug))] (source
    • this will break (some) bare unwrap()s because Result<T, E>::unwrap() requires E: Debug
  • replace bare unwraps with something like unwrap_or_panic(error: &'static str)

more generally, we need to watch for the presence of these things in the binaries. nm can see them.


Another thing to work on is some sort of dependency tracker for included symbols. This is more important for the bootloader usecase, where just plain including Rust costs us over 10 kB of space. Using nm together with objdump disassembly would allow us to construct a dependency graph with weights, identify functions that are only used by a subtree, etc.


One more thing to watch out for is generic instance explosion -- code that is succinct in Rust might end up generating a lot of needless repetitions in the binary.

A thing we should investigate: whenever const generics are used (such as Field<32>), we might end up with multiple instances for Field<1>.read(), Field<20>.read() etc. I don't know if the compiler is smart about this, we might end up extracting the const generic to a regular struct member.

@mmilata
Copy link
Member

mmilata commented May 19, 2022

A thing we should investigate: whenever const generics are used (such as Field<32>), we might end up with multiple instances for Field<1>.read(), Field<20>.read() etc. I don't know if the compiler is smart about this, we might end up extracting the const generic to a regular struct member.

Investigated a bit and it appears that all generic parameters get monomorphized, including consts. Assembly output seems to confirm this. So I'd consider de-constifying the parameter.

@grdddj
Copy link
Contributor

grdddj commented Jun 21, 2022

For issue-tracking purposes, mentioning that there is an open PR with the tool generating the size statistics of .elf file.

We agreed to keep this issue open with low priority for next sprints.

@andrewkozlik
Copy link
Contributor

andrewkozlik commented Jul 12, 2022

Summary of ideas to increase available flash memory:

  • Provide signed Ethereum token definitions on-the-fly instead of hard-coding them in the firmware.
  • There are four 16 kB sectors (12 - 15) that are unused.
    • These can give us 64 kB more flash memory.
    • Sector 3 is also unused, but we might want to reserve that one for the bootloader maybe.
  • Change ECMULT_GEN_PREC_BITS from 4 to 2.
    • This should save 32 kB of flash memory at the price of doubling the signing time from 11.7 ms to 23.4 ms, see Re-enable secp256k1-zkp and make it default #1534 (comment), which should not be a problem, because it used to be 35 ms in the old trezor-crypto implementation.
    • An alternative is to generate this table dynamically during firmware startup and store it in CCM RAM. This would save 64 kB or memory but we would have to measure how long it takes to generate the table and judge whether it's acceptable.
    • RAM has been a more scarce resource for us than flash memory in the past, so flash storage seems to make more sense.
    • Test: Confirmed that this saves 32 kB.
  • Set USE_PRECOMPUTED_CP to 0.
    • This should save 2*37 kB of flash memory at the price of increasing signing time and pubkey generation for U2F and probably NEM and Ethereum. Not sure how much the signing time would increase.
    • Ideally we should also handle Deeper integration of secp256k1_zkp #1864 to avoid the signing time increase in NEM and Ethereum. Pull request: Unify secp256k1_zkp usage #2385.
    • Test: I was unable to confirm any space savings in the final firmware. Unable to explain. The tables are absent from the firmware, I guess there is some padding instead?
  • Reduce inlining in our Blake implementation.
    • This could maybe save 12 kB based on discussion above.
  • Reduce inlining in our SHA implementation.
  • Replace our implementation of sha256 with the one from secp256k1-zkp.

@prusnak prusnak added the flash reduction Decrease required size of flash label Aug 30, 2022
@hynek-jina hynek-jina assigned matejcik and unassigned grdddj Oct 12, 2022
@matejcik matejcik removed their assignment Dec 13, 2022
@matejcik
Copy link
Contributor

We are now doing reasonably well on flash space. This issue is not actionable and once we dump all relevant info into Notion we will close it.

@matejcik matejcik self-assigned this Dec 13, 2022
@matejcik matejcik added the low hanging fruit Simple, quick task. label Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Code improvements core Trezor Core firmware. Runs on Trezor Model T and T2B1. flash reduction Decrease required size of flash low hanging fruit Simple, quick task.
Projects
Archived in project
Development

No branches or pull requests

9 participants