rpi4's bare-metal hashing performance is poor without caching #155
Replies: 5 comments 17 replies
-
| I presume the file is already entirely copied to RAM when your loader does computation on it? Do you have virtual memory and caching enabled? | 
-
| Just in case this wasn't already discussed: caching is predicated on the
MMU being enabled on Armv{7,8}-A. Cacheability has to be expressed
via page table descriptors.
This is true irrespective of whether address translation is required.
A typical approach for such early boot code is to set up identity maps via a
suitable set of page table entries.
On Fri, 15 Apr 2022 at 18:35, nihalpasham ***@***.***> wrote:
 Yes, the file (to be hashed) is loaded into RAM.
 The MMU is disabled, so no virtual memory. I assume by caching, you mean
 d-cache. If yes, that's not enabled either. (one of the goals is to ensure
 that the bootloader has the smallest possible trusted computing base)
 But that's an interesting point. I assumed the only variable to consider
 was the single-core frequency. Would enabling them improve performance?
 If yes, I'd be curious to know why?
 | 
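The identity-map idea from the comment above can be sketched on a host machine as plain bit arithmetic. This is only an illustration, not the tutorial's or the bootloader's actual code: it assumes a 4 KiB granule with 2 MiB level-2 block descriptors, reuses the `MAIR_EL1` value `0xff04` that appears in the logs in this thread, and takes a hypothetical `is_ram` predicate to decide which blocks are cacheable. All function and constant names are made up for the example.

```rust
/// MAIR_EL1 value from the logs in this thread: Attr0 = 0x04, Attr1 = 0xff.
const MAIR_EL1: u64 = 0xff04;

/// Attribute indices into MAIR_EL1 (assumed ordering, matching the value above).
const ATTR_IDX_DEVICE: u64 = 0; // 0x04 = Device-nGnRE
const ATTR_IDX_NORMAL: u64 = 1; // 0xff = Normal memory, write-back cacheable

/// Size of one level-2 block with a 4 KiB translation granule.
const BLOCK_SIZE: u64 = 2 * 1024 * 1024;

const DESC_VALID_BLOCK: u64 = 0b01; // bits[1:0]: valid block descriptor
const DESC_SH_INNER: u64 = 0b11 << 8; // SH field: inner shareable
const DESC_AF: u64 = 1 << 10; // access flag, set to avoid access-flag faults
const DESC_PXN: u64 = 1 << 53; // privileged execute-never
const DESC_UXN: u64 = 1 << 54; // unprivileged execute-never

/// Extract the 8-bit attribute at `index` from a MAIR_EL1 value.
fn mair_attr(mair: u64, index: u64) -> u8 {
    ((mair >> (index * 8)) & 0xff) as u8
}

/// Build the descriptor for the `index`-th 2 MiB block, mapped to itself
/// (virtual address == physical address). AP is left at 0b00 (EL1 read/write).
fn identity_block(index: u64, cacheable: bool) -> u64 {
    let pa = index * BLOCK_SIZE;
    let common = pa | DESC_VALID_BLOCK | DESC_AF;
    if cacheable {
        common | (ATTR_IDX_NORMAL << 2) | DESC_SH_INNER
    } else {
        // Device memory: never executable.
        common | (ATTR_IDX_DEVICE << 2) | DESC_PXN | DESC_UXN
    }
}

/// Build one level-2 table (512 entries, covering 1 GiB).
/// `is_ram` is a hypothetical predicate supplied by the caller.
fn build_identity_table(is_ram: impl Fn(u64) -> bool) -> Vec<u64> {
    (0..512u64)
        .map(|i| identity_block(i, is_ram(i * BLOCK_SIZE)))
        .collect()
}
```

On real hardware the table would live in a statically allocated, 4 KiB-aligned array rather than a `Vec`, and a level-1 entry would have to point at it; those parts are omitted here.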
-
| Ok, I compiled  Note: I moved the MMU activation code so that we're able to log the activation flow.
[    0.007482] MAIR_EL1: 0xff04
[    0.075640] Special regions:
[    0.075698]       0x00080000 - 0x0008ffff |  64 KiB | C   RO PX  | Kernel code and RO data
[    0.076652]       0x1fff0000 - 0x1fffffff |  64 KiB | Dev RW PXN | Remapped Device MMIO
[    0.077638]       0xfe000000 - 0xff84ffff |  24 MiB | Dev RW PXN | Device MMIO
[    0.078527] BASE ADDR: 0x120000
[    0.078905] TTBR0_EL1: 0x120000
[    0.079285] TCR_EL1: 0x200807520
[    0.079675] SCTLR_EL1: 0xc50838
[    0.080054] After enabling MMU, SCTLR_EL1: 0xc5183d
[    0.080648] mingo version 0.10.0
[    0.081038] Booting on: Raspberry Pi 4
[    0.081493] MMU online. Special regions:
[    0.081970]       0x00080000 - 0x0008ffff |  64 KiB | C   RO PX  | Kernel code and RO data
[    0.082988]       0x1fff0000 - 0x1fffffff |  64 KiB | Dev RW PXN | Remapped Device MMIO
[    0.083974]       0xfe000000 - 0xff84ffff |  24 MiB | Dev RW PXN | Device MMIO
[    0.084862] Current privilege level: EL1
[    0.085339] Exception handling state:
[    0.085783]       Debug:  Masked
[    0.086173]       SError: Masked
[    0.086563]       IRQ:    Masked
[    0.086953]       FIQ:    Masked
[    0.087343] Architectural timer resolution: 18 ns
[    0.087917] Drivers loaded:
[    0.088253]       1. BCM GPIO
[    0.088610]       2. BCM PL011 UART
[    0.089033] Timer test, spinning for 1 second
[     !!!    ] Writing through the remapped UART at 0x1FFF_1000
[    1.089900] Echoing input now
However, when I copy and paste the (same) MMU code from  Note: 
I plan on getting a hardware debugger. But in the meantime, any thoughts on what I'm doing wrong here?
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
......
......
[    2.211136] MAIR_EL1: 0xff04
[    2.485412] translation tables populated
[    2.486370] Special regions:
[    2.489151]       0x00080000 - 0x000a2fff | 140 KiB | C   RO PX  | Kernel code and RO data
[    2.497317]       0x1fff0000 - 0x1fffffff |  64 KiB | Dev RW PXN | Remapped Device MMIO
[    2.505223]       0xfe000000 - 0xff84ffff |  24 MiB | Dev RW PXN | Device MMIO
[    2.512347] BASE ADDR: 0x280000
[    2.515387] TTBR0_EL1: 0x280000
[    2.518427] TCR_EL1: 0x200807520
[    2.521555] first isb passed
[    2.524334] SCTLR_EL1: 0xc50838
[    2.527375] new SCTLR_EL1: 0xc5183d
[ 
---- crashes ---- (a red LED turns on and stays on) | 
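For readers following the two `SCTLR_EL1` values in the logs above (`0xc50838` before and `0xc5183d` after), a small host-side sketch can show which control bits differ. The bit positions for `M`, `C` and `I` are architectural; the helper names are invented for this example, and the sketch says nothing about why the second run crashes.

```rust
const SCTLR_M: u64 = 1 << 0; // stage 1 MMU enable
const SCTLR_C: u64 = 1 << 2; // data and unified cacheability
const SCTLR_I: u64 = 1 << 12; // instruction cacheability

const FLAGS: [(u64, &str); 3] = [(SCTLR_M, "M"), (SCTLR_C, "C"), (SCTLR_I, "I")];

/// Names of the M/C/I bits that differ between two SCTLR_EL1 values.
fn changed_bits(before: u64, after: u64) -> Vec<&'static str> {
    let diff = before ^ after;
    let mut names = Vec::new();
    for &(mask, name) in FLAGS.iter() {
        if diff & mask != 0 {
            names.push(name);
        }
    }
    names
}

/// Any differing bits other than M, C and I (0 means none).
fn other_changes(before: u64, after: u64) -> u64 {
    (before ^ after) & !(SCTLR_M | SCTLR_C | SCTLR_I)
}
```

Calling `changed_bits(0xc50838, 0xc5183d)` reports `M`, `C` and `I`, so the single register write in the log turns on the MMU and both caches at once, and `other_changes` confirms nothing else is touched.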
-
| This doc outlines the AArch64 Linux boot protocol and
architectural/micro-architectural expectations:
https://www.kernel.org/doc/Documentation/arm64/booting.txt
The MMU needs to be off, and caching additionally needs to be explicitly disabled.
On Thu, Apr 21, 2022 at 7:38 PM Andre Richter ***@***.***> wrote:
 Well, the first thing that Linux will do is to set up its own page tables.
 I don’t know by heart what the expectation from a previous boot loader
 stage is with respect to the architectural state of the memory subsystem.
 For starters, I would probably just disable caching again before jumping
 to Linux.
 | 
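A rough sketch of the bookkeeping behind "MMU off, caches disabled" before jumping to Linux might look like the following. It only computes values on the host: the new `SCTLR_EL1` with `M`, `C` and `I` cleared, and the cache-line addresses a loader would need to clean for the loaded kernel image. The actual `msr` and `dc` instructions, barriers, and their ordering are deliberately left out, and all function names are hypothetical. The line size is derived from `CTR_EL0.DminLine` (bits [19:16], log2 of the number of 4-byte words per line).

```rust
const SCTLR_M: u64 = 1 << 0; // stage 1 MMU enable
const SCTLR_C: u64 = 1 << 2; // data and unified cacheability
const SCTLR_I: u64 = 1 << 12; // instruction cacheability

/// SCTLR_EL1 value with the MMU and both caches switched off.
fn sctlr_for_handoff(sctlr: u64) -> u64 {
    sctlr & !(SCTLR_M | SCTLR_C | SCTLR_I)
}

/// Smallest data cache line size in bytes, from a CTR_EL0 value.
fn dcache_line_bytes(ctr_el0: u64) -> u64 {
    4 << ((ctr_el0 >> 16) & 0xf)
}

/// Line-aligned addresses covering `[start, start + len)`.
/// Assumes `line` is a non-zero power of two.
fn cache_lines(start: u64, len: u64, line: u64) -> impl Iterator<Item = u64> {
    let first = start & !(line - 1);
    let end = if len == 0 { first } else { start + len };
    (first..end).step_by(line as usize)
}
```

Applied to the value from the earlier log, `sctlr_for_handoff(0xc5183d)` gives back `0xc50838`, the value the register held before the MMU was enabled. On the target, each address yielded by `cache_lines` would be the operand of a clean-by-address operation; this sketch stops short of issuing it.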
-
| @nihalpasham can you do me a favor and check what the speedup is with instruction caching alone? Would be a nice datapoint to have. | 
-
Disclaimer: I'm assuming this topic can be discussed here. If not, please let me know and I will remove it.
Question: I ran into an odd issue. I'm working on a secure bootloader that's written entirely in Rust. Most of the boot code for the rpi4 is from this repo. I managed to get all the pieces working. However, I've run into a strange performance issue. The gist of it is:
OpenSSL and the sha2 crate running on a standard Linux OS + Raspberry Pi 4: the hashing speed for OpenSSL is 121 MiB/s and for sha2 it is 82 MiB/s, which roughly translates to less than 3 seconds for a 30 MB file.
I'm hoping folks here who have more experience with an rpi can offer some insight into what's probably missing or wrong.
A link to the implementation. The boot code is present in /boards/bootloaders/rpi4/src/boot.rs
Serial output from an rpi4: as you can see from the logs below, computing a hash of the kernel and ramdisk takes an additional 80 seconds (give or take).
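To put the numbers above side by side, here is a small arithmetic sketch. It rests on two assumptions that the post does not state: that the "30 MB" file is roughly 30 MiB, and that the kernel plus ramdisk hashed on bare metal are about the same size. The function names are made up for the example.

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Time in seconds to hash `bytes` at a throughput of `mib_per_s`.
fn seconds_to_hash(bytes: f64, mib_per_s: f64) -> f64 {
    bytes / (mib_per_s * MIB)
}

/// Throughput in MiB/s given a byte count and an elapsed time in seconds.
fn mib_per_second(bytes: f64, seconds: f64) -> f64 {
    bytes / MIB / seconds
}
```

Under those assumptions, `seconds_to_hash(30.0 * MIB, 82.0)` comes out well under one second on Linux, while `mib_per_second(30.0 * MIB, 80.0)` puts the bare-metal run below 1 MiB/s, a gap of roughly two orders of magnitude.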