Skip to content

arch: arm: arm-v8 a/r: move bss/noinit sections to the end of linker script #79502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Dat-NguyenDuy
Copy link
Collaborator

@Dat-NguyenDuy Dat-NguyenDuy commented Oct 7, 2024

This PR helps to reduces size of zephyr.bin on non-XIP by moving bss/noinit sections to the end of linker script.
As a result, moving __kernel_ram_start to another place to ensure it still include bss + data as before.

@Dat-NguyenDuy
Copy link
Collaborator Author

Hi @nashif , @dcpleung could you help review this PR. Thanks

@stephanosio stephanosio assigned ithinuel and unassigned nashif Nov 8, 2024
@Dat-NguyenDuy
Copy link
Collaborator Author

Hello, @ithinuel could you help to take a look, thanks.

dcpleung
dcpleung previously approved these changes Nov 18, 2024
@dcpleung dcpleung removed their assignment Nov 18, 2024
@ithinuel
Copy link
Collaborator

This seems related to #74042 and more specifically #60368 which was reverted in #61595.

@gramsay0 & @tejlmand, you have a much better understanding of linker scripts than I have.
Could you have a look at this change please?

@gramsay0
Copy link
Collaborator

gramsay0 commented Nov 26, 2024

@ithinuel #60368 was reverted because of unrelated, undefined behavior in a test, but was never reapplied after the test was fixed (see the last few comments in #61572).

AFAICT this looks good

gramsay0
gramsay0 previously approved these changes Nov 26, 2024
Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in themselves looks reasonable, but I would like this work to be aligned with the work done in #74042, so that we can have similar structure / layout on the cortex_m and cortex_a_r, to the extent possible of course;

I made some comments here: #74042 (review)
Please take a look, especially:

Seems like the arm64, cortex_a_r suffers the same issue and for the above reason but also because it's nice to have the linker.ld files somewhat aligned in how they look, then it would be nice to have those aligned as well.

#74042 covers cortex_m, and this PR covers cortex_a_r, however they define the bss section differently in the two files.
For #74042 (cortex_m) the bss section is moved all the way to the end of the linker script, and this PR just moves it a little bit.

Please get the location in the ld templates aligned between the two PRs.
It's hard enough already when looking into the ld template files, and having same sections ordered differently when there is no obvious reason just make things harder to understand and update.

Perhaps going for the at-the-end approach used in #74042, as that also seems to cover the last_section counter when XIP is used, except for the ITCM case, see #74042 (comment).

as for:

and disabling Kconfig LINKER_LAST_SECTION_ID by default if non-XIP (it is seems only needed for XIP)

then I wonder from where this impression stems ?
The _flash_used might be used by images themselves to know their size or by bootloaders / validation code to know the size of image. Thus independent to XIP / non-XIP.
I believe moving the BSS section description to the end will have positive impact on the _flash_used calculation.

@ithinuel ithinuel removed their assignment Jan 17, 2025
@Dat-NguyenDuy
Copy link
Collaborator Author

Hi @tejlmand, could you pls revisit this PR. Thanks a lot.

*/
. = ALIGN(4);
__bss_start = .;
__kernel_ram_start = .;
Copy link
Collaborator

@wearyzen wearyzen Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially reduces __kernel_ram_start from bss+data to just bss and doesn't seem to be inline with its definition here.

This also changes the dynamic region configured by mpu here

And if there is an attempt to change __kernel_ram_start to something other than __bss_start just to comply with this PR then there is also this code in linker which assumes __kernel_ram_start is same as __bss_start.

I am not an expert in linker scripts but wanted to highlight the above issue for other experts reviewing this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wearyzen thanks for those observations.

This essentially reduces __kernel_ram_start from bss+data to just bss and doesn't seem to be inline with its definition here.

Nice catch 👍 .
afaict, and what you also observed, is that the __kernel_start_ram symbol should have been moved so that it still covers the same memory region as before, even though the sections within that region is reorganized.

Seems this should also be addressed as followup from this PR: #74042
FYI @gramsay0

Copy link
Collaborator

@gramsay0 gramsay0 Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wearyzen thanks for finding that!

It looks like there was a similar issue and fix for aarch64 here:
7473aa9#diff-197a7777652a0ad3f659a00cf6e1840c8e1e0c0612857064a4d1bb21f56beea6

Seems this should also be addressed as followup from this PR: #74042
FYI @gramsay0

@tejlmand Yep, will do 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wearyzen, @tejlmand I have addressed the comment. Pls take a look. Thanks

Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the observation from @wearyzen to ensure the region defined as the kernel ram region is preserved even if the internal sections are relocated.

*/
. = ALIGN(4);
__bss_start = .;
__kernel_ram_start = .;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wearyzen thanks for those observations.

This essentially reduces __kernel_ram_start from bss+data to just bss and doesn't seem to be inline with its definition here.

Nice catch 👍 .
afaict, and what you also observed, is that the __kernel_start_ram symbol should have been moved so that it still covers the same memory region as before, even though the sections within that region is reorganized.

Seems this should also be addressed as followup from this PR: #74042
FYI @gramsay0

@Dat-NguyenDuy
Copy link
Collaborator Author

Yep, sure. I will address @wearyzen's comment.

@ibirnbaum
Copy link
Member

Hi all,

I'm the author of #60368, as was mentioned before, this PR was first merged but then reverted due to what later turned out to be a bug in a logging-related test. At that time, I didn't get around to getting that PR started again, and things went dormant for quite some time, as my colleagues and I eventually just used the linker command file from the PR out-of-tree, as we had to get rid of the potential massive .bss/.noinit bloat in our flat .bin images one way or another. As far as I can tell, my version of the linker command file can still be merged with relatively little effort, not too much has changed in the cortex_a_r linker command file since then so we just stuck to this solution.

So thanks to @Dat-NguyenDuy for picking up the issue again and making a fix available to everyone!

I had a look at my old PR and two things I can comment on from the sideline are the following (referring to: https://github.com/ibirnbaum/zephyr/blob/d8545f42e2bfceb51dd89b990c632401374be70f/include/zephyr/arch/arm/aarch32/cortex_a_r/scripts/linker.ld):

  • It seemed by the time that I submitted my PR nobody had really tried building for XIP on Cortex-A, and when building for qemu_cortex_a9 or actual Zynq-based hardware, there was a weird syntax error hidden behind the SECTION_DATA_PROLOGUE macro. That's why I defined SECTION_ALIGN in dependency of XIP mode. Does that problem still exist, or did someone change something about SECTION_DATA_PROLOGUE in the meantime?
  • There was an issue revolving around the placement of z_mapped_start, which was documented in aarch32 excn vector not pinned in mmu causing newlib heap overlap #51024. This would have been fixed if my PR had been (re-)merged. If z_mapped_start does not cover the first 4k of the image, which contains the exception vectors to be copied to 0x0 (plus the required padding up to the size of a full MMU page), the first call that maps memory at run-time via the MMU will assign the single page where the vectors used to be in the image followed by subsequent pages which are above the Zephyr image. So any first memory mapping at run-time which is > 4k will work with memory that is not physically contiguous. In the end, during my testing, it eventually turned out that the position of z_mapped_start already used for XIP was also applicable for non-XIP and solved the issue of the memory mapper thinking that this single 4k page below all the Zephyr code/data was suitable for a user mapping at run-time.

If it's helpful, you can ping me for verification testing on the Zynq once the modifications of the linker command file are considered final.

wearyzen
wearyzen previously approved these changes Mar 14, 2025
@mathieuchopstm
Copy link
Collaborator

as for:

and disabling Kconfig LINKER_LAST_SECTION_ID by default if non-XIP (it is seems only needed for XIP)

then I wonder from where this impression stems ? The _flash_used might be used by images themselves to know their size or by bootloaders / validation code to know the size of image. Thus independent to XIP / non-XIP. I believe moving the BSS section description to the end will have positive impact on the _flash_used calculation.

I agree with @Dat-NguyenDuy 's analysis about this point: LINKER_LAST_SECTION_ID should be disabled when !XIP, otherwise the entire noinit is emitted (filled with 0xFF padding) in zephyr.bin, followed by the LINKER_LAST_SECTION_ID_PATTERN word.

Not blocking, but a follow-up PR or issue should be opened to address aforementionned point after this PR is merged. (A solution might just be to have a different formula for calculating _flash_used depending on whether CONFIG_XIP is enabled?)

@wearyzen
Copy link
Collaborator

wearyzen commented Apr 4, 2025

@Dat-NguyenDuy could you please rebase and resolve conflicts in this PR?

@Dat-NguyenDuy
Copy link
Collaborator Author

Dat-NguyenDuy commented Apr 4, 2025

@Dat-NguyenDuy could you please rebase and resolve conflicts in this PR?
I've fixed the conflict. I see you have changed this recently:

#if ROM_ADDR != RAM_ADDR
    . = RAM_ADDR;
#endif

so my update (used #if CONFIG_XIP) is not needed anymore.

Edit: Hi @wearyzen, looking into your recently merged PR: https://github.com/zephyrproject-rtos/zephyr/pull/87965/files. After the PR merged, the text region is overlapping with RAM in non-XIP again (that was addressed by this PR: #77291).

IMO, for non-XIP, the RAM_ADDR and ROM_ADDR should be the same (SRAM), at least for ARM. CONFIG_FLASH_<> should not be used, because the default linker script does not use Flash at all, no input sections are placed in there: https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/arch/arm/cortex_a_r/scripts/linker.ld#L22-L26, https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/arch/arm/cortex_m/scripts/linker.ld#L22-L26.
-> default linker script should be updated instead.

Also by default, if non-XIP, CONFIG_FLASH_SIZE=0, https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/Kconfig#L238. Some SoCs override this value for specific purposes, such as configuring the MPU for Flash, which is not necessary. I think this can be cleaned up in the future.

@Dat-NguyenDuy
Copy link
Collaborator Author

as for:

and disabling Kconfig LINKER_LAST_SECTION_ID by default if non-XIP (it is seems only needed for XIP)

then I wonder from where this impression stems ? The _flash_used might be used by images themselves to know their size or by bootloaders / validation code to know the size of image. Thus independent to XIP / non-XIP. I believe moving the BSS section description to the end will have positive impact on the _flash_used calculation.

I agree with @Dat-NguyenDuy 's analysis about this point: LINKER_LAST_SECTION_ID should be disabled when !XIP, otherwise the entire noinit is emitted (filled with 0xFF padding) in zephyr.bin, followed by the LINKER_LAST_SECTION_ID_PATTERN word.

Not blocking, but a follow-up PR or issue should be opened to address aforementionned point after this PR is merged. (A solution might just be to have a different formula for calculating _flash_used depending on whether CONFIG_XIP is enabled?)

Yeah, i was wondering the same, but since Tejlmand mentioned that the scenario is difficult to reproduce, so we decided to leave it as it is.

@wearyzen
Copy link
Collaborator

wearyzen commented Apr 7, 2025

Edit: Hi @wearyzen, looking into your recently merged PR: https://github.com/zephyrproject-rtos/zephyr/pull/87965/files. After the PR merged, the text region is overlapping with RAM in non-XIP again (that was addressed by this PR: #77291).

Hi @Dat-NguyenDuy, so in 87965, I reverted to what was initially proposed in 77291 (ref: #77291 (comment)) so now I am bit confused and wondering if 77291 fixed any problem at all in the initial version😕 .
1 thing is for sure that, neither of the 2 solutions in the above PRs is completely right. With #if CONFIG_XIP the ci fails as reported here and with 87965 you still have the ram overlapping with text.
Update: we can use CONFIG_XIP to avoid the overlap issue however, we will also have to add entries in MMU config to avoid ci failures like #77291 (comment). Since this looks like a topic of its own, we can do this as part of another PR and get this one merged first. Actually, I can push a draft PR once this gets merged to fix the overlap issue and we can discuss it over there if you would like.

IMO, for non-XIP, the RAM_ADDR and ROM_ADDR should be the same (SRAM), at least for ARM. CONFIG_FLASH_<> should not be used, because the default linker script does not use Flash at all, no input sections are placed in there: https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/arch/arm/cortex_a_r/scripts/linker.ld#L22-L26, https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/arch/arm/cortex_m/scripts/linker.ld#L22-L26. -> default linker script should be updated instead.

Also by default, if non-XIP, CONFIG_FLASH_SIZE=0, https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/Kconfig#L238. Some SoCs override this value for specific purposes, such as configuring the MPU for Flash, which is not necessary. I think this can be cleaned up in the future.

I don't have much experience on linker scripts as you do but I agree that it needs a cleanup since there is a lot of confusion.
I would appreciate if you or some linker expert would help on this.

By moving data section between rodata and bss, and disabling
Kconfig LINKER_LAST_SECTION_ID by default if non-XIP (it is
seems only needed for XIP), the size of zephyr.bin can be
reduced significantly on non-XIP system.

As a result, moving __kernel_ram_start to another place
to ensure it still include bss + data as before.

Signed-off-by: Dat Nguyen Duy <dat.nguyenduy@nxp.com>
@Dat-NguyenDuy Dat-NguyenDuy force-pushed the armv8-a/r-linker-improvement branch from 9233ba7 to a189bf3 Compare April 7, 2025 16:43
@wearyzen
Copy link
Collaborator

wearyzen commented Apr 7, 2025

@tejlmand could you please have a look at this PR as well? there are few new follow ups raised but it would be good to get this one merged first.

@Dat-NguyenDuy
Copy link
Collaborator Author

Hi @tejlmand, could you take another look, you left one blocker for this PR. Thanks

@kartben kartben merged commit eb7ac1c into zephyrproject-rtos:main May 1, 2025
24 checks passed
@manuargue manuargue deleted the armv8-a/r-linker-improvement branch May 4, 2025 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.