
[AAELF64] AArch64 Veneer Types Recognized by Binary Analysis Tools. #333

Open: wants to merge 2 commits into main

@ilinpv commented Jun 11, 2025

This patch introduces definitions of the standard AArch64 veneers so that binary analysis tools can recognize them.

Comment on lines +2028 to +2029
__AArch64Abs[XO]LongThunk_<target>:
B <target>

Is LLD the only linker that generates this type of veneer? I don't know whether including AbsLong in the name was intentional.

Contributor

The name is a side-effect of LLD's implementation. There is a single veneer that chooses whether it is short or long depending on the distance.

I think there could be a patch to LLD that changes the name when the choice is made, although to date it's not been important enough to implement.

At the moment ld.bfd doesn't implement this shorter form of veneer. I don't know about mold.
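The sketch below models the decision that LLD's single thunk is described as making. It is not LLD's code, and the helper names are hypothetical. A plain B can be used only when the byte offset to the target is a multiple of 4 that fits B's signed 26-bit word immediate, which gives roughly ±128 MiB of reach. Any target further away needs the long form.

```python
# Hypothetical helpers sketching the short/long choice; not LLD's source code.
B_MIN = -(1 << 27)        # -128 MiB
B_MAX = (1 << 27) - 4     # +128 MiB minus one instruction

def b_in_range(pc: int, target: int) -> bool:
    """True if a B placed at `pc` can reach `target` directly."""
    offset = target - pc
    return offset % 4 == 0 and B_MIN <= offset <= B_MAX

def encode_b(pc: int, target: int) -> int:
    """Encode `B target` at `pc`: opcode 0b000101, then a signed 26-bit word offset."""
    if not b_in_range(pc, target):
        raise ValueError("target is out of B range; a long veneer is needed")
    imm26 = ((target - pc) >> 2) & 0x3FFFFFF
    return 0x14000000 | imm26

def thunk_form(veneer_addr: int, target: int) -> str:
    """A single thunk that becomes a plain B when reachable, else the long form."""
    return "short (B)" if b_in_range(veneer_addr, target) else "long"
```

For example, `encode_b(0x1000, 0x2000)` yields `0x14000400`, and a backwards branch of one instruction encodes as `0x17FFFFFF`.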

Author

This one is observed in LLD. So far, in addition to LLD, I've gathered veneer names produced by ld.bfd, mold (I guess @rui314 knows more about mold thunks), and the Go linker.


I just implemented a long-range extension thunk in mold. I chose this code sequence: https://github.com/rui314/mold/blob/47f6c2839c15cd0e982956c760428617ea35a0e9/src/arch-arm64.cc#L604-L614


__AArch64AbsLongThunk_<target>:
LDR X16, =<target>
BR X16

Is there an expectation that the literal pool follows the instruction sequence?

Contributor

Yes. In theory a linker could put it in a separate section within 1 MiB. However, that would require special-case code to keep it in range, for little gain. In practice, every veneer implementation that I've seen places the literal immediately afterwards.

This is not an issue with the above implementation, but a veneer with an odd number of instructions may need 4 bytes of padding to keep the literal pool 8-byte aligned (for targets that have strict alignment enabled).
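The padding rule comes down to a little arithmetic, sketched below. The sketch assumes the veneer starts on an 8-byte boundary and every instruction is 4 bytes, and the function names are hypothetical. The second helper checks the ±1 MiB reach of LDR (literal), whose signed 19-bit immediate counts 4-byte words.

```python
INSN_SIZE = 4
LDR_LIT_MIN = -(1 << 20)        # -1 MiB
LDR_LIT_MAX = (1 << 20) - 4     # +1 MiB minus one word

def literal_padding(num_insns: int, veneer_start: int = 0, align: int = 8) -> int:
    """Padding bytes so a literal placed after `num_insns` instructions is
    `align`-byte aligned. Assumes `veneer_start` is the veneer's first address."""
    code_end = veneer_start + num_insns * INSN_SIZE
    return -code_end % align

def ldr_literal_in_range(ldr_addr: int, literal_addr: int) -> bool:
    """True if an LDR (literal) at `ldr_addr` can address `literal_addr`."""
    offset = literal_addr - ldr_addr
    return offset % 4 == 0 and LDR_LIT_MIN <= offset <= LDR_LIT_MAX
```

With the two-instruction LDR/BR veneer, the literal lands at offset 8 and needs no padding. A three-instruction veneer needs 4 bytes of padding.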

Contributor

It would be feasible to first emit a number of veneers and only then the data.
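The sketch below models that layout. It is hypothetical and is not any linker's actual algorithm. It places n LDR/BR veneers (8 bytes each) back to back and follows them with a pool of n 8-byte literals in the same order. Each LDR is then exactly 8·n bytes from its literal, so the ±1 MiB reach of LDR (literal) bounds how many veneers can share one pool.

```python
VENEER_SIZE = 8               # LDR X16, <literal> ; BR X16
LITERAL_SIZE = 8              # one .xword per target
LDR_LIT_MAX = (1 << 20) - 4   # forward reach of LDR (literal)

def grouped_layout(base: int, n: int) -> list[tuple[int, int]]:
    """Return (ldr_address, literal_address) for each of n grouped veneers."""
    pool_base = base + n * VENEER_SIZE    # stays 8-byte aligned if base is
    return [(base + i * VENEER_SIZE, pool_base + i * LITERAL_SIZE) for i in range(n)]

def max_group_size() -> int:
    """Largest n for which every LDR still reaches its literal (offset 8*n)."""
    return LDR_LIT_MAX // VENEER_SIZE
```

For a group of 3 veneers at 0x1000, each LDR is 24 bytes from its literal. Under these assumptions, up to 131071 veneers could share one pool.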

However, it's not obvious why we need both this and the execute-only variant. There are infinitely many possible veneer sequences, but for each range we need only one. We can always emit execute-only veneers.

Note that since none of the code models currently supports a .text size larger than 2 GB, you'd need assembler trickery with absolute symbols to force these >4 GB veneers.

Contributor

I've occasionally seen people with embedded systems (before the MMU has been set up) have a boot loader in low memory that jumps to a high address.

I suspect we could default to using the ADRP thunk (4 GiB range) universally. There are two ways to do that:
- Use it when the target is within range and fall back to the longer sequence when it is out of range.
- For LLD at least, default --pic-veneer to true, which will force use of the ADRP thunk.
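The sketch below models that fallback policy. The function names and structure are hypothetical and not taken from any linker. It prefers a direct B, then the ADRP-based thunk, and only then the long absolute sequence. ADRP's reach is a signed 21-bit count of 4 KiB pages, about ±4 GiB. For simplicity, the branch and the veneer are treated as sitting at the same address `pc`. The `pic_veneer` flag loosely models the forced-PIC option mentioned above.

```python
# Hypothetical policy sketch; names and structure are not taken from any linker.
def b_in_range(pc: int, target: int) -> bool:
    """B/BL reach: signed 26-bit word offset, about +/-128 MiB."""
    offset = target - pc
    return offset % 4 == 0 and -(1 << 27) <= offset < (1 << 27)

def adrp_in_range(pc: int, target: int) -> bool:
    """ADRP reach: signed 21-bit count of 4 KiB pages, about +/-4 GiB."""
    page_delta = (target >> 12) - (pc >> 12)
    return -(1 << 20) <= page_delta < (1 << 20)

def choose_thunk(pc: int, target: int, pic_veneer: bool = False) -> str:
    """Prefer a direct branch, then the ADRP thunk, then the long sequence."""
    if b_in_range(pc, target):
        return "direct B"                 # no veneer required
    if adrp_in_range(pc, target):
        return "ADRP/ADD/BR"              # position independent, 4 GiB reach
    if pic_veneer:
        # With PIC veneers forced, there is no absolute fallback to use.
        raise ValueError("target is beyond the reach of the ADRP thunk")
    return "long absolute sequence"
```

A target 256 MiB away gets the ADRP thunk, while one 8 GiB away (as in the low-to-high boot loader jump) needs the long sequence.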

Contributor

@MaskRay commented Jun 12, 2025

Thanks! The description looks good to me.


MOVK X16, #:abs_g2_nc:<target>, LSL #32
MOVK X16, #:abs_g3:<target>, LSL #48
BR X16

Contributor

There is also the ld.bfd veneer, which is similar to __AArch64AbsLongThunk_ except that it is position independent.

Instead of loading the address directly, it loads the target's offset relative to the PC and then adds the PC to it:

LDR X16, .Loffset_to_target   // X16 = offset stored in the literal below
ADR X17, #0                   // X17 = address of this ADR instruction
ADD X16, X16, X17             // X16 = target
BR  X16
.Loffset_to_target:
.xword target - (. - 12)      // R_AARCH64_PREL64 target + 12
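The value of the literal can be checked with a little arithmetic. This sketch assumes the layout above: LDR at offset 0, ADR at 4, ADD at 8, BR at 12 and the literal at 16. The helper names are hypothetical. `ADR X17, #0` leaves the ADR's own address (start + 4) in X17, so the literal must hold target - (start + 4). R_AARCH64_PREL64 resolves to S + A - P, where P is the literal's address (start + 16), so an addend of 12 produces exactly that value.

```python
MASK64 = (1 << 64) - 1

def prel64(S: int, A: int, P: int) -> int:
    """R_AARCH64_PREL64 value: S + A - P, truncated to 64 bits."""
    return (S + A - P) & MASK64

def veneer_branch_target(start: int, literal: int) -> int:
    """Where BR X16 goes: the loaded literal plus X17 (the ADR's address)."""
    x17 = start + 4                      # ADR X17, #0 is the second instruction
    return (literal + x17) & MASK64

def check(start: int, target: int) -> bool:
    """True if the relocated literal makes the veneer branch to `target`."""
    literal = prel64(S=target, A=12, P=start + 16)   # literal sits at start+16
    return veneer_branch_target(start, literal) == target
```

The check holds for targets both above and below the veneer, because the 64-bit wrap-around cancels out.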

I don't know what the naming convention is for GNU ld. I think they may have a stubs section without individual names for each veneer.

Contributor

@Wilco1 commented Jun 13, 2025

If the GOT is within 4 GB, doing ADRP/LDR/BR from the GOT would be a better option, since it avoids placing literals in .text. If not, it makes sense to make this veneer execute-only too and use MOVZ/MOVK for the offset.

Contributor

One possible limitation for some linkers is that veneers are often added quite late, and the GOT may already be fixed by then. I think it should be possible to add GOT entries at that stage in ld.lld.

I would expect that, for most programs, if the GOT were within 4 GiB then so would the destination function be, so ADRP/ADD could be used directly.

Author

It seems the Go linker generates such a GOT veneer. However, I'm not sure which relocation is generated for that kind of veneer, or whether we can rely on it in BOLT. Looking at the code, it seems there is a jump relocation that refers to the veneer.

ADD X16, X16, :lo12:<target>
BR X16

Note that ``<target>`` may be an entry in the PLT.
Contributor

That's the case for all of the veneers described here. A veneer will be generated whenever both of these hold:
- There is a B or BL to the PLT entry for the target.
- The PLT is more than 128 MiB away from that B/BL.

The following is not part of the veneer code itself. When there is an indirect branch to the PLT entry, care is needed to add a BTI to the PLT entry if we're generating a program that sets AARCH64_FEATURE_1_BTI.
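For binary analysis, a tool may want to check whether a PLT entry or function entry starts with a usable landing pad. The sketch below uses hypothetical helper names. It classifies a 32-bit instruction word as one of the BTI variants, which are HINT encodings of the form 0xD503241F | (op2 << 5). Veneers branch with BR X16 or BR X17. As far as we understand the architecture, such a branch sets BTYPE 0b01, which BTI c, BTI j and BTI jc all accept and a plain BTI does not. The sketch deliberately ignores PACIASP/PACIBSP, which can also act as landing pads.

```python
from typing import Optional

BTI_BASE = 0xD503241F
BTI_MASK = 0xFFFFFF3F      # ignores op2 bits [7:6], which select the variant
BTI_VARIANTS = {0b000: "bti", 0b010: "bti c", 0b100: "bti j", 0b110: "bti jc"}

def bti_kind(insn: int) -> Optional[str]:
    """Return the BTI variant for a 32-bit instruction word, or None."""
    if (insn & BTI_MASK) != BTI_BASE:
        return None
    op2 = (insn >> 5) & 0b111
    return BTI_VARIANTS[op2]

def accepts_br_x16(insn: int) -> bool:
    """True if a BR X16/X17 (BTYPE 0b01) may land on this BTI instruction."""
    return bti_kind(insn) in ("bti c", "bti j", "bti jc")
```

A NOP (0xD503201F) shares the HINT space but is not classified as a BTI, so a veneer targeting it would fault in a BTI-guarded page.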

Contributor

Agreed, and if lazy binding is disabled, linkers can bypass the PLT by loading directly from the GOT.

This is also better than using a literal load in the .text section and needing two variants of each veneer...

It appears we need a bit of ABI design on linker veneers...

Contributor

Yes. At the stage where veneers are generated, we'll know whether there's a .got.plt entry for the target, so we should be able to load from .got.plt when -znow is used. Off the top of my head:

ADRP x16, target@got.plt                // page of target's .got.plt slot
LDR  x16, [x16, :lo12:target@got.plt]   // load the resolved address
BR   x16
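The sketch below shows the address arithmetic behind such an ADRP/LDR pair. The helper names are hypothetical, and `slot_addr` stands in for the target's .got.plt entry. ADRP materializes the slot's 4 KiB page relative to the ADRP's own page. The 64-bit LDR then adds the low 12 bits as an unsigned offset scaled by 8, so the slot must be 8-byte aligned.

```python
def adrp_ldr_operands(adrp_addr: int, slot_addr: int) -> tuple[int, int]:
    """Return (ADRP page delta, LDR scaled imm12) for loading a GOT slot."""
    if slot_addr % 8:
        raise ValueError("64-bit LDR needs an 8-byte aligned slot")
    page_delta = (slot_addr >> 12) - (adrp_addr >> 12)
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("slot is beyond the +/-4 GiB reach of ADRP")
    imm12 = (slot_addr & 0xFFF) >> 3       # LDR Xt, [Xn, #imm] scales by 8
    return page_delta, imm12

def resolve(adrp_addr: int, page_delta: int, imm12: int) -> int:
    """Recompute the address that the ADRP/LDR pair reads from."""
    page = (adrp_addr & ~0xFFF) + (page_delta << 12)
    return page + (imm12 << 3)
```

For an ADRP at 0x401234 and a slot at 0x412FF8, the operands are a page delta of 0x11 and a scaled immediate of 0x1FF.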


__AArch64BTIThunk_ BTI Landing Pad Veneers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


MOVK X16, #:abs_g3:<target>, LSL #48
BR X16

Note that some of the MOVK instructions may be omitted if their corresponding 16-bit segments of the address are zero and do not need to be explicitly set.
Contributor

You can omit a MOVK if its halfword is 0x0000 when the sequence starts with MOVZ, or 0xffff when it starts with MOVN. It's unlikely that linkers optimize anything but the top 16 bits, given that this veneer will only be needed if the distance is over 4 GB.
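The sketch below illustrates the omission rule. It is a hypothetical helper, not any linker's code. It splits the 64-bit address into four halfwords and starts from either MOVZ (zero-filled) or MOVN (one-filled). It then emits a MOVK only for halfwords that differ from the fill value and keeps whichever start yields fewer instructions. A tiny interpreter is included so the result can be checked.

```python
MASK64 = (1 << 64) - 1

def halfwords(value: int) -> list[int]:
    """Split a 64-bit value; element i holds bits [16*i+15 : 16*i]."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]

def mov_sequence(value: int) -> list[str]:
    """Shortest MOVZ/MOVN + MOVK sequence (then BR) that builds `value` in X16."""
    parts = halfwords(value & MASK64)
    best: list[str] = []
    for first, fill in (("MOVZ", 0x0000), ("MOVN", 0xFFFF)):
        # Halfwords that already equal the fill value need no MOVK.
        needed = [i for i, hw in enumerate(parts) if hw != fill] or [0]
        lead = needed[0]
        # MOVN writes NOT(imm << shift), so encode the inverted halfword.
        imm = parts[lead] if first == "MOVZ" else ~parts[lead] & 0xFFFF
        seq = [f"{first} X16, #0x{imm:04x}, LSL #{16 * lead}"]
        seq += [f"MOVK X16, #0x{parts[i]:04x}, LSL #{16 * i}" for i in needed[1:]]
        if not best or len(seq) < len(best):
            best = seq
    return best + ["BR X16"]

def run(seq: list[str]) -> int:
    """Tiny interpreter used to check that a sequence rebuilds the value."""
    x16 = 0
    for insn in seq:
        op, _, rest = insn.partition(" ")
        if op == "BR":
            break
        _, imm_s, shift_s = (field.strip() for field in rest.split(","))
        imm, shift = int(imm_s.lstrip("#"), 16), int(shift_s.split("#")[1])
        if op == "MOVZ":
            x16 = imm << shift
        elif op == "MOVN":
            x16 = ~(imm << shift) & MASK64
        else:  # MOVK replaces one halfword and keeps the rest
            x16 = (x16 & ~(0xFFFF << shift)) | (imm << shift)
    return x16
```

The sketch optimizes all four halfwords purely for illustration. As noted above, real linkers may only bother with the top one.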

Contributor

@smithp35 left a comment

There's definitely some room for improvement in thunks. It would need a small amount of implementation work in linkers.

Labels
None yet
Projects
None yet
6 participants