[sysvabi64] Add chapter on Thread Local Storage #311

Open
wants to merge 10 commits into
base: main

smithp35
Contributor

The thread local storage chapter contains:

  • A description of Thread Local Storage based on addenda32
  • The key design decisions of AArch64 TLS such as tls variant, tls dialect, TCB size.
  • The ABI-required code sequence for TLSDESC, which must be emitted exactly because GNU ld requires it.
  • Sequences for the different code models.
  • Relaxations for GD->IE, GD->LE and IE->LE.
  • Synchronization requirements for lazy TLSDESC, with advice not to support it due to the overhead of synchronization.

and ``PT_TLS`` as the program header with type PT_TLS. ``PAD`` must be
the smallest positive integer that satisfies the following congruence:

``TP + TCB + PAD ≡ PT_TLS.p_vaddr (modulo PT_TLS.p_align)``
Contributor

TP+TCB+PAD on the left could be confusing, as TCB is placed before TP. Perhaps mention the requirement of TP first (= 0 (modulo p_align)), then describe PAD and this formula.

Contributor Author

I'll see if I can word it better. I've found it difficult to try and explain the formula intuitively.
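
As a worked illustration of the padding formula under discussion, here is a minimal Python sketch. It assumes TP is already a multiple of p_align, so TP drops out of the congruence, and it treats PAD as the smallest non-negative solution. The function name `tls_pad`, the default 16-byte TCB size and the sample values are assumptions made for the example, not text from the patch.

```python
def tls_pad(p_vaddr: int, p_align: int, tcb_size: int = 16) -> int:
    """Smallest non-negative PAD with tcb_size + PAD ≡ p_vaddr (mod p_align).

    Assumes TP ≡ 0 (mod p_align). The 16-byte default for tcb_size is an
    assumption for this example.
    """
    # ELF allows p_align values of 0 and 1 to mean "no alignment required".
    align = max(p_align, 1)
    # Python's % always yields a result in [0, align) for a positive modulus.
    return (p_vaddr - tcb_size) % align


if __name__ == "__main__":
    for vaddr, align in [(0x1000, 64), (0x1008, 16), (0x1000, 16)]:
        print(hex(vaddr), align, tls_pad(vaddr, align))
```

With p_vaddr = 0x1000, p_align = 64 and a 16-byte TCB the sketch gives PAD = 48, since 16 + 48 = 64, which is congruent to 0x1000 modulo 64.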

Given that ``TP ≡ 0 (modulo PT_TLS.p_align)``. An expression
for `PAD` is ``PAD = (PT_TLS.p_vaddr - TCB) mod PT_TLS.p_align``.

A significant number of dynamic linkers use a different calculation
Contributor

I believe that glibc Variant I handles p_vaddr != 0 (mod p_align) correctly. The bug (https://sourceware.org/bugzilla/show_bug.cgi?id=24606) is for Variant II (x86 etc.).

I have fixed FreeBSD rtld's Variant II in https://reviews.freebsd.org/D31538. Its Variant I may or may not have the bug.

musl has been good since 1.1.23.

Therefore, it's probably not "a significant number", but yes, p_vaddr = 0 (mod p_align) is good for maximum compatibility.

Contributor Author

I found it difficult to be confident on the status of the various dynamic linkers. I can remove the significant part.

The glibc bug looks good for static TLS, but https://sourceware.org/bugzilla/show_bug.cgi?id=24606#c7 mentions that dynamic TLS still needs p_vaddr to be 0 (modulo p_align).

add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var
ldr xn, [xn, #:tprel_lo12_nc:var] // R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC var

Static link time TLS Relaxations
Contributor

Perhaps call this Optimization to be consistent with x86/ppc and "Relocation optimization" (ADRP) and leave the term "relocation relaxation" for RISC-V style section shrinking.

Contributor Author

For TLS specifically I'd prefer to keep "relaxation", as that's what it has been called in all the previous literature, such as Drepper's ELF Handling for Thread Local Storage and the TLSDESC paper. It should help people searching in the references.

I take the point that it ought to have been called optimization. I'll add a sentence to say that we're using relaxation as a term from the existing literature.


Undefined Weak Symbols

An undefined weak symbol has the value 0. As the resolver function
Contributor

This might be the glibc behavior, but musl doesn't have the special __dl_tlsdesc_undefweak. I think it's better to allow flexibility and not require a particular behavior on undefined weak TLS.

Contributor Author

This part is just an example of what can be done. At the top of the section I've written

"The TLS resolver functions are not standardized by this ABI as they are internal to the dynamic linker"

and

"These examples are for illustrative purposes only".

I'll see if there's anything I can do to state that there is no requirement to implement a specific resolver function.

Contributor Author

@smithp35 smithp35 left a comment

Thanks very much for the review.

I've updated based on this and some comments I received internally.

Rules governing thread local storage on AArch64
-----------------------------------------------

* How to denote TLS in source programs.
Member

For these How.. bullet points, I'd replace the period at the end with a :, to prime readers for the format.

Contributor Author

ACK.


This document and AAELF64_ are concerned with:

* How to relocate, statically and dynamically, with respect to symbols
Member

In How to relocate.. I'm expecting the subject at some point. Especially with the subclause and the parentheses it becomes a bit of a confusing sentence that kinda peters out.

Contributor Author

I took the format from the 32-bit ABI https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#41introduction-to-thread-local-storage which has a similar list of bullet points followed by "It is the last two bullet points that are the subject of this ABI."

What I'm trying to say is that the ABI only covers a small part of what is needed to support TLS. I may have lost that part. I'll try and reword.

constructed when the program is first loaded.

For the purpose of addressing TLS, components, referred to as modules,
of an application are identified using indexes. The module index for
Member

components, referred to as modules, of an application -> components of an application, referred to as modules,

Contributor Author

ACK

TLS variable is accessed the thread's generation count is compared
with the global generation count which can be used to trigger updates
of the DTV. The details are platform specific.

Member

At this point it might be useful to give a small high-level example, along the lines of "when dynamic library x gets loaded, we add a TLS block, add a DTV entry, ...", just to stave off confusion.

Contributor Author

I'll see what I can do. This part of TLS is, strictly speaking, outside the scope of the ABI; it is an implementation detail of the dynamic loader.

I think there are three parts:

  • dlopen, dlclose increment/decrement the global generation number independently of whether the module uses TLS.
  • tls_get_addr() compares the thread's generation counter against the global generation number and will resize/reallocate the dynamic thread vector if it is different.
  • Initially the dynamic thread vector entry is unallocated. On first use tls_get_addr will allocate and initialize the TLS.
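
The three parts listed above can be sketched as a toy model. This is a minimal Python illustration, not the behaviour of any real dynamic loader: the `Loader` and `Thread` classes, the dict-based DTV and the `tls_get_addr` signature are all hypothetical, and the sketch simply bumps the global generation number on every dlopen and dlclose.

```python
class Loader:
    """Toy model of a dynamic loader's TLS bookkeeping (illustrative only)."""

    def __init__(self):
        self.global_generation = 0
        self.tls_images = {}   # module index -> initial TLS image (bytes)
        self.next_index = 1    # indexes are never reused in this sketch

    def dlopen(self, tls_image: bytes = b"") -> int:
        index = self.next_index
        self.next_index += 1
        self.tls_images[index] = tls_image
        # Changed whether or not the module actually has TLS.
        self.global_generation += 1
        return index

    def dlclose(self, index: int) -> None:
        del self.tls_images[index]
        self.global_generation += 1


class Thread:
    def __init__(self):
        self.generation = 0
        self.dtv = {}          # module index -> bytearray; absent = unallocated


def tls_get_addr(loader: Loader, thread: Thread, module: int, offset: int):
    # A stale generation count triggers an update of the thread's DTV.
    if thread.generation != loader.global_generation:
        thread.dtv = {m: blk for m, blk in thread.dtv.items()
                      if m in loader.tls_images}
        thread.generation = loader.global_generation
    # Deferred allocation: copy the module's TLS image on first use.
    block = thread.dtv.get(module)
    if block is None:
        block = bytearray(loader.tls_images[module])
        thread.dtv[module] = block
    return block, offset       # stands in for "block base + offset"
```

In this model each thread gets its own copy of a module's TLS image the first time it asks for a variable in that module, and a thread that never touches the module never pays for the allocation.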


AArch64 TLS SystemV design choices

* AArch64 uses variant 1 TLS as described in ELFTLS_.
Member

Perhaps mention ELFTLS when doing the TLS introduction as a for more in-depth info resource.

Contributor Author

ACK. I'll mention that the introduction in the ABI is only sufficient to describe the terms used like Thread Control Block. A general introduction can be found in ELFTLS_

additional code to add ``x0`` to be added to ``tp``, this is not part
of the ABI required TLSDESC code sequence.

Small Code Model
Member

In this section and the sections below I'd put colons after all the model headers (Small Code Model:), or use some kind of header format.

Contributor Author

I'll use a colon. I think I'm already at the lowest level of header.

TLS variables from the thread pointer are static link time
constants. The code sequences are the same for all code models.

The instruction sequences below are not ABI. Using the instructions
Member

"are not ABI" doesn't sound wholly English?

Perhaps: "are not ABI specifications, but using..."?

Contributor Author

I've used "are not required by the ABI, but using".

Static link time TLS Relaxations
--------------------------------

Relaxation is a term used by the TLS literature such as ELFTLS_ to
Member

Maybe move this general info up a bit. By now we have used the term twice in this patch.

Say the first 2 paragraphs until using the constrained model.

Contributor Author

ACK.

implemented. Due to the restrictions on calling convention, the
resolver routines must be written in assembly language.

Static TLS Specialization
Member

For these as well I would add a colon.

Contributor Author

ACK

movz xn, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var
movk xn, :tprel_g0:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var

TLS Descriptors
Member

I'd move these two more general sections above Code sequences for accessing TLS variables.

Contributor Author

I'll take a look at this tomorrow.

Contributor Author

ACK

smithp35 added 6 commits June 23, 2025 16:17
The thread local storage chapter contains:
* A description of Thread Local Storage based on addenda32
* The key design decisions of AArch64 TLS such as tls variant,
  tls dialect, TCB size.
* The ABI required code sequence for TLSDESC that must be emitted
  exactly, as GNU ld requires it to be.
* Sequences for the different code-models.
* Relaxations for GD->IE, GD->LE and IE->LE.
* Synchronization requirements for lazy TLSDESC, with advice not
  to support it due to the overhead of synchronization.
* Edits to split up the bullet points in How to denote TLS
  in source.
* Changed program-own state to process-state as the thread-id
  may not be stored separately from the program's data.
* Removed "typically" from some of the descriptions, as it will
  almost always be the case for a sysvabi platform.
* Linked alignment padding to the definition.
* Provided a bit more information about generation counters.
* Rearranged formulas and used TCBsize to make it clearer.
* Taken out "significant" from a significant number of dynamic
  linkers.
* Give reason for using relaxation rather than optimization.
* Clarify that there is no requirement to implement any TLSDESC
  resolver given in the sysvabi.
Change the input register in add xn, xn, :tprel_hi12:var, lsl #12
to the thread pointer tp. We want to calculate the offset from the
thread pointer so it needs to be an input of the add.
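
To illustrate why the thread pointer has to be an input of the add, the following minimal Python sketch models the two-instruction local-exec sequence. It assumes that the `:tprel_hi12:` operator selects bits [23:12] of the TP-relative offset and `:tprel_lo12_nc:` selects bits [11:0], and that the offset fits in 24 bits. The helper names `split_tprel` and `effective_address` and the sample values are hypothetical.

```python
def split_tprel(offset: int) -> tuple:
    """Split a TP-relative offset into the two 12-bit immediates."""
    if not 0 <= offset < (1 << 24):
        raise ValueError("offset does not fit in 24 bits")
    hi12 = (offset >> 12) & 0xFFF   # used by: add xn, tp, :tprel_hi12:var, lsl #12
    lo12 = offset & 0xFFF           # used by: ldr xn, [xn, #:tprel_lo12_nc:var]
    return hi12, lo12


def effective_address(tp: int, offset: int) -> int:
    """Address the ldr ends up reading, given the thread pointer value."""
    hi12, lo12 = split_tprel(offset)
    xn = tp + (hi12 << 12)          # the add starts from tp, not from xn
    return xn + lo12                # the ldr immediate supplies the low bits


if __name__ == "__main__":
    print(hex(effective_address(0x70000000, 0x12345)))
```

If the add used an uninitialized `xn` as its source instead of `tp`, the thread pointer would never enter the calculation and the load address would not be relative to the thread's TLS.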
Document the decision in the GCC mailing list thread
TLSDESC clobber ABI stability/futureproofness?
https://gcc.gnu.org/legacy-ml/gcc/2018-10/msg00112.html

TLSDESC resolver functions assume that any registers added
by an extension are caller saved for a TLSDESC call.

A brief summary:

Dynamic TLS may be lazily allocated upon the first use of a TLSDESC
resolver. This may involve calls to heap allocation functions
provided by the user, which may use registers from extensions
like SVE and SME. As the resolver function can't know what is
saved, it would have to save all SVE and SME state. This would
be far more expensive than a caller save, and an older libc
written prior to the introduction of the extension would be
unaware of those registers, so the caller has to do the save.

* The SVE and SME state is already
Contributor Author

@smithp35 smithp35 left a comment

Thanks very much for the comments. I'll hopefully have a new patch tomorrow.

knows that the TLS variable is defined in the same module as the code
that is accessing the variable. In this case the offset of the TLS
variable from the start of the module's TLS block is a static link
time constant. Instead of dynamically calculating the offset of the
Contributor Author

ACK

smithp35 added 4 commits June 27, 2025 09:09
Include a pseudo code description of __tls_get_addr with deferred
TLS for dynamic modules.
Use integers modulo m to avoid excess use of (modulo m).
Explain the congruence symbol.
Put expression first so derivation is optional.
Contributor Author

@smithp35 smithp35 left a comment

Thanks for the review comments. I've made the following updates:

  • Simple NFC text changes.
  • Better description of deferred TLS and generation count.
  • Reworded the padding size derivation.
  • Moved paragraphs around to make the flow a bit easier.

Should be visible as 4 separate commits

3 participants