[sysvabi64] Add chapter on Thread Local Storage #311
sysvabi64/sysvabi64.rst
Outdated
and ``PT_TLS`` as the program header with type PT_TLS. ``PAD`` must be
the smallest positive integer that satisfies the following congruence:

``TP + TCB + PAD ≡ PT_TLS.p_vaddr (modulo PT_TLS.p_align)``

TP+TCB+PAD on the left could be confusing, as TCB is placed before TP. Perhaps mention the requirement on TP first (TP = 0 (modulo p_align)), then describe PAD and this formula.

I'll see if I can word it better. I've found it difficult to explain the formula intuitively.
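As a possible aid to intuition, here is a minimal Python sketch of the padding calculation discussed in this thread. It assumes ``TP ≡ 0 (modulo p_align)``, so the congruence reduces to ``PAD = (p_vaddr - TCB) mod p_align``. The function name `tls_padding` and the sample addresses are hypothetical, and the default TCB size of 16 bytes is an assumption of the sketch rather than something stated in the quoted hunk.

```python
def tls_padding(p_vaddr, p_align, tcb_size=16):
    """Padding between the end of the TCB and the start of the TLS block.

    Assumes TP is congruent to 0 modulo p_align, so the result is
    (p_vaddr - tcb_size) mod p_align. tcb_size=16 is an assumed default.
    """
    if p_align <= 0:
        raise ValueError("p_align must be positive")
    # Python's % always returns a value in [0, p_align), even when the
    # left operand is negative, which matches the mathematical 'mod'.
    return (p_vaddr - tcb_size) % p_align


# Hypothetical PT_TLS values, chosen only for illustration.
pad = tls_padding(p_vaddr=0x10040, p_align=64)
# The congruence TCB + PAD = p_vaddr (mod p_align) holds for the result.
assert (16 + pad) % 64 == 0x10040 % 64
```

Note that the expression can evaluate to 0 (for example when `p_vaddr` is congruent to the TCB size), and that a direct port to C would need care because C's `%` can return a negative value.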
sysvabi64/sysvabi64.rst
Outdated
Given that ``TP ≡ 0 (modulo PT_TLS.p_align)``. An expression
for `PAD` is ``PAD = (PT_TLS.p_vaddr - TCB) mod PT_TLS.p_align``.

A significant number of dynamic linkers use a different calculation

I believe that glibc Variant I handles p_vaddr != 0 (mod p_align) correctly. The bug (https://sourceware.org/bugzilla/show_bug.cgi?id=24606) is for Variant II (x86 etc).
I have fixed FreeBSD rtld's Variant II in https://reviews.freebsd.org/D31538. Its Variant I may or may not have the bug.
musl has been good since 1.1.23.
Therefore, it's probably not "a significant number", but yes, p_vaddr = 0 (mod p_align) is good for maximum compatibility.

I found it difficult to be confident about the status of the various dynamic linkers. I can remove the "significant" part.
The glibc bug fix looks good for static TLS, but https://sourceware.org/bugzilla/show_bug.cgi?id=24606#c7 does mention that dynamic TLS still needs p_vaddr to be 0 (modulo p_align).
add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var
ldr xn, [xn, #:tprel_lo12_nc:var] // R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC var

Static link time TLS Relaxations

Perhaps call this Optimization to be consistent with x86/ppc and "Relocation optimization" (ADRP), and leave the term "relocation relaxation" for RISC-V style section shrinking.

For TLS specifically I'd prefer to keep relaxation, as that's what it has been called in all the previous literature, such as Drepper's ELF Handling for Thread Local Storage and the TLSDESC paper. It should help people searching in the references.
I take the point that it ought to have been called optimization. I'll add a sentence to say that we're using relaxation as a term from the existing literature.
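For readers less familiar with the local-exec sequence quoted in the hunk above, the following Python sketch models how a thread-pointer-relative offset could be split between the `:tprel_hi12:` and `:tprel_lo12_nc:` operands. It assumes the usual reading that HI12 carries bits [23:12] and LO12 carries bits [11:0] of the offset. The function name, the thread pointer value, and the offset are hypothetical, and the sketch ignores how the `ldr` encoding scales its immediate.

```python
def split_tprel(offset):
    """Split a TP-relative offset into (hi12, lo12) immediates.

    hi12 is bits [23:12], applied by the add with 'lsl #12'.
    lo12 is bits [11:0], used as the ldr address offset.
    """
    if not 0 <= offset < (1 << 24):
        raise ValueError("offset does not fit in the 24-bit hi12/lo12 range")
    hi12 = (offset >> 12) & 0xFFF
    lo12 = offset & 0xFFF
    return hi12, lo12


tp = 0x7F0000001000   # hypothetical thread pointer value
offset = 0x12348      # hypothetical offset of var from tp
hi12, lo12 = split_tprel(offset)

# add xn, tp, #hi12, lsl #12 followed by ldr xn, [xn, #lo12]
# addresses tp + offset.
address = (tp + (hi12 << 12)) + lo12
assert address == tp + offset
```

This also illustrates why the `add` must take `tp` as its input register: the sequence computes an address relative to the thread pointer.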
Undefined Weak Symbols

An undefined weak symbol has the value 0. As the resolver function

This might be the glibc behavior, but musl doesn't have the special __dl_tlsdesc_undefweak. I think it's better to allow flexibility rather than require a particular behavior on undefined weak TLS.

This part is just an example of what can be done. I've written at the top of the section "The TLS resolver functions are not standardized by this ABI as they are internal to the dynamic linker" and "These examples are for illustrative purposes only".
I'll see if there's anything I can do to state that there is no requirement to implement a specific resolver function.
Thanks very much for the review.
I've updated based on this and some comments I received internally.
sysvabi64/sysvabi64.rst
Outdated
Rules governing thread local storage on AArch64
-----------------------------------------------

* How to denote TLS in source programs.

For these "How.." bullet points, I'd replace the period at the end with a ":", to prime readers for the format.

ACK.
This document and AAELF64_ are concerned with:

* How to relocate, statically and dynamically, with respect to symbols

In "How to relocate.." I'm expecting the subject at some point. Especially with the subclause and the parentheses, it becomes a bit of a confusing sentence that kinda peters out.

I took the format from the 32-bit ABI https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#41introduction-to-thread-local-storage, which has a similar list of bullet points followed by "It is the last two bullet points that are the subject of this ABI."
What I'm trying to say is that the ABI only covers a small part of what is needed to support TLS. I may have lost that part. I'll try to reword.
sysvabi64/sysvabi64.rst
Outdated
constructed when the program is first loaded.

For the purpose of addressing TLS, components, referred to as modules,
of an application are identified using indexes. The module index for

"components, referred to as modules, of an application" -> "components of an application, referred to as modules,"

ACK
sysvabi64/sysvabi64.rst
Outdated
TLS variable is accessed the thread's generation count is compared
with the global generation count which can be used to trigger updates
of the DTV. The details are platform specific.

At this point it might be useful to give a small high-level example along the lines of "for example, when dynamic library x gets loaded, we add a TLS block, add a DTV entry..", just to head off confusion.

I'll see what I can do. This part of TLS is, strictly speaking, outside the scope of the ABI; it is an implementation detail of the dynamic loader.
I think there are three parts:
- dlopen and dlclose increment/decrement the global generation number, independently of whether the module uses TLS.
- __tls_get_addr() compares the thread's generation counter against the global generation number and will resize/reallocate the dynamic thread vector if it is different.
- Initially the dynamic thread vector entry is unallocated. On first use, __tls_get_addr will allocate and initialize the TLS.
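The three parts listed above could be modelled roughly as follows. This is a toy Python sketch, not a description of any real dynamic loader: the class names `Loader` and `Thread`, the dictionary-based DTV, and the choice to bump the generation count on both `dlopen` and `dlclose` are all assumptions made for illustration.

```python
class Loader:
    """Toy model of the loader-wide TLS bookkeeping (illustrative only)."""

    def __init__(self):
        self.generation = 0    # global generation count
        self.tls_images = {}   # module index -> initial TLS image
        self.next_index = 1

    def dlopen(self, tls_image=b""):
        index = self.next_index
        self.next_index += 1
        self.tls_images[index] = bytes(tls_image)
        # Changes regardless of whether the module actually uses TLS.
        self.generation += 1
        return index

    def dlclose(self, index):
        del self.tls_images[index]
        self.generation += 1


class Thread:
    """Per-thread state: a generation counter and a dynamic thread vector."""

    def __init__(self):
        self.generation = 0
        self.dtv = {}          # module index -> this thread's TLS block

    def tls_get_addr(self, loader, index, offset):
        if self.generation != loader.generation:
            # Bring the DTV up to date: drop blocks of unloaded modules.
            # Entries for newly loaded modules stay unallocated for now.
            self.dtv = {i: b for i, b in self.dtv.items()
                        if i in loader.tls_images}
            self.generation = loader.generation
        if index not in self.dtv:
            # First use by this thread: allocate and initialize the block.
            self.dtv[index] = bytearray(loader.tls_images[index])
        return self.dtv[index], offset


loader = Loader()
thread = Thread()
module = loader.dlopen(tls_image=b"\x01\x02")
assert module not in thread.dtv            # nothing allocated at dlopen time
block, off = thread.tls_get_addr(loader, module, 1)
assert block[off] == 2                     # initialized from the TLS image
```

The point of the sketch is the ordering: loading a module only changes the global generation count, and each thread pays for the DTV update and the block allocation the first time it actually accesses a variable in that module.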
AArch64 TLS SystemV design choices

* AArch64 uses variant 1 TLS as described in ELFTLS_.

Perhaps mention ELFTLS when doing the TLS introduction, as a "for more in-depth info" resource.

ACK. I'll mention that the introduction in the ABI is only sufficient to describe the terms used, like Thread Control Block. A general introduction can be found in ELFTLS_.
sysvabi64/sysvabi64.rst
Outdated
additional code to add ``x0`` to be added to ``tp``, this is not part
of the ABI required TLSDESC code sequence.

Small Code Model

In this section and the sections below I'd put colons after all the model headers ("Small Code Model:"), or use some kind of header format.

I'll use a colon. I think I'm already at the lowest level of header.
sysvabi64/sysvabi64.rst
Outdated
TLS variables from the thread pointer are static link time
constants. The code sequences are the same for all code models.

The instruction sequences below are not ABI. Using the instructions

"are not ABI" doesn't sound wholly English?
Perhaps: "are not ABI specifications, but using.."?

I've used "are not required by the ABI, but using".
sysvabi64/sysvabi64.rst
Outdated
Static link time TLS Relaxations
--------------------------------

Relaxation is a term used by the TLS literature such as ELFTLS_ to

Maybe move this general info up a bit. By now we have used the term twice in this patch.
Say the first 2 paragraphs, until "using the constrained model".

ACK.
sysvabi64/sysvabi64.rst
Outdated
implemented. Due to the restrictions on calling convention, the
resolver routines must be written in assembly language.

Static TLS Specialization

For these as well I would add a colon.

ACK
sysvabi64/sysvabi64.rst
Outdated
movz xn, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var
movk xn, :tprel_g0:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var

TLS Descriptors

I'd move these two more general sections above "Code sequences for accessing TLS variables".

I'll take a look at this tomorrow.

ACK
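As a small illustration of the `movz`/`movk` pair quoted in the hunk above, the Python sketch below models how a 32-bit thread-pointer-relative offset could be split into the `:tprel_g1:` (bits [31:16]) and `:tprel_g0:` (bits [15:0]) halves and then reassembled. The function names and the sample offset are hypothetical, and the sketch assumes a non-negative offset so that the G1 relocation resolves to a `movz`.

```python
def split_tprel_movw(offset):
    """Split a non-negative TP-relative offset into (g1, g0) halves."""
    if not 0 <= offset < (1 << 32):
        raise ValueError("offset does not fit in 32 bits")
    g1 = (offset >> 16) & 0xFFFF   # immediate for movz ..., lsl #16
    g0 = offset & 0xFFFF           # immediate for movk
    return g1, g0


def movz_movk(g1, g0):
    """Model the register value after the two-instruction sequence."""
    xn = g1 << 16                  # movz: write g1 at bits [31:16], zero the rest
    xn = (xn & ~0xFFFF) | g0       # movk: replace bits [15:0], keep the rest
    return xn


offset = 0x12345678                # hypothetical offset of var from tp
g1, g0 = split_tprel_movw(offset)
assert movz_movk(g1, g0) == offset
```

The resulting register holds the offset only; the thread pointer still has to be added to form the address of the variable.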
The thread local storage chapter contains:
* A description of Thread Local Storage based on addenda32
* The key design decisions of AArch64 TLS such as tls variant, tls dialect, TCB size.
* The ABI required code sequence for TLSDESC that must be emitted exactly, as GNU ld requires it to be.
* Sequences for the different code-models.
* Relaxations for GD->IE, GD->LE and IE->LE.
* Synchronization requirements for Lazy TLSDESC. With advice not to support it due to overhead of synchronization.

* Edits to split up the bullet points in How to denote TLS in source.
* Changed program-own state to process-state as the thread-id may not be stored separately from the programs data.
* Removed typically from some of the descriptions as the typically will almost always be the case for a sysvabi platform.
* Linked alignment padding to the definition.
* Provided a bit more information about generation counters.

* Rearranged formulas and used TCBsize to make it clearer.
* Taken out "significant" from a significant number of dynamic linkers.
* Give reason for using relaxation rather than optimization.
* Clarify that there is no requirement to implement any TLSDESC resolver given in the sysvabi.
Change the input register in add xn, xn, :tprel_hi12:var, lsl #12 to the thread pointer tp. We want to calculate the offset from the thread pointer so it needs to be an input of the add.
Document the decision in the GCC mailing list thread "TLSDESC clobber ABI stability/futureproofness?" https://gcc.gnu.org/legacy-ml/gcc/2018-10/msg00112.html
TLSDESC resolver functions assume that any registers added by an extension are caller saved for a TLSDESC call.
A brief summary:
Dynamic TLS may be lazy allocated upon the first use of a TLSDESC resolver. This may involve calls to heap allocation functions provided by the user, which may use registers from extensions like SVE and SME. As the resolver function can't know what is saved it would have to save all SVE and SME state. This would be way more expensive than a caller save, and an older libc written prior to the introduction of the extension would be unaware of them so the caller has to do the save.
* The SVE and SME state is already
Thanks very much for the comments. I'll hopefully have a new patch tomorrow.
sysvabi64/sysvabi64.rst
Outdated
knows that the TLS variable is defined in the same module as the code
that is accessing the variable. In this case the offset of the TLS
variable from the start of the module's TLS block is a static link
time constant. Instead of dynamically calculating the offset of the

ACK
Include a pseudo code description of __tls_get_addr with deferred TLS for dynamic modules.
Use integers modulo m to avoid excess use of (modulo m). Explain the congruence symbol. Put expression first so derivation is optional.
Thanks for the review comments. I've made the following updates:
- Simple NFC text changes.
- Better description of deferred TLS and generation count.
- Reworded the padding size derivation.
- Moved paragraphs around to make the flow a bit easier.
Should be visible as 4 separate commits